github.com/huggingface/datasets

Class DatasetInfo

src/datasets/info.py:92–331

Information about a dataset. `DatasetInfo` documents a dataset, including its name, version, and features. See the constructor arguments and properties for a full list. Not all fields are known on construction and may be updated later.
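The point that not every field is known at construction time can be sketched with the standard library alone. `InfoSketch`, `ToyBuilder`, `MyTextDataset`, and `camel_to_snake` below are hypothetical stand-ins, not the `datasets` API: they mirror only a few `DatasetInfo` fields and assume, as the docstring states, that the builder name is the snake_case form of the builder class name. The regex-based conversion is an assumption and may differ from the library's own helper on edge cases.

```python
import dataclasses
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class InfoSketch:
    """Hypothetical mirror of a few DatasetInfo fields (not the real class)."""

    # Known when the builder describes its dataset; str fields default to ""
    description: str = dataclasses.field(default_factory=str)
    license: str = dataclasses.field(default_factory=str)
    # Unknown at construction; filled in afterwards by the builder
    builder_name: Optional[str] = None
    config_name: Optional[str] = None


_UPPER_WORD = re.compile(r"(.)([A-Z][a-z]+)")
_LOWER_UPPER = re.compile(r"([a-z0-9])([A-Z])")


def camel_to_snake(name: str) -> str:
    """Convert a CamelCase class name to snake_case (assumed rule)."""
    name = _UPPER_WORD.sub(r"\1_\2", name)
    return _LOWER_UPPER.sub(r"\1_\2", name).lower()


class ToyBuilder:
    """Hypothetical builder that completes its info after construction."""

    def __init__(self, config_name: str = "default") -> None:
        self.info = self._info()  # only description/license are set here
        # Fields the builder can only fill in once it exists
        self.info.builder_name = camel_to_snake(type(self).__name__)
        self.info.config_name = config_name

    def _info(self) -> InfoSketch:
        return InfoSketch(description="A toy corpus.", license="CC-BY-4.0")


class MyTextDataset(ToyBuilder):
    pass


info = MyTextDataset().info
print(info.builder_name, info.config_name)  # my_text_dataset default
```

Because `type(self).__name__` resolves to the concrete subclass, each dataset builder in this sketch gets its own derived name without repeating it by hand.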


```python
@dataclass
class DatasetInfo:
    """Information about a dataset.

    `DatasetInfo` documents datasets, including its name, version, and features.
    See the constructor arguments and properties for a full list.

    Not all fields are known on construction and may be updated later.

    Attributes:
        description (`str`):
            A description of the dataset.
        citation (`str`):
            A BibTeX citation of the dataset.
        homepage (`str`):
            A URL to the official homepage for the dataset.
        license (`str`):
            The dataset's license. It can be the name of the license or a paragraph containing the terms of the license.
        features ([`Features`], optional):
            The features used to specify the dataset's column types.
        post_processed (`PostProcessedInfo`, optional):
            Deprecated. Information regarding the resources of a possible post-processing of a dataset. For example, it can contain the information of an index.
        supervised_keys (`SupervisedKeysData`, optional):
            Specifies the input feature and the label for supervised learning if applicable for the dataset (legacy from TFDS).
        builder_name (`str`, optional):
            The name of the `GeneratorBasedBuilder` subclass used to create the dataset. It is also the snake_case version of the dataset builder class name.
        config_name (`str`, optional):
            The name of the configuration derived from [`BuilderConfig`].
        version (`str` or [`Version`], optional):
            The version of the dataset.
        splits (`dict`, optional):
            The mapping between split name and metadata.
        download_checksums (`dict`, optional):
            The mapping between the URL to download the dataset's checksums and corresponding metadata.
        download_size (`int`, optional):
            The size of the files to download to generate the dataset, in bytes.
        post_processing_size (`int`, optional):
            Deprecated. Size of the dataset in bytes after post-processing, if any.
        dataset_size (`int`, optional):
            The combined size in bytes of the Arrow tables for all splits.
        size_in_bytes (`int`, optional):
            The combined size in bytes of all files associated with the dataset (downloaded files + Arrow files).
        **config_kwargs (additional keyword arguments):
            Keyword arguments to be passed to the [`BuilderConfig`] and used in the [`DatasetBuilder`].
    """

    # Set in the dataset builders
    description: str = dataclasses.field(default_factory=str)
    citation: str = dataclasses.field(default_factory=str)
    homepage: str = dataclasses.field(default_factory=str)
    license: str = dataclasses.field(default_factory=str)
    features: Optional[Features] = None
    post_processed: Optional[PostProcessedInfo] = None  # kept for backward compat
    supervised_keys: Optional[SupervisedKeysData] = None

    # Set later by the builder
    builder_name: Optional[str] = None
    dataset_name: Optional[str] = None  # for packaged builders, to be different from builder_name
    config_name: Optional[str] = None
```
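The docstring defines `dataset_size` as the combined size of the Arrow tables for all splits, and `size_in_bytes` as downloaded files plus Arrow files. A minimal sketch of that arithmetic follows, assuming each entry in `splits` carries a per-split byte count. `SplitMeta` and `total_sizes` are hypothetical names, the deprecated `post_processing_size` is ignored, and the byte counts are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class SplitMeta:
    """Hypothetical per-split metadata (the value side of `splits`)."""

    num_bytes: int
    num_examples: int


def total_sizes(splits: Dict[str, SplitMeta], download_size: int) -> Tuple[int, int]:
    """Return (dataset_size, size_in_bytes) as the docstring describes them."""
    # dataset_size: combined size of the Arrow tables for all splits
    dataset_size = sum(split.num_bytes for split in splits.values())
    # size_in_bytes: downloaded files + Arrow files
    size_in_bytes = download_size + dataset_size
    return dataset_size, size_in_bytes


splits = {
    "train": SplitMeta(num_bytes=40_000_000, num_examples=25_000),
    "test": SplitMeta(num_bytes=39_000_000, num_examples=25_000),
}
print(total_sizes(splits, download_size=84_000_000))  # (79000000, 163000000)
```

Keeping the per-split counts as the source of truth means the two totals can always be recomputed rather than stored independently and drifting apart.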

Callers (15)

- `test_concatenate` (method) · 0.90
- `test_concatenate_formatted` (method) · 0.90
- `test_concatenate_with_indices` (method) · 0.90
- `test_concatenate_with_indices_from_disk` (method) · 0.90
- `test_concatenate_pickle` (method) · 0.90
- `test_from_pandas` (method) · 0.90
- `test_from_polars` (method) · 0.90
- `test_from_dict` (method) · 0.90
- `test_concatenate_mixed_memory_and_disk` (method) · 0.90
- `test_split_order_in_metadata_configs_from_exported_parquet_files_and_dataset_infos` (function) · 0.90
- `_info` (method) · 0.90

Calls (1)

- `field` (method) · 0.80

Tested by (15)

- `test_concatenate` (method) · 0.72
- `test_concatenate_formatted` (method) · 0.72
- `test_concatenate_with_indices` (method) · 0.72
- `test_concatenate_with_indices_from_disk` (method) · 0.72
- `test_concatenate_pickle` (method) · 0.72
- `test_from_pandas` (method) · 0.72
- `test_from_polars` (method) · 0.72
- `test_from_dict` (method) · 0.72
- `test_concatenate_mixed_memory_and_disk` (method) · 0.72
- `test_split_order_in_metadata_configs_from_exported_parquet_files_and_dataset_infos` (function) · 0.72
- `_info` (method) · 0.72