hub / github.com/huggingface/datasets / DatasetInfo

Class DatasetInfo

src/datasets/info.py:92–331

Information about a dataset. `DatasetInfo` documents a dataset, including its name, version, and features. See the constructor arguments and properties for a full list. Not all fields are known at construction time; some may be updated later.


@dataclass
class DatasetInfo:
    """Information about a dataset.

    `DatasetInfo` documents datasets, including its name, version, and features.
    See the constructor arguments and properties for a full list.

    Not all fields are known on construction and may be updated later.

    Attributes:
        description (`str`):
            A description of the dataset.
        citation (`str`):
            A BibTeX citation of the dataset.
        homepage (`str`):
            A URL to the official homepage for the dataset.
        license (`str`):
            The dataset's license. It can be the name of the license or a paragraph containing the terms of the license.
        features ([`Features`], *optional*):
            The features used to specify the dataset's column types.
        post_processed (`PostProcessedInfo`, *optional*):
            Deprecated. Information regarding the resources of a possible post-processing of a dataset. For example, it can contain the information of an index.
        supervised_keys (`SupervisedKeysData`, *optional*):
            Specifies the input feature and the label for supervised learning if applicable for the dataset (legacy from TFDS).
        builder_name (`str`, *optional*):
            The name of the `GeneratorBasedBuilder` subclass used to create the dataset. It is also the snake_case version of the dataset builder class name.
        config_name (`str`, *optional*):
            The name of the configuration derived from [`BuilderConfig`].
        version (`str` or [`Version`], *optional*):
            The version of the dataset.
        splits (`dict`, *optional*):
            The mapping between split name and metadata.
        download_checksums (`dict`, *optional*):
            The mapping between the URL to download the dataset's checksums and corresponding metadata.
        download_size (`int`, *optional*):
            The size of the files to download to generate the dataset, in bytes.
        post_processing_size (`int`, *optional*):
            Deprecated. Size of the dataset in bytes after post-processing, if any.
        dataset_size (`int`, *optional*):
            The combined size in bytes of the Arrow tables for all splits.
        size_in_bytes (`int`, *optional*):
            The combined size in bytes of all files associated with the dataset (downloaded files + Arrow files).
        **config_kwargs (additional keyword arguments):
            Keyword arguments to be passed to the [`BuilderConfig`] and used in the [`DatasetBuilder`].
    """

    # Set in the dataset builders
    description: str = dataclasses.field(default_factory=str)
    citation: str = dataclasses.field(default_factory=str)
    homepage: str = dataclasses.field(default_factory=str)
    license: str = dataclasses.field(default_factory=str)
    features: Optional[Features] = None
    post_processed: Optional[PostProcessedInfo] = None  # kept for backward compat
    supervised_keys: Optional[SupervisedKeysData] = None

    # Set later by the builder
    builder_name: Optional[str] = None
    dataset_name: Optional[str] = None  # for packaged builders, to be different from builder_name
    config_name: Optional[str] = None
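
The listing above relies on two dataclass idioms: string fields default to an empty string via `dataclasses.field(default_factory=str)`, and fields that are only known later default to `None`. The following is a minimal, standard-library-only sketch of that pattern. `MiniDatasetInfo`, its sample values, and the name `toy_builder` are hypothetical stand-ins for illustration and are not part of the `datasets` library.

```python
import dataclasses
from dataclasses import dataclass
from typing import Optional


@dataclass
class MiniDatasetInfo:
    """Hypothetical stand-in that mirrors a few DatasetInfo fields."""

    # Known when the dataset is defined; default_factory=str yields "".
    description: str = dataclasses.field(default_factory=str)
    license: str = dataclasses.field(default_factory=str)

    # Not known at construction time; assigned later.
    builder_name: Optional[str] = None
    config_name: Optional[str] = None


info = MiniDatasetInfo(description="Toy sentiment corpus")
assert info.license == ""         # empty string from default_factory=str
assert info.builder_name is None  # not filled in yet

# Later, once a builder is chosen, the remaining fields are assigned
# (the values here are made up for the example).
info.builder_name = "toy_builder"
info.config_name = "default"

print(dataclasses.asdict(info))
```

Because the class is a plain (non-frozen) dataclass, fields can be reassigned after construction, which is what allows "set later" attributes to start as `None`.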

Calls 1

field · Method · 0.80