
@dataclass
class DatasetInfo:
    """Information about a dataset.

    `DatasetInfo` documents a dataset, including its name, version, and features.
    See the constructor arguments and properties for a full list.

    Not all fields are known on construction and may be updated later.

    Attributes:
        description (`str`):
            A description of the dataset.
        citation (`str`):
            A BibTeX citation of the dataset.
        homepage (`str`):
            A URL to the official homepage for the dataset.
        license (`str`):
            The dataset's license. It can be the name of the license or a paragraph containing the terms of the license.
        features ([`Features`], *optional*):
            The features used to specify the dataset's column types.
        post_processed (`PostProcessedInfo`, *optional*):
            Deprecated. Information regarding the resources of a possible post-processing of a dataset. For example, it can contain the information of an index.
        supervised_keys (`SupervisedKeysData`, *optional*):
            Specifies the input feature and the label for supervised learning if applicable for the dataset (legacy from TFDS).
        builder_name (`str`, *optional*):
            The name of the `GeneratorBasedBuilder` subclass used to create the dataset. It is also the snake_case version of the dataset builder class name.
        config_name (`str`, *optional*):
            The name of the configuration derived from [`BuilderConfig`].
        version (`str` or [`Version`], *optional*):
            The version of the dataset.
        splits (`dict`, *optional*):
            The mapping between split name and metadata.
        download_checksums (`dict`, *optional*):
            The mapping between the URL to download the dataset's checksums and corresponding metadata.
        download_size (`int`, *optional*):
            The size of the files to download to generate the dataset, in bytes.
        post_processing_size (`int`, *optional*):
            Deprecated. Size of the dataset in bytes after post-processing, if any.
        dataset_size (`int`, *optional*):
            The combined size in bytes of the Arrow tables for all splits.
        size_in_bytes (`int`, *optional*):
            The combined size in bytes of all files associated with the dataset (downloaded files + Arrow files).
        **config_kwargs (additional keyword arguments):
            Keyword arguments to be passed to the [`BuilderConfig`] and used in the [`DatasetBuilder`].
    """

    # Set in the dataset builders
    description: str = dataclasses.field(default_factory=str)
    citation: str = dataclasses.field(default_factory=str)
    homepage: str = dataclasses.field(default_factory=str)
    license: str = dataclasses.field(default_factory=str)
    features: Optional[Features] = None
    post_processed: Optional[PostProcessedInfo] = None  # kept for backward compat
    supervised_keys: Optional[SupervisedKeysData] = None

    # Set later by the builder
    builder_name: Optional[str] = None
    dataset_name: Optional[str] = None  # for packaged builders, to be different from builder_name
    config_name: Optional[str] = None