hub / github.com/huggingface/datasets / Features

Class Features

src/datasets/features/features.py:1847–2388

A special dictionary that defines the internal structure of a dataset. Instantiated with a dictionary of type `dict[str, FieldType]`, where keys are the desired column names and values are the types of those columns. The supported `FieldType` values are listed in the class docstring.

Source from the content-addressed store, hash-verified

class Features(dict):
    """A special dictionary that defines the internal structure of a dataset.

    Instantiated with a dictionary of type `dict[str, FieldType]`, where keys are the desired column names,
    and values are the type of that column.

    `FieldType` can be one of the following:
        - [`Value`] feature specifies a single data type value, e.g. `int64` or `string`.
        - [`ClassLabel`] feature specifies a predefined set of classes which can have labels associated to them and
          will be stored as integers in the dataset.
        - Python `dict` specifies a composite feature containing a mapping of sub-fields to sub-features.
          It's possible to have nested fields of nested fields in an arbitrary manner.
        - [`List`] or [`LargeList`] specifies a composite feature containing a sequence of
          sub-features, all of the same feature type.
        - [`Array2D`], [`Array3D`], [`Array4D`] or [`Array5D`] feature for multidimensional arrays.
        - [`Audio`] feature to store the absolute path to an audio file or a dictionary with the relative path
          to an audio file ("path" key) and its bytes content ("bytes" key).
          This feature loads the audio lazily with a decoder.
        - [`Image`] feature to store the absolute path to an image file, an `np.ndarray` object, a `PIL.Image.Image` object
          or a dictionary with the relative path to an image file ("path" key) and its bytes content ("bytes" key).
          This feature extracts the image data.
        - [`Video`] feature to store the absolute path to a video file, a `torchcodec.decoders.VideoDecoder` object
          or a dictionary with the relative path to a video file ("path" key) and its bytes content ("bytes" key).
          This feature loads the video lazily with a decoder.
        - [`Pdf`] feature to store the absolute path to a PDF file, a `pdfplumber.pdf.PDF` object
          or a dictionary with the relative path to a PDF file ("path" key) and its bytes content ("bytes" key).
          This feature loads the PDF lazily with a PDF reader.
        - [`Nifti`] feature to store the absolute path to a NIfTI neuroimaging file, a `nibabel.Nifti1Image` object
          or a dictionary with the relative path to a NIfTI file ("path" key) and its bytes content ("bytes" key).
          This feature loads the NIfTI file lazily with nibabel.
        - [`Translation`] or [`TranslationVariableLanguages`] feature specific to Machine Translation.
        - [`Json`] feature to store unstructured data, e.g. containing mixed/arbitrary types. Under the hood
    """

    def __init__(*args, **kwargs):
        # self not in the signature to allow passing self as a kwarg
        if not args:
            raise TypeError("descriptor '__init__' of 'Features' object needs an argument")
        self, *args = args
        super(Features, self).__init__(*args, **kwargs)
        # keep track of columns which require decoding
        self._column_requires_decoding: dict[str, bool] = {
            col: require_decoding(feature) for col, feature in self.items()
        }

        # backward compatibility with datasets<4 : [feature] -> List(feature)
        def _check_old_list(feature):
            if isinstance(feature, list):
                return List(_visit(feature[0], _check_old_list))
            return feature

        for column_name, feature in self.items():
            self[column_name] = _visit(feature, _check_old_list)

    __setitem__ = keep_features_dicts_synced(dict.__setitem__)
    __delitem__ = keep_features_dicts_synced(dict.__delitem__)
    update = keep_features_dicts_synced(dict.update)
    setdefault = keep_features_dicts_synced(dict.setdefault)

Callers 15

_create_complex_features · Function · 0.90
_create_dummy_dataset · Method · 0.90
test_dummy_dataset · Method · 0.90
test_cast · Method · 0.90
test_flatten · Method · 0.90
test_map · Method · 0.90

Calls 1

Tested by 15

_create_dummy_dataset · Method · 0.72
test_dummy_dataset · Method · 0.72
test_cast · Method · 0.72
test_flatten · Method · 0.72
test_map · Method · 0.72
test_map_new_features · Method · 0.72