| 1845 | |
| 1846 | |
| 1847 | class Features(dict): |
| 1848 | """A special dictionary that defines the internal structure of a dataset. |
| 1849 | |
| 1850 | Instantiated with a dictionary of type `dict[str, FieldType]`, where keys are the desired column names, |
| 1851 | and values are the type of that column. |
| 1852 | |
| 1853 | `FieldType` can be one of the following: |
| 1854 | - [`Value`] feature specifies a single data type value, e.g. `int64` or `string`. |
| 1855 | - [`ClassLabel`] feature specifies a predefined set of classes which can have labels associated with them and |
| 1856 | will be stored as integers in the dataset. |
| 1857 | - Python `dict` specifies a composite feature containing a mapping of sub-fields to sub-features. |
| 1858 | It's possible to have nested fields of nested fields in an arbitrary manner. |
| 1859 | - [`List`] or [`LargeList`] specifies a composite feature containing a sequence of |
| 1860 | sub-features, all of the same feature type. |
| 1861 | - [`Array2D`], [`Array3D`], [`Array4D`] or [`Array5D`] feature for multidimensional arrays. |
| 1862 | - [`Audio`] feature to store the absolute path to an audio file or a dictionary with the relative path |
| 1863 | to an audio file ("path" key) and its bytes content ("bytes" key). |
| 1864 | This feature loads the audio lazily with a decoder. |
| 1865 | - [`Image`] feature to store the absolute path to an image file, an `np.ndarray` object, a `PIL.Image.Image` object |
| 1866 | or a dictionary with the relative path to an image file ("path" key) and its bytes content ("bytes" key). |
| 1867 | This feature extracts the image data. |
| 1868 | - [`Video`] feature to store the absolute path to a video file, a `torchcodec.decoders.VideoDecoder` object |
| 1869 | or a dictionary with the relative path to a video file ("path" key) and its bytes content ("bytes" key). |
| 1870 | This feature loads the video lazily with a decoder. |
| 1871 | - [`Pdf`] feature to store the absolute path to a PDF file, a `pdfplumber.pdf.PDF` object |
| 1872 | or a dictionary with the relative path to a PDF file ("path" key) and its bytes content ("bytes" key). |
| 1873 | This feature loads the PDF lazily with a PDF reader. |
| 1874 | - [`Nifti`] feature to store the absolute path to a NIfTI neuroimaging file, a `nibabel.Nifti1Image` object |
| 1875 | or a dictionary with the relative path to a NIfTI file ("path" key) and its bytes content ("bytes" key). |
| 1876 | This feature loads the NIfTI file lazily with nibabel. |
| 1877 | - [`Translation`] or [`TranslationVariableLanguages`] feature specific to Machine Translation. |
| 1878 | - [`Json`] feature to store unstructured data, e.g. containing mixed/arbitrary types. Under the hood |
| 1879 | """ |
| 1880 | |
| 1881 | def __init__(*args, **kwargs): |
| 1882 | # self not in the signature to allow passing self as a kwarg |
| 1883 | if not args: |
| 1884 | raise TypeError("descriptor '__init__' of 'Features' object needs an argument") |
| 1885 | self, *args = args |
| 1886 | super(Features, self).__init__(*args, **kwargs) |
| 1887 | # keep track of columns which require decoding |
| 1888 | self._column_requires_decoding: dict[str, bool] = { |
| 1889 | col: require_decoding(feature) for col, feature in self.items() |
| 1890 | } |
| 1891 | |
| 1892 | # backward compatibility with datasets<4 : [feature] -> List(feature) |
| 1893 | def _check_old_list(feature): |
| 1894 | if isinstance(feature, list): |
| 1895 | return List(_visit(feature[0], _check_old_list)) |
| 1896 | return feature |
| 1897 | |
| 1898 | for column_name, feature in self.items(): |
| 1899 | self[column_name] = _visit(feature, _check_old_list) |
| 1900 | |
| 1901 | __setitem__ = keep_features_dicts_synced(dict.__setitem__) |
| 1902 | __delitem__ = keep_features_dicts_synced(dict.__delitem__) |
| 1903 | update = keep_features_dicts_synced(dict.update) |
| 1904 | setdefault = keep_features_dicts_synced(dict.setdefault) |
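
The listing above combines two patterns: `__init__` omits `self` from its signature so that a column named `self` can be passed as a keyword, and every mutating `dict` method is wrapped so that `_column_requires_decoding` stays in step with the dictionary contents. The following is a minimal, self-contained sketch of those two ideas using only the standard library. The names `Schema`, `needs_decoding`, and `keep_flags_synced` are hypothetical stand-ins; the bodies of the real `keep_features_dicts_synced` and `require_decoding` are not shown in the listing, so the decorator below is an assumption about how such syncing could work, not the library's implementation.

```python
from functools import wraps


def needs_decoding(feature):
    # Toy stand-in for `require_decoding` (assumption): treat a few
    # media-like type names as requiring decoding.
    return feature in {"audio", "image", "video"}


def keep_flags_synced(func):
    # Hypothetical counterpart of `keep_features_dicts_synced`: call the
    # wrapped dict method, then rebuild the side dictionary from scratch.
    @wraps(func)
    def wrapper(*args, **kwargs):
        # The instance is taken from the positional arguments, so "self"
        # stays available as an ordinary keyword (e.g. update(self=...)).
        self, *rest = args
        out = func(self, *rest, **kwargs)
        self._needs_decoding = {col: needs_decoding(f) for col, f in self.items()}
        return out

    return wrapper


class Schema(dict):
    def __init__(*args, **kwargs):
        # `self` is deliberately absent from the signature, so a column
        # literally named "self" can be passed as a keyword argument.
        if not args:
            raise TypeError("Schema.__init__ needs an instance argument")
        self, *args = args
        # Zero-argument super() is unavailable here because the function
        # has no named first parameter, hence the explicit form.
        super(Schema, self).__init__(*args, **kwargs)
        self._needs_decoding = {col: needs_decoding(f) for col, f in self.items()}

    __setitem__ = keep_flags_synced(dict.__setitem__)
    __delitem__ = keep_flags_synced(dict.__delitem__)
    update = keep_flags_synced(dict.update)
    setdefault = keep_flags_synced(dict.setdefault)


schema = Schema({"text": "string"}, self="image")  # "self" is a column name here
schema["clip"] = "audio"
del schema["text"]
schema.update(label="int64")
```

Rebuilding the side dictionary after every mutation costs time proportional to the number of columns, which is a reasonable trade for a structure that typically holds a handful of entries and must never drift out of sync.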