Converts this :class:`~ray.data.Dataset` into a distributed set of NumPy ndarrays or dictionaries of NumPy ndarrays. This is only supported for datasets convertible to NumPy ndarrays. This function copies the data. For zero-copy access to the underlying data, consider using :meth:`Dataset.to_arrow_refs` or :meth:`Dataset.iter_internal_ref_bundles`.
to_numpy_refs(self, *, column: Optional[str] = None) -> List[ObjectRef[np.ndarray]]
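The pattern behind this method is one conversion task per block, with the results handed back as unresolved futures. The sketch below imitates that shape using only the standard library so it runs without a Ray cluster. It is an illustration under stated assumptions, not Ray's implementation: `block_to_columns`, `to_column_futures`, and the dict-of-lists block layout are hypothetical stand-ins, plain lists take the place of ndarrays, and `concurrent.futures.Future` takes the place of `ObjectRef`.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List, Optional, Union

# Hypothetical stand-in for a block: a mapping of column name to values.
Block = Dict[str, List[int]]
Converted = Union[List[int], Dict[str, List[int]]]


def block_to_columns(block: Block, column: Optional[str] = None) -> Converted:
    """Copy one block into a single column or a dict of all columns."""
    if column is None:
        # No column requested: return every column, keyed by name.
        return {name: list(values) for name, values in block.items()}
    # One column requested: return only that column's values.
    return list(block[column])


def to_column_futures(
    pool: ThreadPoolExecutor, blocks: List[Block], *, column: Optional[str] = None
) -> List[Future]:
    """Submit one conversion task per block and return the futures unresolved."""
    return [pool.submit(block_to_columns, block, column) for block in blocks]


# Two blocks of five rows each, loosely mirroring the docstring example.
blocks = [{"id": [0, 1, 2, 3, 4]}, {"id": [5, 6, 7, 8, 9]}]

with ThreadPoolExecutor(max_workers=2) as pool:
    single_refs = to_column_futures(pool, blocks, column="id")
    all_refs = to_column_futures(pool, blocks)
    # Resolving the futures is a separate, explicit step for the caller.
    single = [ref.result() for ref in single_refs]
    everything = [ref.result() for ref in all_refs]
```

With Ray itself, the list returned by `to_numpy_refs` would be resolved with `ray.get(refs)` rather than `Future.result()`.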

    @DeveloperAPI
    def to_numpy_refs(
        self, *, column: Optional[str] = None
    ) -> List[ObjectRef[np.ndarray]]:
        """Converts this :class:`~ray.data.Dataset` into a distributed set of NumPy
        ndarrays or dictionaries of NumPy ndarrays.

        This is only supported for datasets convertible to NumPy ndarrays.
        This function copies the data. For zero-copy access to the
        underlying data, consider using :meth:`Dataset.to_arrow_refs` or
        :meth:`Dataset.iter_internal_ref_bundles`.

        Examples:
            >>> import ray
            >>> ds = ray.data.range(10, override_num_blocks=2)
            >>> refs = ds.to_numpy_refs()
            >>> len(refs)
            2

        Time complexity: O(dataset size / parallelism)

        Args:
            column: The name of the column to convert to NumPy. If ``None``, all
                columns are used and each returned future represents a dict of
                ndarrays. Defaults to ``None``.

        Returns:
            A list of remote NumPy ndarrays created from this dataset.
        """
        # Wrap the per-block conversion helper as a cached Ray remote function.
        block_to_ndarray = cached_remote_fn(_block_to_ndarray)
        # Apply the label selector from the execution options, if one is set.
        label_selector = self.context.execution_options.label_selector
        if label_selector:
            block_to_ndarray = block_to_ndarray.options(label_selector=label_selector)
        # Launch one conversion task per block and collect the resulting refs.
        numpy_refs = []
        for bundle in self.iter_internal_ref_bundles():
            for block_ref in bundle.block_refs:
                numpy_refs.append(block_to_ndarray.remote(block_ref, column=column))
        return numpy_refs

    @ConsumptionAPI(pattern="Time complexity:")
    @DeveloperAPI