hub / github.com/ray-project/ray / write_numpy

Method write_numpy

python/ray/data/dataset.py:5094–5191

Writes a column of the :class:`~ray.data.Dataset` to .npy files. This is only supported for columns in the dataset that can be converted to NumPy arrays. The number of files is determined by the number of blocks in the dataset. To control the number of blocks, call :meth:`~ray.data.Dataset.repartition`.
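Because the file count follows the block count, a common pattern is to repartition immediately before writing. The helper below is a minimal sketch of that pattern, not part of Ray: `write_column_as_n_files` is a hypothetical name, and `ds` is assumed to be a `ray.data.Dataset` (typed as `Any` so the sketch does not import Ray). It uses only `repartition` and `write_numpy`, both of which appear in the docstring. Other arguments such as `min_rows_per_file` may also influence the final file count, so treat the one-file-per-block result as the typical case rather than a guarantee.

```python
from typing import Any


def write_column_as_n_files(ds: Any, path: str, column: str, num_files: int) -> None:
    """Repartition ``ds`` into ``num_files`` blocks, then write one column.

    ``ds`` is assumed to be a ``ray.data.Dataset``. This helper is
    hypothetical and only chains two documented Dataset methods.
    """
    if num_files < 1:
        raise ValueError("num_files must be at least 1")
    # One block per desired output file, following the docstring's
    # statement that the number of files tracks the number of blocks.
    ds.repartition(num_files).write_numpy(path, column=column)
```

With Ray installed, a call such as `write_column_as_n_files(ray.data.range(100), "local:///tmp/data/", "id", 4)` would be expected to produce roughly four `.npy` files.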

(
        self,
        path: str,
        *,
        column: str,
        filesystem: Optional["pyarrow.fs.FileSystem"] = None,
        try_create_dir: bool = True,
        arrow_open_stream_args: Optional[Dict[str, Any]] = None,
        filename_provider: Optional[FilenameProvider] = None,
        min_rows_per_file: Optional[int] = None,
        ray_remote_args: Dict[str, Any] = None,
        concurrency: Optional[int] = None,
        num_rows_per_file: Optional[int] = None,
        mode: SaveMode = SaveMode.APPEND,
    )

Source from the content-addressed store, hash-verified

@ConsumptionAPI
@PublicAPI(api_group=IOC_API_GROUP)
def write_numpy(
    self,
    path: str,
    *,
    column: str,
    filesystem: Optional["pyarrow.fs.FileSystem"] = None,
    try_create_dir: bool = True,
    arrow_open_stream_args: Optional[Dict[str, Any]] = None,
    filename_provider: Optional[FilenameProvider] = None,
    min_rows_per_file: Optional[int] = None,
    ray_remote_args: Dict[str, Any] = None,
    concurrency: Optional[int] = None,
    num_rows_per_file: Optional[int] = None,
    mode: SaveMode = SaveMode.APPEND,
) -> None:
    """Writes a column of the :class:`~ray.data.Dataset` to .npy files.

    This is only supported for columns in the datasets that can be converted to
    NumPy arrays.

    The number of files is determined by the number of blocks in the dataset.
    To control the number of blocks, call
    :meth:`~ray.data.Dataset.repartition`.


    By default, the format of the output files is ``{uuid}_{block_idx}.npy``,
    where ``uuid`` is a unique id for the dataset. To modify this behavior,
    implement a custom :class:`~ray.data.datasource.FilenameProvider`
    and pass it in as the ``filename_provider`` argument.

    Examples:
        >>> import ray
        >>> ds = ray.data.range(100)
        >>> ds.write_numpy("local:///tmp/data/", column="id")

    Time complexity: O(dataset size / parallelism)

    Args:
        path: The path to the destination root directory, where
            the npy files are written to.
        column: The name of the column that contains the data to
            be written.
        filesystem: The pyarrow filesystem implementation to write to.
            These filesystems are specified in the
            `pyarrow docs <https://arrow.apache.org/docs\
            /python/api/filesystems.html#filesystem-implementations>`_.
            Specify this if you need to provide specific configurations to the
            filesystem. By default, the filesystem is automatically selected based
            on the scheme of the paths. For example, if the path begins with
            ``s3://``, the ``S3FileSystem`` is used.
        try_create_dir: If ``True``, attempts to create all directories in
            destination path. Does nothing if all directories already
            exist. Defaults to ``True``.
        arrow_open_stream_args: kwargs passed to
            `pyarrow.fs.FileSystem.open_output_stream <https://arrow.apache.org\
            /docs/python/generated/pyarrow.fs.FileSystem.html\
            #pyarrow.fs.FileSystem.open_output_stream>`_, which is used when
            opening the file to write to.
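Since the docstring describes one `.npy` file per block under the destination root, a quick way to confirm a write is to list the files it produced. The following is a minimal sketch using only the standard library. `list_npy_files` is a hypothetical helper, and it assumes the destination is a directory on the local machine, optionally written with the `local://` scheme used in the docstring example. It does not handle remote filesystems such as `s3://`.

```python
from pathlib import Path
from typing import List


def list_npy_files(destination: str) -> List[Path]:
    """Return the sorted ``.npy`` files directly under ``destination``.

    Hypothetical helper: only local directories are supported.
    """
    # Drop the ``local://`` scheme from the docstring example, if present,
    # so that "local:///tmp/data/" becomes "/tmp/data/".
    scheme = "local://"
    if destination.startswith(scheme):
        destination = destination[len(scheme):]
    # Match only files ending in .npy; other files are ignored.
    return sorted(Path(destination).glob("*.npy"))
```

Comparing `len(list_npy_files(path))` against the dataset's block count is one way to check the file-per-block behavior described above. Note that with the default `SaveMode.APPEND` shown in the signature, files from earlier writes to the same directory would presumably be counted as well.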

Callers 2

test_numpy_roundtrip (Function) · 0.80
test_numpy_write (Function) · 0.80

Calls 3

write_datasink (Method) · 0.95
NumpyDatasink (Class) · 0.90

Tested by 2

test_numpy_roundtrip (Function) · 0.64
test_numpy_write (Function) · 0.64