@PublicAPI
def range(
    n: int,
    *,
    parallelism: int = -1,
    concurrency: Optional[int] = None,
    override_num_blocks: Optional[int] = None,
) -> Dataset:
    """Creates a :class:`~ray.data.Dataset` from a range of integers [0..n).

    This function allows for easy creation of synthetic datasets for testing or
    benchmarking :ref:`Ray Data <data>`. The column name defaults to "id".

    Examples:

        >>> import ray
        >>> ds = ray.data.range(10000)
        >>> ds  # doctest: +ELLIPSIS
        shape: (10000, 1)
        ╭───────╮
        │ id    │
        │ ---   │
        │ int64 │
        ╰───────╯
        (Dataset isn't materialized)
        >>> ds.map(lambda row: {"id": row["id"] * 2}).take(4)
        [{'id': 0}, {'id': 2}, {'id': 4}, {'id': 6}]

    Args:
        n: The upper bound of the range of integers (exclusive).
        parallelism: This argument is deprecated. Use the ``override_num_blocks``
            argument instead.
        concurrency: The maximum number of Ray tasks to run concurrently. Set this
            to control the number of tasks to run concurrently. This doesn't change
            the total number of tasks run or the total number of output blocks. By
            default, concurrency is dynamically decided based on the available
            resources.
        override_num_blocks: Override the number of output blocks from all read tasks.
            By default, the number of output blocks is dynamically decided based on
            input data size and available resources. You shouldn't manually set this
            value in most cases.

    Returns:
        A :class:`~ray.data.Dataset` producing the integers from 0 (inclusive) to
        n (exclusive).

    .. seealso::

        :meth:`~ray.data.range_tensor`
            Call this method for creating synthetic datasets of tensor data.

    """
    datasource = RangeDatasource(n=n, block_format="arrow", column_name="id")
    return read_datasource(
        datasource,
        parallelism=parallelism,
        concurrency=concurrency,
        override_num_blocks=override_num_blocks,
    )
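
The docstring describes the dataset as rows with a single "id" column holding the integers in [0..n), spread over some number of output blocks. A minimal pure-Python sketch of that row layout follows; it does not import Ray. The helper names `range_rows` and `split_into_blocks` are hypothetical, and the even-split policy is an assumption made for illustration only, since the listing does not show how `RangeDatasource` actually partitions rows.

```python
from typing import Dict, Iterator, List


def range_rows(n: int) -> Iterator[Dict[str, int]]:
    """Yield rows shaped like the docstring describes: {"id": i} for i in [0, n)."""
    for i in range(n):
        yield {"id": i}


def split_into_blocks(n: int, num_blocks: int) -> List[List[Dict[str, int]]]:
    """Partition the rows into at most num_blocks contiguous, non-empty blocks.

    Hypothetical even-split policy for illustration; not Ray's actual algorithm.
    """
    if num_blocks < 1:
        raise ValueError("num_blocks must be at least 1")
    rows = list(range_rows(n))
    # The first `extra` blocks receive one additional row each.
    base, extra = divmod(n, num_blocks)
    blocks: List[List[Dict[str, int]]] = []
    start = 0
    for b in range(num_blocks):
        size = base + (1 if b < extra else 0)
        if size == 0:
            # Fewer rows than requested blocks: stop instead of emitting empty blocks.
            break
        blocks.append(rows[start:start + size])
        start += size
    return blocks


# Mirrors the map(...).take(4) step from the docstring on plain Python rows.
doubled = [{"id": row["id"] * 2} for row in range_rows(4)]
```

In this model the number of blocks changes only how the rows are grouped, never which rows exist, which is the same separation the docstring draws between `override_num_blocks` and the data itself.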


@PublicAPI