hub / github.com/ray-project/ray / range

Function range

python/ray/data/read_api.py:258–312

Creates a :class:`~ray.data.Dataset` from a range of integers [0..n). This function allows for easy creation of synthetic datasets for testing or benchmarking :ref:`Ray Data <data>`. The column name defaults to "id".

(
    n: int,
    *,
    parallelism: int = -1,
    concurrency: Optional[int] = None,
    override_num_blocks: Optional[int] = None,
)
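
To make the half-open interval and the default column name concrete, here is a minimal pure-Python sketch of the rows such a dataset would contain. It does not import or call Ray: `model_range_rows` is a hypothetical helper invented for this illustration, and only the `[0..n)` bounds and the `"id"` column name are taken from the description above.

```python
from typing import Dict, List


def model_range_rows(n: int) -> List[Dict[str, int]]:
    """Hypothetical stand-in: the rows a [0..n) integer dataset would hold."""
    # Each row is a one-column record keyed by the default column name "id".
    # The upper bound n is exclusive, so n itself never appears.
    return [{"id": i} for i in range(n)]


rows = model_range_rows(5)
print(rows[0], rows[-1], len(rows))  # {'id': 0} {'id': 4} 5

# Row-wise transform in the same spirit as a per-row map over the dataset.
doubled = [{"id": row["id"] * 2} for row in rows][:4]
print(doubled)  # [{'id': 0}, {'id': 2}, {'id': 4}, {'id': 6}]
```

The real function returns a lazy `Dataset` rather than a list, so this sketch only models the contents, not the execution.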

Source from the content-addressed store, hash-verified

@PublicAPI
def range(
    n: int,
    *,
    parallelism: int = -1,
    concurrency: Optional[int] = None,
    override_num_blocks: Optional[int] = None,
) -> Dataset:
    """Creates a :class:`~ray.data.Dataset` from a range of integers [0..n).

    This function allows for easy creation of synthetic datasets for testing or
    benchmarking :ref:`Ray Data <data>`. The column name defaults to "id".

    Examples:

        >>> import ray
        >>> ds = ray.data.range(10000)
        >>> ds # doctest: +ELLIPSIS
        shape: (10000, 1)
        ╭───────╮
        │ id    │
        │ ---   │
        │ int64 │
        ╰───────╯
        (Dataset isn't materialized)
        >>> ds.map(lambda row: {"id": row["id"] * 2}).take(4)
        [{'id': 0}, {'id': 2}, {'id': 4}, {'id': 6}]

    Args:
        n: The upper bound of the range of integers.
        parallelism: This argument is deprecated. Use ``override_num_blocks`` argument.
        concurrency: The maximum number of Ray tasks to run concurrently. Set this
            to control number of tasks to run concurrently. This doesn't change the
            total number of tasks run or the total number of output blocks. By default,
            concurrency is dynamically decided based on the available resources.
        override_num_blocks: Override the number of output blocks from all read tasks.
            By default, the number of output blocks is dynamically decided based on
            input data size and available resources. You shouldn't manually set this
            value in most cases.

    Returns:
        A :class:`~ray.data.Dataset` producing the integers from the range 0 to n.

    .. seealso::

        :meth:`~ray.data.range_tensor`
            Call this method for creating synthetic datasets of tensor data.

    """
    datasource = RangeDatasource(n=n, block_format="arrow", column_name="id")
    return read_datasource(
        datasource,
        parallelism=parallelism,
        concurrency=concurrency,
        override_num_blocks=override_num_blocks,
    )
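
The docstring separates two knobs: `override_num_blocks` sets how many output blocks exist, while `concurrency` only caps how many tasks run at once. The sketch below models that distinction in plain Python with a thread pool. It is an illustration under stated assumptions, not Ray's implementation: `split_into_blocks` and `read_blocks` are hypothetical helpers, and the even-split rule is an assumption (how Ray actually sizes blocks is not shown in the source above).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_blocks(n: int, num_blocks: int) -> List[range]:
    """Hypothetical even split of [0, n) into num_blocks contiguous ranges."""
    base, extra = divmod(n, num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        # Assumption: the first `extra` blocks each take one additional row.
        size = base + (1 if i < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def read_blocks(n: int, num_blocks: int, concurrency: int) -> List[List[int]]:
    """Materialize every block using at most `concurrency` workers at a time."""
    blocks = split_into_blocks(n, num_blocks)
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Executor.map preserves input order, so blocks stay in range order.
        return list(pool.map(list, blocks))


serial = read_blocks(10, num_blocks=4, concurrency=1)
parallel = read_blocks(10, num_blocks=4, concurrency=4)

# Changing concurrency changes scheduling only, not the blocks produced.
assert serial == parallel
print([len(block) for block in parallel])  # [3, 3, 2, 2]
```

In this model, raising `concurrency` can only shorten wall-clock time. The number and contents of the blocks are fixed by `num_blocks`, which mirrors the docstring's statement that concurrency "doesn't change the total number of tasks run or the total number of output blocks."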

Callers 15

__init__ · Method · 0.70

split · Method · 0.70

split_proportionately · Method · 0.70

_format_stats · Function · 0.70

_bind · Method · 0.50

call_with_retry · Function · 0.50

recover_args · Function · 0.50

test_signal_actor_multiple_waiters · Function · 0.50

test_semaphore_concurrent · Function · 0.50

get_rayllm_testing_model · Function · 0.50

testing_multiple_models · Function · 0.50

test_sort_bundles · Function · 0.50

Calls 2

RangeDatasource · Class · 0.90

read_datasource · Function · 0.85

Tested by 15

test_signal_actor_multiple_waiters · Function · 0.40

test_semaphore_concurrent · Function · 0.40

get_rayllm_testing_model · Function · 0.40

testing_multiple_models · Function · 0.40

test_sort_bundles · Function · 0.40

test_summary_emitted_after_cooldown · Method · 0.40

test_reset_after_quiet_period_logs_full_traceback · Method · 0.40

test_non_fatal_500_logs_every_call · Method · 0.40

test_non_fatal_4xx_logs_every_call · Method · 0.40

test_cooldown_resets_count · Method · 0.40

get_expected_content · Method · 0.40

test_get_next_session_increments · Method · 0.40
