hub / github.com/ray-project/ray / DataContext

Class DataContext

python/ray/data/context.py:479–1170

Global settings for Ray Data. Configure this class to enable advanced features and tune performance.

Warning: Apply changes before creating a Dataset. Changes made afterwards won't take effect.

Note: This object is automatically propagated to workers. Access it from the driver and remote workers with DataContext.get_current().
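The warning in this summary implies an ordering: adjust settings first, then create the dataset. The following is a minimal sketch of that ordering, not Ray code. `apply_settings` is a hypothetical helper (it is not part of Ray), and a `SimpleNamespace` stands in for the object returned by `DataContext.get_current()` so the snippet runs without Ray installed. The field names come from the class docstring; the values are made up for illustration.

```python
from types import SimpleNamespace


def apply_settings(ctx, **overrides):
    """Set existing attributes on a settings object and return the old values.

    Hypothetical helper, not part of Ray. Unknown names raise AttributeError
    instead of silently creating a new attribute, which is what a plain
    assignment with a misspelled name would do on an ordinary object.
    """
    previous = {}
    # Validate every name first so a typo leaves the object untouched.
    for name in overrides:
        if not hasattr(ctx, name):
            raise AttributeError(f"unknown setting: {name!r}")
        previous[name] = getattr(ctx, name)
    for name, value in overrides.items():
        setattr(ctx, name, value)
    return previous


# Stand-in for DataContext.get_current(); values are placeholders.
ctx = SimpleNamespace(
    enable_progress_bars=True,
    target_max_block_size=128 * 1024 * 1024,
)

# Step 1: change settings.
previous = apply_settings(ctx, enable_progress_bars=False)

# Step 2: only now would a dataset be created (for example with ray.data.range).
```

With Ray installed, `ctx` would instead be `DataContext.get_current()`, as in the class docstring's own example.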


@DeveloperAPI
@dataclass
class DataContext:
    """Global settings for Ray Data.

    Configure this class to enable advanced features and tune performance.

    .. warning::
        Apply changes before creating a :class:`~ray.data.Dataset`. Changes made after
        won't take effect.

    .. note::
        This object is automatically propagated to workers. Access it from the driver
        and remote workers with :meth:`DataContext.get_current()`.

    Examples:
        >>> from ray.data import DataContext
        >>> DataContext.get_current().enable_progress_bars = False

    Args:
        target_max_block_size: The max target block size in bytes for reads and
            transformations. If `None`, this means the block size is infinite.
        target_min_block_size: Ray Data avoids creating blocks smaller than this
            size in bytes on read. This takes precedence over
            ``read_op_min_num_blocks``.
        streaming_read_buffer_size: Buffer size when doing streaming reads from local or
            remote storage.
        enable_pandas_block: Whether pandas block format is enabled.
        actor_prefetcher_enabled: Whether to use actor based block prefetcher.
        autoscaling_config: Autoscaling configuration.
        use_push_based_shuffle: Whether to use push-based shuffle.
        pipeline_push_based_shuffle_reduce_tasks:
        scheduling_strategy: The global scheduling strategy. For tasks with large args,
            ``scheduling_strategy_large_args`` takes precedence.
        scheduling_strategy_large_args: Scheduling strategy for tasks with large args.
        large_args_threshold: Size in bytes after which point task arguments are
            considered large. Choose a value so that the data transfer overhead is
            significant in comparison to task scheduling (i.e., low tens of ms).
        use_polars: Whether to use Polars for tabular dataset sorts, groupbys, and
            aggregations.
        eager_free: Whether to eagerly free memory.
        decoding_size_estimation: Whether to estimate in-memory decoding data size for
            data source.
        min_parallelism: This setting is deprecated. Use ``read_op_min_num_blocks``
            instead.
        read_op_min_num_blocks: Minimum number of read output blocks for a dataset.
        use_datasource_v2: When True, ``ray.data.read_parquet()`` routes through
            the DataSourceV2 pipeline (``ListFiles → ReadFiles`` logical chain,
            driver-side first-file sampling for schema inference,
            ``ParquetScanner`` / ``ParquetFileReader``). Defaults to False — V1
            remains the production path while V2 bakes.
        parquet_chunker_target_chunk_size: Target chunk size in bytes used by
            ``ParquetFileChunker`` when splitting large Parquet files into
            multiple read tasks. When ``None``, the chunker's built-in default
            (currently 1 GiB) is used.
        enable_tensor_extension_casting: Whether to automatically cast NumPy ndarray
            columns in Pandas DataFrames to tensor extension columns.
        arrow_fixed_shape_tensor_format: The tensor format to use for fixed-shape tensors.
            Options are FixedShapeTensorFormat.V1, FixedShapeTensorFormat.V2, and FixedShapeTensorFormat.ARROW_NATIVE.
            Default is V2. NOTE: For ARROW_NATIVE, only numbers (integers, floats) are currently supported.
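The docstring above says that `scheduling_strategy_large_args` takes precedence over `scheduling_strategy` for tasks with large args, and that `large_args_threshold` is the size after which arguments count as large. A minimal sketch of that precedence rule follows. It is an illustration of the documented behaviour, not Ray's implementation: `SchedulingSettings` and `pick_strategy` are hypothetical names, the field values are placeholders rather than confirmed defaults, and the strict `>` comparison is an assumption based on the phrase "after which point".

```python
from dataclasses import dataclass


@dataclass
class SchedulingSettings:
    # Hypothetical stand-in for the three DataContext fields named in the
    # docstring. The values are placeholders, not confirmed Ray defaults.
    scheduling_strategy: str = "SPREAD"
    scheduling_strategy_large_args: str = "DEFAULT"
    large_args_threshold: int = 50 * 1024 * 1024  # bytes


def pick_strategy(settings: SchedulingSettings, args_size_bytes: int) -> str:
    """Return the strategy for a task whose arguments total args_size_bytes.

    Assumption: arguments are "large" only when strictly above the threshold.
    """
    if args_size_bytes > settings.large_args_threshold:
        return settings.scheduling_strategy_large_args
    return settings.scheduling_strategy


settings = SchedulingSettings()
small_task = pick_strategy(settings, 1024)
large_task = pick_strategy(settings, 64 * 1024 * 1024)
```

Here `small_task` resolves to the global strategy and `large_task` to the large-args strategy, because only the second size exceeds the placeholder threshold.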

Callers 15

_make_data_context · Function · 0.90
test_gpu_shuffle_default_values · Method · 0.90
test_gpu_shuffle_fields_settable · Method · 0.90
test_explicit_count_used · Method · 0.90
test_auto_detect_from_cluster · Method · 0.90
test_zero_gpus_raises · Method · 0.90
test_fractional_gpu_count_truncated · Method · 0.90
test_gpu_shuffle_routes_to_gpu_operator · Method · 0.90
test_hash_shuffle_still_routes_to_hash_operator · Method · 0.90
test_unsupported_strategy_with_keys_raises · Method · 0.90
test_gpu_shuffle_respects_num_outputs · Method · 0.90
test_gpu_shuffle_key_columns_normalised · Method · 0.90

Calls 2

_deduce_default_shuffle_algorithm · Function · 0.85
list · Function · 0.85

Tested by 15

_make_data_context · Function · 0.72
test_gpu_shuffle_default_values · Method · 0.72
test_gpu_shuffle_fields_settable · Method · 0.72
test_explicit_count_used · Method · 0.72
test_auto_detect_from_cluster · Method · 0.72
test_zero_gpus_raises · Method · 0.72
test_fractional_gpu_count_truncated · Method · 0.72
test_gpu_shuffle_routes_to_gpu_operator · Method · 0.72
test_hash_shuffle_still_routes_to_hash_operator · Method · 0.72
test_unsupported_strategy_with_keys_raises · Method · 0.72
test_gpu_shuffle_respects_num_outputs · Method · 0.72
test_gpu_shuffle_key_columns_normalised · Method · 0.72
