
Class DataContext

python/ray/data/context.py:479–1170

Global settings for Ray Data. Configure this class to enable advanced features and tune performance.


@DeveloperAPI
@dataclass
class DataContext:
    """Global settings for Ray Data.

    Configure this class to enable advanced features and tune performance.

    .. warning::
        Apply changes before creating a :class:`~ray.data.Dataset`. Changes made after
        won't take effect.

    .. note::
        This object is automatically propagated to workers. Access it from the driver
        and remote workers with :meth:`DataContext.get_current()`.

    Examples:
        >>> from ray.data import DataContext
        >>> DataContext.get_current().enable_progress_bars = False

    Args:
        target_max_block_size: The max target block size in bytes for reads and
            transformations. If `None`, this means the block size is infinite.
        target_min_block_size: Ray Data avoids creating blocks smaller than this
            size in bytes on read. This takes precedence over
            ``read_op_min_num_blocks``.
        streaming_read_buffer_size: Buffer size when doing streaming reads from local or
            remote storage.
        enable_pandas_block: Whether pandas block format is enabled.
        actor_prefetcher_enabled: Whether to use actor based block prefetcher.
        autoscaling_config: Autoscaling configuration.
        use_push_based_shuffle: Whether to use push-based shuffle.
        pipeline_push_based_shuffle_reduce_tasks:
        scheduling_strategy: The global scheduling strategy. For tasks with large args,
            ``scheduling_strategy_large_args`` takes precedence.
        scheduling_strategy_large_args: Scheduling strategy for tasks with large args.
        large_args_threshold: Size in bytes after which point task arguments are
            considered large. Choose a value so that the data transfer overhead is
            significant in comparison to task scheduling (i.e., low tens of ms).
        use_polars: Whether to use Polars for tabular dataset sorts, groupbys, and
            aggregations.
        eager_free: Whether to eagerly free memory.
        decoding_size_estimation: Whether to estimate in-memory decoding data size for
            data source.
        min_parallelism: This setting is deprecated. Use ``read_op_min_num_blocks``
            instead.
        read_op_min_num_blocks: Minimum number of read output blocks for a dataset.
        use_datasource_v2: When True, ``ray.data.read_parquet()`` routes through
            the DataSourceV2 pipeline (``ListFiles → ReadFiles`` logical chain,
            driver-side first-file sampling for schema inference,
            ``ParquetScanner`` / ``ParquetFileReader``). Defaults to False — V1
            remains the production path while V2 bakes.
        parquet_chunker_target_chunk_size: Target chunk size in bytes used by
            ``ParquetFileChunker`` when splitting large Parquet files into
            multiple read tasks. When ``None``, the chunker's built-in default
            (currently 1 GiB) is used.
        enable_tensor_extension_casting: Whether to automatically cast NumPy ndarray
            columns in Pandas DataFrames to tensor extension columns.
        arrow_fixed_shape_tensor_format: The tensor format to use for fixed-shape tensors.
            Options are FixedShapeTensorFormat.V1, FixedShapeTensorFormat.V2, and FixedShapeTensorFormat.ARROW_NATIVE.
            Default is V2. NOTE: For ARROW_NATIVE, only numbers (integers, floats) are currently supported.
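
The docstring's warning ("Apply changes before creating a Dataset") can be illustrated with a small standard-library sketch. This is a toy model, not Ray's implementation: `ToyContext` and `ToyDataset` are hypothetical names, the default values are placeholders, and the assumption that a dataset snapshots the current context when it is created is only one way the documented behavior could arise.

```python
import copy
from dataclasses import dataclass
from typing import ClassVar, Optional


@dataclass
class ToyContext:
    """Hypothetical stand-in for a global settings object (not Ray's class)."""

    # Field names mirror two documented settings; defaults are placeholders.
    enable_progress_bars: bool = True
    read_op_min_num_blocks: int = 200

    # ClassVar keeps the singleton slot out of the dataclass fields.
    _current: ClassVar[Optional["ToyContext"]] = None

    @classmethod
    def get_current(cls) -> "ToyContext":
        # Lazily create one shared instance and return it on every call.
        if cls._current is None:
            cls._current = cls()
        return cls._current


class ToyDataset:
    """Hypothetical dataset that snapshots the settings at creation time."""

    def __init__(self) -> None:
        # Assumption: the dataset takes a deep copy, so later edits to the
        # shared context do not reach this dataset.
        self.context = copy.deepcopy(ToyContext.get_current())


ctx = ToyContext.get_current()
ctx.enable_progress_bars = False  # changed before creation: captured
ds = ToyDataset()
ctx.read_op_min_num_blocks = 8    # changed after creation: not captured

print(ds.context.enable_progress_bars)
print(ds.context.read_op_min_num_blocks)
```

In this model the setting changed before `ToyDataset()` is visible on the dataset's copy, while the one changed afterwards only affects the shared context. With Ray itself, the documented entry point is `DataContext.get_current()`, as in the docstring's own example.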
