hub / github.com/ray-project/ray / random_shuffle

Method random_shuffle

python/ray/data/dataset.py:1897–1974

Randomly shuffle the rows of this :class:`Dataset`.
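The docstring's tip suggests iterating over batches with shuffling when a full global shuffle is too slow. Ray Data exposes this through the `local_shuffle_buffer_size` argument of `Dataset.iter_batches`. The sketch below does not use Ray. It is a minimal, pure-Python model of the underlying idea, a bounded shuffle buffer, and the helpers `local_shuffle` and `batched` are hypothetical names chosen for illustration, not Ray's implementation.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def local_shuffle(
    rows: Iterable[T], buffer_size: int, seed: Optional[int] = None
) -> Iterator[T]:
    """Hypothetical helper: yield rows in a partially shuffled order.

    Only `buffer_size` rows are held in memory, so a row can move at most
    about `buffer_size` positions earlier than its input position.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be >= 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= buffer_size:
            # Swap a random element to the end, then pop it in O(1).
            i = rng.randrange(len(buffer))
            buffer[i], buffer[-1] = buffer[-1], buffer[i]
            yield buffer.pop()
    # Input exhausted: emit the remaining rows in random order.
    rng.shuffle(buffer)
    yield from buffer


def batched(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Hypothetical helper: group rows into lists of `batch_size` (last may be short)."""
    batch: List[T] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


# Usage: shuffle 100 rows with a 10-row window, then batch them.
batches = list(batched(local_shuffle(range(100), buffer_size=10, seed=0), 32))
```

The trade-off this models: memory and work per row stay constant, but the result is only a windowed shuffle, which is why it is faster than `random_shuffle` and also less random.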

random_shuffle(
    self,
    *,
    seed: Optional[int | RandomSeedConfig] = None,
    num_blocks: Optional[int] = None,
    **ray_remote_args,
) -> "Dataset"
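The `seed` argument has three documented behaviors: `None` gives a non-deterministic shuffle, an integer gives the same order on every execution, and a `RandomSeedConfig` with `reseed_after_execution=True` gives a different order on each execution (for example, each training epoch) while the whole sequence stays reproducible across program runs. The toy model below illustrates those semantics in plain Python. `SeedSpec` and `ShuffleModel` are hypothetical stand-ins, and the per-execution seed derivation is an assumption made for the sketch, not how Ray computes its seeds.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class SeedSpec:
    """Hypothetical stand-in for RandomSeedConfig; fields mirror the docstring."""
    seed: int
    reseed_after_execution: bool = False


class ShuffleModel:
    """Toy model of the three documented seed behaviors (not Ray's code)."""

    def __init__(self, seed: Union[None, int, SeedSpec] = None):
        if isinstance(seed, int):
            # Per the docstring, a plain int means "fully deterministic".
            seed = SeedSpec(seed=seed, reseed_after_execution=False)
        self._spec: Optional[SeedSpec] = seed
        self._executions = 0

    def execute(self, rows: List[int]) -> List[int]:
        if self._spec is None:
            rng = random.Random()  # OS entropy: not reproducible.
        elif self._spec.reseed_after_execution:
            # Assumed derivation: mix the base seed with the execution index.
            rng = random.Random(self._spec.seed * 1_000_003 + self._executions)
        else:
            rng = random.Random(self._spec.seed)  # Same order every execution.
        self._executions += 1
        out = list(rows)
        rng.shuffle(out)
        return out


rows = list(range(100))
fixed = ShuffleModel(42)
epochs = ShuffleModel(SeedSpec(42, reseed_after_execution=True))
```

Calling `fixed.execute(rows)` twice returns the same list, while `epochs.execute(rows)` returns a new order each time; a fresh `ShuffleModel` built with the same `SeedSpec` replays that same sequence of orders.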

Source

@AllToAllAPI
@PublicAPI(api_group=SSR_API_GROUP)
def random_shuffle(
    self,
    *,
    seed: Optional[int | RandomSeedConfig] = None,
    num_blocks: Optional[int] = None,
    **ray_remote_args,
) -> "Dataset":
    """Randomly shuffle the rows of this :class:`Dataset`.

    .. tip::

        This method can be slow. For better performance, try
        :ref:`Iterating over batches with shuffling <iterating-over-batches-with-shuffling>`.
        Also, see :ref:`Optimizing shuffles <optimizing_shuffles>`.

    Examples:
        >>> import ray
        >>> from ray.data import RandomSeedConfig
        >>> ds = ray.data.range(100)
        >>> ds.random_shuffle().take(3)  # doctest: +SKIP
        [{'id': 41}, {'id': 21}, {'id': 92}]
        >>> ds.random_shuffle(seed=42).take(3)  # doctest: +SKIP
        [{'id': 24}, {'id': 97}, {'id': 17}]

        Fully deterministic across executions:
        >>> ds = ray.data.range(100)
        >>> ds.random_shuffle(seed=RandomSeedConfig(seed=42, reseed_after_execution=False)).take(3)  # doctest: +SKIP
        [{'id': 24}, {'id': 97}, {'id': 17}]
        >>> ds.random_shuffle(seed=RandomSeedConfig(seed=42, reseed_after_execution=False)).take(3)  # doctest: +SKIP
        [{'id': 24}, {'id': 97}, {'id': 17}]

        Reproducible but non-deterministic across executions (e.g., training epochs):
        >>> ds = ray.data.range(100)
        >>> ds.random_shuffle(seed=RandomSeedConfig(seed=42, reseed_after_execution=True)).take(3)  # doctest: +SKIP
        [{'id': 29}, {'id': 79}, {'id': 39}]
        >>> ds.random_shuffle(seed=RandomSeedConfig(seed=42, reseed_after_execution=True)).take(3)  # doctest: +SKIP
        [{'id': 40}, {'id': 7}, {'id': 90}]

    Time complexity: O(dataset size / parallelism)

    Args:
        seed: An optional random seed. Can be an integer or a :class:`RandomSeedConfig`
            object. If an integer is provided, it defaults to fully deterministic
            behavior (same shuffle order across executions). If None, the shuffle
            is non-deterministic. See :class:`RandomSeedConfig` for more details on seed behavior.
        num_blocks: This parameter is deprecated. It was previously intended to
            specify the number of output blocks in the shuffled dataset, but is no
            longer supported. To control the number of output blocks, use
            :meth:`Dataset.repartition` after shuffling instead.
        **ray_remote_args: Additional resource requirements to request from
            Ray (e.g., num_gpus=1 to request GPUs for the map tasks). See
            :func:`ray.remote` for details.

    Returns:
        The shuffled :class:`Dataset`.
    """  # noqa: E501

    if num_blocks is not None:
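The `@AllToAllAPI` decorator and the time-complexity note point to a shuffle in which every output block can receive rows from every input block. The sketch below shows one standard way to do that, a two-stage map/reduce shuffle, in plain Python. It is a conceptual illustration under that assumption, not Ray's implementation; `map_stage`, `reduce_stage`, and `two_stage_shuffle` are hypothetical names, and everything runs sequentially in one process where a distributed engine would launch separate tasks.

```python
import random
from typing import List, Optional

Block = List[int]


def map_stage(block: Block, num_outputs: int, rng: random.Random) -> List[Block]:
    """Send each row of one input block to a uniformly random output partition."""
    parts: List[Block] = [[] for _ in range(num_outputs)]
    for row in block:
        parts[rng.randrange(num_outputs)].append(row)
    return parts


def reduce_stage(pieces: List[Block], rng: random.Random) -> Block:
    """Concatenate all pieces destined for one output block, then shuffle locally."""
    out = [row for piece in pieces for row in piece]
    rng.shuffle(out)
    return out


def two_stage_shuffle(
    blocks: List[Block], num_outputs: int, seed: Optional[int] = None
) -> List[Block]:
    rng = random.Random(seed)
    # Map: one call per input block (a separate task in a distributed engine).
    mapped = [map_stage(b, num_outputs, rng) for b in blocks]
    # Reduce: output block j gathers piece j from every map output.
    return [reduce_stage([m[j] for m in mapped], rng) for j in range(num_outputs)]


blocks = [list(range(i * 10, (i + 1) * 10)) for i in range(5)]
shuffled = two_stage_shuffle(blocks, num_outputs=3, seed=7)
```

Each row is touched a constant number of times, so total work is linear in the dataset size; spreading the map and reduce calls over parallel workers is what gives the O(dataset size / parallelism) bound in the docstring. In this sketch `num_outputs` is an explicit parameter, whereas the docstring says `random_shuffle` no longer honors `num_blocks`, so in Ray Data you would call `Dataset.repartition` after shuffling to change the block count.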

Callers (12)

train_test_split · Method · 0.45

test_per_epoch_preprocessing · Function · 0.45

test_materialized_preprocessing · Function · 0.45

test_per_epoch_preprocessing · Function · 0.45

run_benchmark · Function · 0.45

iterator · Method · 0.45

estimate_on_dataset · Method · 0.45

data_provider · Function · 0.45

get_train_dataset · Function · 0.45

04d1_generative_cv_pattern.py · File · 0.45

04d2_policy_learning_pattern.py · File · 0.45

04b_tabular_workload_pattern.py · File · 0.45

Calls (4)

RandomShuffle · Class · 0.90

LogicalPlan · Class · 0.90

create_seed_config · Method · 0.80

_from_parent · Method · 0.80

Tested by (3)

test_per_epoch_preprocessing · Function · 0.36

test_materialized_preprocessing · Function · 0.36

test_per_epoch_preprocessing · Function · 0.36