hub / github.com/ray-project/ray / random_shuffle

Method random_shuffle

python/ray/data/dataset.py:1897–1974

Randomly shuffle the rows of this :class:`Dataset`.

(
        self,
        *,
        seed: Optional[int | RandomSeedConfig] = None,
        num_blocks: Optional[int] = None,
        **ray_remote_args,
    )
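The signature accepts `seed` as either an `int` or a `RandomSeedConfig`, and the method's docstring distinguishes a fixed seed (same order on every execution) from `reseed_after_execution=True` (a different order on each execution, for example per training epoch). The following is a minimal pure-Python model of that distinction. It does not use Ray and does not reproduce Ray's implementation: `ToySeedConfig`, `effective_seed`, `toy_shuffle`, and the rule of adding the execution index to the base seed are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ToySeedConfig:
    """Hypothetical stand-in for a seed configuration; not Ray's RandomSeedConfig."""

    seed: Optional[int] = None
    reseed_after_execution: bool = False


def effective_seed(config: ToySeedConfig, execution_idx: int) -> Optional[int]:
    """Return the seed to use for one execution (e.g. one epoch)."""
    if config.seed is None:
        return None  # unseeded: the order is not reproducible
    if config.reseed_after_execution:
        # Assumed derivation rule: vary the seed per execution, deterministically.
        return config.seed + execution_idx
    return config.seed


def toy_shuffle(rows: List[int], config: ToySeedConfig, execution_idx: int) -> List[int]:
    """Shuffle a copy of `rows` using the seed chosen for this execution."""
    rng = random.Random(effective_seed(config, execution_idx))
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled


rows = list(range(100))
fixed = ToySeedConfig(seed=42, reseed_after_execution=False)
per_epoch = ToySeedConfig(seed=42, reseed_after_execution=True)

# Fixed seed: every execution yields the same order.
same_each_time = toy_shuffle(rows, fixed, 0) == toy_shuffle(rows, fixed, 1)

# Reseeding: each execution uses a different derived seed,
# but any given execution can be replayed exactly.
epoch0 = toy_shuffle(rows, per_epoch, 0)
epoch1 = toy_shuffle(rows, per_epoch, 1)
replay_epoch1 = toy_shuffle(rows, per_epoch, 1)
```

In this toy model the only difference between the two modes is whether the execution index feeds into the seed. How Ray actually derives per-execution seeds is not shown in the listing on this page.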

Source from the content-addressed store, hash-verified

@AllToAllAPI
@PublicAPI(api_group=SSR_API_GROUP)
def random_shuffle(
    self,
    *,
    seed: Optional[int | RandomSeedConfig] = None,
    num_blocks: Optional[int] = None,
    **ray_remote_args,
) -> "Dataset":
    """Randomly shuffle the rows of this :class:`Dataset`.

    .. tip::

        This method can be slow. For better performance, try
        :ref:`Iterating over batches with shuffling <iterating-over-batches-with-shuffling>`.
        Also, see :ref:`Optimizing shuffles <optimizing_shuffles>`.

    Examples:
        >>> import ray
        >>> from ray.data import RandomSeedConfig
        >>> ds = ray.data.range(100)
        >>> ds.random_shuffle().take(3)  # doctest: +SKIP
        [{'id': 41}, {'id': 21}, {'id': 92}]
        >>> ds.random_shuffle(seed=42).take(3)  # doctest: +SKIP
        [{'id': 24}, {'id': 97}, {'id': 17}]

        Fully deterministic across executions:
        >>> ds = ray.data.range(100)
        >>> ds.random_shuffle(seed=RandomSeedConfig(seed=42, reseed_after_execution=False)).take(3)  # doctest: +SKIP
        [{'id': 24}, {'id': 97}, {'id': 17}]
        >>> ds.random_shuffle(seed=RandomSeedConfig(seed=42, reseed_after_execution=False)).take(3)  # doctest: +SKIP
        [{'id': 24}, {'id': 97}, {'id': 17}]

        Reproducible but non-deterministic across executions (e.g., training epochs):
        >>> ds = ray.data.range(100)
        >>> ds.random_shuffle(seed=RandomSeedConfig(seed=42, reseed_after_execution=True)).take(3)  # doctest: +SKIP
        [{'id': 29}, {'id': 79}, {'id': 39}]
        >>> ds.random_shuffle(seed=RandomSeedConfig(seed=42, reseed_after_execution=True)).take(3)  # doctest: +SKIP
        [{'id': 40}, {'id': 7}, {'id': 90}]

    Time complexity: O(dataset size / parallelism)

    Args:
        seed: An optional random seed. Can be an integer or a :class:`RandomSeedConfig`
            object. If an integer is provided, it defaults to fully deterministic
            behavior (same shuffle order across executions). If None, the shuffle
            is non-deterministic. See :class:`RandomSeedConfig` for more details on seed behavior.
        num_blocks: This parameter is deprecated. It was previously intended to
            specify the number of output blocks in the shuffled dataset, but is no
            longer supported. To control the number of output blocks, use
            :meth:`Dataset.repartition` after shuffling instead.
        **ray_remote_args: Additional resource requirements to request from
            Ray (e.g., num_gpus=1 to request GPUs for the map tasks). See
            :func:`ray.remote` for details.

    Returns:
        The shuffled :class:`Dataset`.
    """  # noqa: E501

    if num_blocks is not None:
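The method is decorated with `@AllToAllAPI`, and its docstring gives the time complexity as O(dataset size / parallelism). As a rough illustration of why an all-to-all shuffle divides across workers that way, here is a single-process toy of a two-stage (map, then reduce) shuffle over blocks. It is a sketch under stated assumptions, not Ray's implementation: `toy_all_to_all_shuffle`, the block layout, and the choice of one output block per input block are hypothetical.

```python
import random
from typing import List


def toy_all_to_all_shuffle(blocks: List[List[int]], seed: int) -> List[List[int]]:
    """Shuffle rows across blocks in two stages (hypothetical toy model)."""
    rng = random.Random(seed)
    num_outputs = len(blocks)  # assumption: as many output blocks as input blocks

    # Map stage: each input block scatters its rows across the output buckets.
    # In a distributed setting, each block could be handled by a separate task.
    scattered = []
    for block in blocks:
        parts = [[] for _ in range(num_outputs)]
        for row in block:
            parts[rng.randrange(num_outputs)].append(row)
        scattered.append(parts)

    # Reduce stage: output i gathers part i from every mapper, then shuffles locally.
    outputs = []
    for i in range(num_outputs):
        merged = [row for parts in scattered for row in parts[i]]
        rng.shuffle(merged)
        outputs.append(merged)
    return outputs


# Four input blocks of 25 rows each, covering the ids 0..99.
blocks = [list(range(start, start + 25)) for start in range(0, 100, 25)]
shuffled_blocks = toy_all_to_all_shuffle(blocks, seed=42)
flat = [row for block in shuffled_blocks for row in block]
```

Each stage touches every row once, so the total work is proportional to the dataset size. When the per-block loops run as independent tasks, the wall-clock cost per stage shrinks roughly with the number of tasks running at once. Every output block depends on every input block, which is why this is an all-to-all operation.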

Callers 12

train_test_split · Method · 0.45
run_benchmark · Function · 0.45
iterator · Method · 0.45
estimate_on_dataset · Method · 0.45
data_provider · Function · 0.45
get_train_dataset · Function · 0.45

Calls 4

RandomShuffle · Class · 0.90
LogicalPlan · Class · 0.90
create_seed_config · Method · 0.80
_from_parent · Method · 0.80

Tested by 3