hub / github.com/ray-project/ray / randomize_block_order

Method randomize_block_order

python/ray/data/dataset.py:1978–2026

Randomly shuffle the :ref:`blocks <dataset_concept>` of this :class:`Dataset`. This method is useful if you :meth:`~Dataset.split` your dataset into shards and want to randomize the data in each shard without performing a full :meth:`~Dataset.random_shuffle`.
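
To make the difference between a block-order shuffle and a full shuffle concrete, here is a minimal pure-Python sketch. It is a conceptual model only, not Ray's implementation: the helpers `make_blocks`, `shuffle_block_order`, and `shuffle_all_rows` are hypothetical names, and blocks are modelled as plain lists of row ids.

```python
import random


def make_blocks(n_rows, block_size):
    """Split the row ids 0..n_rows-1 into contiguous blocks (plain lists)."""
    return [
        list(range(start, min(start + block_size, n_rows)))
        for start in range(0, n_rows, block_size)
    ]


def shuffle_block_order(blocks, seed=None):
    """Return the blocks in a random order; rows inside a block are untouched."""
    rng = random.Random(seed)  # seed=None falls back to OS entropy
    shuffled = list(blocks)    # shallow copy, so the caller's list keeps its order
    rng.shuffle(shuffled)
    return shuffled


def shuffle_all_rows(blocks, seed=None):
    """Full shuffle, for contrast: any row may move to any position."""
    rng = random.Random(seed)
    rows = [row for block in blocks for row in block]
    rng.shuffle(rows)
    return rows


blocks = make_blocks(n_rows=20, block_size=5)
by_block = shuffle_block_order(blocks, seed=42)
flat = [row for block in by_block for row in block]
```

In this model only whole blocks are reordered, so no rows are moved between blocks, and rows that shared a block stay adjacent. That is consistent with the sample outputs in the method's docstring, where ids such as 15 through 19 appear as one consecutive run.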

(
        self,
        *,
        seed: Optional[int | RandomSeedConfig] = None,
    )

Source from the content-addressed store, hash-verified

    @AllToAllAPI
    @PublicAPI(api_group=SSR_API_GROUP)
    def randomize_block_order(
        self,
        *,
        seed: Optional[int | RandomSeedConfig] = None,
    ) -> "Dataset":
        """Randomly shuffle the :ref:`blocks <dataset_concept>` of this :class:`Dataset`.

        This method is useful if you :meth:`~Dataset.split` your dataset into shards and
        want to randomize the data in each shard without performing a full
        :meth:`~Dataset.random_shuffle`.

        Examples:
            >>> import ray
            >>> ds = ray.data.range(100)
            >>> ds.take(5)
            [{'id': 0}, {'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}]
            >>> ds.randomize_block_order().take(5)  # doctest: +SKIP
            [{'id': 15}, {'id': 16}, {'id': 17}, {'id': 18}, {'id': 19}]
            >>> ds.randomize_block_order(seed=RandomSeedConfig(seed=42, reseed_after_execution=False)).take(5)  # doctest: +SKIP
            [{'id': 44}, {'id': 45}, {'id': 46}, {'id': 47}, {'id': 80}]
            >>> ds.randomize_block_order(seed=RandomSeedConfig(seed=42, reseed_after_execution=False)).take(5)  # doctest: +SKIP
            [{'id': 44}, {'id': 45}, {'id': 46}, {'id': 47}, {'id': 80}]

            Reproducible but non-deterministic across executions (e.g., training epochs):
            >>> ds = ray.data.range(100)
            >>> ds.randomize_block_order(seed=RandomSeedConfig(seed=42, reseed_after_execution=True)).take(5)  # doctest: +SKIP
            [{'id': 40}, {'id': 41}, {'id': 42}, {'id': 43}, {'id': 28}]
            >>> ds.randomize_block_order(seed=RandomSeedConfig(seed=42, reseed_after_execution=True)).take(5)  # doctest: +SKIP
            [{'id': 92}, {'id': 93}, {'id': 94}, {'id': 95}, {'id': 88}]

        Args:
            seed: An optional random seed. Can be an integer or a :class:`RandomSeedConfig`
                object. If an integer is provided, it defaults to fully deterministic
                behavior (same block order across executions). If None, the block
                order is non-deterministic. See :class:`RandomSeedConfig` for more details on
                seed behavior.

        Returns:
            The block-shuffled :class:`Dataset`.
        """  # noqa: E501

        seed_config = RandomSeedConfig.create_seed_config(seed)

        op = RandomizeBlocks(
            seed_config=seed_config,
            input_dependencies=[self._logical_plan.dag],
        )
        logical_plan = LogicalPlan(op, self.context)
        return Dataset._from_parent(self, logical_plan)
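
The seed behavior described in the `Args` section can be modelled in plain Python. The sketch below is an illustration under stated assumptions, not Ray's code: `SeedConfigSketch`, `to_config`, `seed_for`, and `block_order` are hypothetical names, and mixing the seed with an execution index by simple addition is an assumption chosen only to make the behavior visible.

```python
import random
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class SeedConfigSketch:
    """Hypothetical stand-in for RandomSeedConfig; not Ray's class."""

    seed: Optional[int] = None
    reseed_after_execution: bool = False

    def seed_for(self, execution_idx: int) -> Optional[int]:
        if self.seed is None:
            return None  # unseeded: the order is not reproducible
        if self.reseed_after_execution:
            # Assumption: any deterministic mix of seed and index would do here.
            return self.seed + execution_idx
        return self.seed  # same seed, hence same order, on every execution


def to_config(seed: Union[None, int, SeedConfigSketch]) -> SeedConfigSketch:
    """Normalize the argument: a bare int means fully deterministic behavior."""
    if isinstance(seed, SeedConfigSketch):
        return seed
    return SeedConfigSketch(seed=seed, reseed_after_execution=False)


def block_order(num_blocks: int, seed, execution_idx: int) -> list:
    """Return a permutation of block indices for one execution."""
    rng = random.Random(to_config(seed).seed_for(execution_idx))
    order = list(range(num_blocks))
    rng.shuffle(order)
    return order


reseeding = SeedConfigSketch(seed=42, reseed_after_execution=True)
epoch_0 = block_order(25, reseeding, execution_idx=0)
epoch_1 = block_order(25, reseeding, execution_idx=1)
```

Under this model an integer seed gives the same order on every execution, while a reseeding configuration gives a different order per execution that can still be replayed by rerunning with the same seed and execution index.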

Calls 4

RandomizeBlocks · Class · 0.90
LogicalPlan · Class · 0.90
create_seed_config · Method · 0.80
_from_parent · Method · 0.80
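
The four calls listed above suggest that the method only builds a new plan node on top of the existing one and returns a new dataset handle, without running anything. A toy model of that pattern is sketched below; `OpSketch`, `DatasetSketch`, and `describe` are hypothetical names, and the real `LogicalPlan` and `Dataset._from_parent` carry state that is not modelled here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OpSketch:
    """Hypothetical logical operator: a name plus the operators it reads from."""

    name: str
    input_dependencies: List["OpSketch"] = field(default_factory=list)


@dataclass
class DatasetSketch:
    """Hypothetical dataset handle that stores only the root of its plan."""

    dag: OpSketch

    def randomize_block_order(self) -> "DatasetSketch":
        # Wrap the current plan root in a new operator; nothing executes here.
        op = OpSketch("RandomizeBlocks", input_dependencies=[self.dag])
        return DatasetSketch(dag=op)


def describe(op: OpSketch) -> str:
    """Render a single-input operator chain from the source to the root."""
    chain = []
    current = op
    while current is not None:
        chain.append(current.name)
        deps = current.input_dependencies
        current = deps[0] if deps else None
    return " -> ".join(reversed(chain))


source = DatasetSketch(OpSketch("Read"))
shuffled = source.randomize_block_order()
```

In this sketch the original handle is left unchanged and the new one points back at it through `input_dependencies`, which is the same shape as passing `self._logical_plan.dag` into the new operator.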