    @AllToAllAPI
    @PublicAPI(api_group=SSR_API_GROUP)
    def randomize_block_order(
        self,
        *,
        seed: Optional[int | RandomSeedConfig] = None,
    ) -> "Dataset":
        """Randomly shuffle the :ref:`blocks <dataset_concept>` of this :class:`Dataset`.

        This method is useful if you :meth:`~Dataset.split` your dataset into shards and
        want to randomize the data in each shard without performing a full
        :meth:`~Dataset.random_shuffle`.

        Examples:
            >>> import ray
            >>> ds = ray.data.range(100)
            >>> ds.take(5)
            [{'id': 0}, {'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}]
            >>> ds.randomize_block_order().take(5)  # doctest: +SKIP
            [{'id': 15}, {'id': 16}, {'id': 17}, {'id': 18}, {'id': 19}]
            >>> ds.randomize_block_order(seed=RandomSeedConfig(seed=42, reseed_after_execution=False)).take(5)  # doctest: +SKIP
            [{'id': 44}, {'id': 45}, {'id': 46}, {'id': 47}, {'id': 80}]
            >>> ds.randomize_block_order(seed=RandomSeedConfig(seed=42, reseed_after_execution=False)).take(5)  # doctest: +SKIP
            [{'id': 44}, {'id': 45}, {'id': 46}, {'id': 47}, {'id': 80}]

            Reproducible, but with a different block order on each execution (e.g., each training epoch):
            >>> ds = ray.data.range(100)
            >>> ds.randomize_block_order(seed=RandomSeedConfig(seed=42, reseed_after_execution=True)).take(5)  # doctest: +SKIP
            [{'id': 40}, {'id': 41}, {'id': 42}, {'id': 43}, {'id': 28}]
            >>> ds.randomize_block_order(seed=RandomSeedConfig(seed=42, reseed_after_execution=True)).take(5)  # doctest: +SKIP
            [{'id': 92}, {'id': 93}, {'id': 94}, {'id': 95}, {'id': 88}]

        Args:
            seed: An optional random seed. Can be an integer or a :class:`RandomSeedConfig`
                object. If an integer is provided, it defaults to fully deterministic
                behavior (same block order across executions). If ``None``, the block
                order is non-deterministic. See :class:`RandomSeedConfig` for more details on
                seed behavior.

        Returns:
            The block-shuffled :class:`Dataset`.
        """  # noqa: E501

        # Normalize the seed argument (int, RandomSeedConfig, or None) into a config object.
        seed_config = RandomSeedConfig.create_seed_config(seed)

        # Add a RandomizeBlocks operator on top of this dataset's logical plan.
        op = RandomizeBlocks(
            seed_config=seed_config,
            input_dependencies=[self._logical_plan.dag],
        )
        logical_plan = LogicalPlan(op, self.context)
        return Dataset._from_parent(self, logical_plan)
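The docstring's sample outputs show runs of consecutive ids (for example 44 to 47, then 80), which is what you would expect when whole blocks are reordered but the rows inside each block stay in place. The following is a minimal pure-Python sketch of that idea, not Ray's implementation: `make_blocks` and `shuffle_block_order` are hypothetical helpers, plain lists stand in for blocks, and the standard library's `random.Random` stands in for whatever seeding `RandomSeedConfig` performs.

```python
import random
from typing import List, Optional


def make_blocks(n_rows: int, block_size: int) -> List[List[int]]:
    """Split row ids 0..n_rows-1 into contiguous blocks (hypothetical helper)."""
    return [
        list(range(start, min(start + block_size, n_rows)))
        for start in range(0, n_rows, block_size)
    ]


def shuffle_block_order(
    blocks: List[List[int]], seed: Optional[int] = None
) -> List[List[int]]:
    """Return the blocks in a random order; rows inside each block are untouched."""
    rng = random.Random(seed)  # seed=None gives a different order on each call
    shuffled = list(blocks)  # shallow copy so the caller's list keeps its order
    rng.shuffle(shuffled)
    return shuffled


blocks = make_blocks(n_rows=20, block_size=5)
first = shuffle_block_order(blocks, seed=42)
second = shuffle_block_order(blocks, seed=42)

# Flatten to see the row order a consumer would observe.
rows = [row for block in first for row in block]
```

With a fixed integer seed the two calls produce the same block order, mirroring the "fully deterministic" case described under `Args`. Because only a list of block references is permuted, the cost depends on the number of blocks rather than the number of rows, which is the motivation the docstring gives for preferring this over a full `random_shuffle`.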

    @PublicAPI(api_group=BT_API_GROUP)
    def random_sample(