github.com/ray-project/ray

Method split_proportionately

python/ray/data/dataset.py:2533–2610


split_proportionately(self, proportions: List[float]) -> List["MaterializedDataset"]


    @ConsumptionAPI
    @PublicAPI(api_group=SMJ_API_GROUP)
    def split_proportionately(
        self, proportions: List[float]
    ) -> List["MaterializedDataset"]:
        """Materialize and split the dataset using proportions.

        A common use case for this is splitting the dataset into train
        and test sets (equivalent to e.g. scikit-learn's ``train_test_split``).
        For a higher level abstraction, see :meth:`Dataset.train_test_split`.

        This method splits datasets so that all splits
        always contain at least one row. If that isn't possible,
        an exception is raised.

        This is equivalent to calculating the indices manually and calling
        :meth:`Dataset.split_at_indices`.

        Examples:
            >>> import ray
            >>> ds = ray.data.range(10)
            >>> d1, d2, d3 = ds.split_proportionately([0.2, 0.5])
            >>> d1.take_batch()
            {'id': array([0, 1])}
            >>> d2.take_batch()
            {'id': array([2, 3, 4, 5, 6])}
            >>> d3.take_batch()
            {'id': array([7, 8, 9])}

        Time complexity: O(num splits)

        Args:
            proportions: List of proportions to split the dataset according to.
                Must sum up to less than 1, and each proportion must be greater
                than 0.

        Returns:
            ``len(proportions) + 1`` dataset splits; the last split holds
            the remaining rows.

        .. seealso::

            :meth:`Dataset.split`
                Unlike :meth:`~Dataset.split_proportionately`, which lets you split a
                dataset into different sizes, :meth:`Dataset.split` splits a dataset
                into approximately equal splits.

            :meth:`Dataset.split_at_indices`
                :meth:`Dataset.split_proportionately` uses this method under the hood.

            :meth:`Dataset.streaming_split`
                Unlike :meth:`~Dataset.split`, :meth:`~Dataset.streaming_split`
                doesn't materialize the dataset in memory.
        """

        if len(proportions) < 1:
            raise ValueError("proportions must be at least of length 1")
        if sum(proportions) >= 1:
            raise ValueError("proportions must sum to less than 1")
        if any(p <= 0 for p in proportions):
            raise ValueError("proportions must be bigger than 0")
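The listing above stops after the argument checks, so the step that turns proportions into the indices passed to `split_at_indices` is not shown here. The following is a minimal pure-Python sketch, assuming each cut point sits at `int(num_rows * cumulative_proportion)` and using an illustrative clamping rule (an assumption, not Ray's elided code) to keep every split non-empty. `proportions_to_split_indices` is a hypothetical name, not part of Ray, and the sketch needs no Ray install.

```python
from itertools import accumulate
from typing import List


def proportions_to_split_indices(num_rows: int, proportions: List[float]) -> List[int]:
    """Hypothetical helper: turn proportions into split_at_indices-style cut points.

    Assumption: cuts start at int(num_rows * cumulative_proportion) and are then
    clamped so that each of the len(proportions) + 1 splits keeps at least one row.
    This is an illustration, not Ray's implementation.
    """
    # Same argument checks (and messages) as the method body shown above.
    if len(proportions) < 1:
        raise ValueError("proportions must be at least of length 1")
    if sum(proportions) >= 1:
        raise ValueError("proportions must sum to less than 1")
    if any(p <= 0 for p in proportions):
        raise ValueError("proportions must be bigger than 0")

    k = len(proportions)
    if num_rows < k + 1:
        # k cut points produce k + 1 splits, so at least k + 1 rows are needed.
        raise ValueError("not enough rows for non-empty splits")

    # Naive cut points from the running total of the proportions.
    indices = [int(num_rows * c) for c in accumulate(proportions)]

    # Forward pass: each cut lands strictly after the previous one (and at >= 1),
    # so no split before the last one is empty.
    prev = 0
    for j in range(k):
        indices[j] = max(indices[j], prev + 1)
        prev = indices[j]

    # Upper bound: cut j may be at most num_rows - (k - j), which leaves at least
    # one row for each of the k - j later splits, including the final remainder.
    for j in range(k):
        indices[j] = min(indices[j], num_rows - (k - j))

    return indices
```

With the docstring's example (10 rows, `[0.2, 0.5]`) the sketch yields the cut points `[2, 7]`, matching the three splits shown above. On tiny inputs such as 3 rows with `[0.1, 0.1]` the naive cuts are `[0, 0]`, and the clamping moves them to `[1, 2]` so that each split gets one row.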

Callers (3)

train_test_split · method · 0.80
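`Dataset.train_test_split` is listed as a caller, and the docstring names it as the higher-level abstraction. As a rough list-based illustration (not Ray's implementation: it ignores shuffling, seeds, and integer `test_size`), a float `test_size` can be mapped to the single proportion `1 - test_size`, which gives one cut point and therefore two splits. `train_test_split_rows` is a hypothetical name.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split_rows(rows: Sequence[T], test_size: float) -> Tuple[List[T], List[T]]:
    """Hypothetical list analogue of a proportional train/test split.

    Assumption: the float test_size becomes the single proportion
    [1 - test_size]; the first split is the train set and the remainder
    is the test set. No shuffling is done.
    """
    if not 0 < test_size < 1:
        raise ValueError("test_size must be in (0, 1)")
    # One proportion -> one cut point -> two splits.
    cut = int(len(rows) * (1 - test_size))
    if cut <= 0 or cut >= len(rows):
        raise ValueError("both splits must contain at least one row")
    return list(rows[:cut]), list(rows[cut:])
```

For example, `train_test_split_rows(list(range(10)), 0.25)` puts rows 0 to 6 in the train split and rows 7 to 9 in the test split.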

Calls (4)

split_at_indices · method · 0.80
range · function · 0.70
sum · function · 0.50
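Of these calls, `split_at_indices` does the actual cutting. Its contract can be sketched on a plain Python list: k sorted cut points give k + 1 consecutive slices. `split_list_at_indices` is hypothetical, and its validation (non-negative, sorted indices) is an assumption rather than a copy of Ray's checks.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_list_at_indices(rows: Sequence[T], indices: Sequence[int]) -> List[List[T]]:
    """Hypothetical list analogue of Dataset.split_at_indices.

    k sorted cut points yield k + 1 consecutive slices:
    rows[:i0], rows[i0:i1], ..., rows[i(k-1):].
    """
    if any(i < 0 for i in indices):
        raise ValueError("indices must be non-negative")
    if sorted(indices) != list(indices):
        raise ValueError("indices must be sorted")
    # Pair consecutive boundaries: (0, i0), (i0, i1), ..., (i(k-1), len(rows)).
    bounds = [0, *indices, len(rows)]
    return [list(rows[start:end]) for start, end in zip(bounds, bounds[1:])]
```

Called with `list(range(10))` and the cut points `[2, 7]`, it returns `[[0, 1], [2, 3, 4, 5, 6], [7, 8, 9]]`, the same three groups as the docstring example.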

Tested by (1)