hub / github.com/ray-project/ray / sort

Method sort

python/ray/data/dataset.py:3747–3814

Sort the dataset by the specified key column or key function. The `key` parameter must be specified (i.e., it cannot be `None`).

Note: If provided, the `boundaries` parameter can only be used to partition the first sort key.

(
    self,
    key: Union[str, List[str]],
    descending: Union[bool, List[bool]] = False,
    boundaries: List[Union[int, float]] = None,
)
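The signature accepts either a single column or a list of columns for `key`, and either one boolean or a list of booleans for `descending`. The following is a minimal pure-Python sketch of what a multi-column sort with per-column direction means, assuming the columns are compared lexicographically in the order given. The helper `sort_rows`, its `ValueError` on a length mismatch, and the sample rows are hypothetical illustrations; this is not how Ray implements the distributed sort.

```python
from typing import Any, Dict, List, Sequence, Union

Row = Dict[str, Any]


def sort_rows(
    rows: Sequence[Row],
    key: Union[str, List[str]],
    descending: Union[bool, List[bool]] = False,
) -> List[Row]:
    """Hypothetical helper: sort dict rows by one or more columns."""
    keys = [key] if isinstance(key, str) else list(key)
    # A single boolean applies to every key column.
    if isinstance(descending, bool):
        flags = [descending] * len(keys)
    else:
        flags = list(descending)
    if len(flags) != len(keys):
        # Assumed behaviour for this sketch only.
        raise ValueError("descending must match the number of key columns")

    out = list(rows)
    # Python's sort is stable, so sorting from the least significant key
    # to the most significant key yields a lexicographic ordering.
    for column, desc in reversed(list(zip(keys, flags))):
        out.sort(key=lambda row: row[column], reverse=desc)
    return out


rows = [
    {"a": 1, "b": 1},
    {"a": 0, "b": 2},
    {"a": 1, "b": 3},
    {"a": 0, "b": 0},
]
# Ascending on "a", then descending on "b" within equal "a".
for row in sort_rows(rows, ["a", "b"], descending=[False, True]):
    print(row)
```

With Ray itself, the corresponding call would presumably look like `ds.sort(["a", "b"], descending=[False, True])`, per the `descending` argument description in the docstring.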

Source from the content-addressed store, hash-verified

@AllToAllAPI
@PublicAPI(api_group=SSR_API_GROUP)
def sort(
    self,
    key: Union[str, List[str]],
    descending: Union[bool, List[bool]] = False,
    boundaries: List[Union[int, float]] = None,
) -> "Dataset":
    """Sort the dataset by the specified key column or key function.
    The `key` parameter must be specified (i.e., it cannot be `None`).

    .. note::
        If provided, the `boundaries` parameter can only be used to partition
        the first sort key.

    Examples:
        >>> import ray
        >>> ds = ray.data.range(15)
        >>> ds = ds.sort("id", descending=False, boundaries=[5, 10])
        >>> for df in ray.get(ds.to_pandas_refs()):
        ...     print(df)
           id
        0   0
        1   1
        2   2
        3   3
        4   4
           id
        0   5
        1   6
        2   7
        3   8
        4   9
           id
        0  10
        1  11
        2  12
        3  13
        4  14

    Time complexity: O(dataset size * log(dataset size / parallelism))

    Args:
        key: The column or a list of columns to sort by.
        descending: Whether to sort in descending order. Must be a boolean or a list
            of booleans matching the number of the columns.
        boundaries: The list of values based on which to repartition the dataset.
            For example, if the input boundary is [10,20], rows with values less
            than 10 will be divided into the first block, rows with values greater
            than or equal to 10 and less than 20 will be divided into the
            second block, and rows with values greater than or equal to 20
            will be divided into the third block. If not provided, the
            boundaries will be sampled from the input blocks. This feature
            only supports numeric columns right now.

    Returns:
        A new, sorted :class:`Dataset`.

    Raises:
        ``ValueError``: if the sort key is None.
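The `boundaries` rule in the docstring (values below the first boundary go to the first block, each boundary itself opens the next block) can be sketched in plain Python with a binary search. This is only an illustration of the documented half-open intervals, not Ray's implementation; the helper name `partition_by_boundaries` is hypothetical.

```python
from bisect import bisect_right
from typing import Iterable, List, Sequence, Union

Number = Union[int, float]


def partition_by_boundaries(
    values: Iterable[Number], boundaries: Sequence[Number]
) -> List[List[Number]]:
    """Hypothetical helper: split sorted values into len(boundaries) + 1 blocks."""
    blocks: List[List[Number]] = [[] for _ in range(len(boundaries) + 1)]
    for value in sorted(values):
        # bisect_right counts the boundaries that are <= value, so a value
        # equal to a boundary lands in the block that starts at that boundary.
        blocks[bisect_right(boundaries, value)].append(value)
    return blocks


# Mirrors the docstring example: ids 0..14 with boundaries [5, 10].
for block in partition_by_boundaries(range(15), [5, 10]):
    print(block)
```

Under these assumptions the three printed blocks hold ids 0 to 4, 5 to 9, and 10 to 14, matching the three DataFrames in the docstring example.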

Calls 4

SortKey · Class · 0.90
Sort · Class · 0.90
LogicalPlan · Class · 0.90
_from_parent · Method · 0.80