Sort the dataset by the specified key column or list of key columns. The `key` parameter is required and cannot be `None`.

.. note::
    If provided, the `boundaries` parameter can only be used to partition the first sort key.
sort(
    self,
    key: Union[str, List[str]],
    descending: Union[bool, List[bool]] = False,
    boundaries: List[Union[int, float]] = None,
) -> "Dataset"
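The effect of `boundaries` on block layout can be illustrated without Ray. The following is a minimal pure-Python sketch, not Ray's implementation: `partition_by_boundaries` is a hypothetical helper, and it assumes an ascending sort with `boundaries` already in ascending order. It places each value in the half-open range described for the `boundaries` argument (values below the first boundary go to the first block, values greater than or equal to a boundary go to the block on its right) and sorts each block.

```python
from bisect import bisect_right
from typing import List, Union

Number = Union[int, float]


def partition_by_boundaries(
    values: List[Number], boundaries: List[Number]
) -> List[List[Number]]:
    """Hypothetical helper: split values into len(boundaries) + 1 sorted blocks.

    Assumes `boundaries` is sorted in ascending order.
    """
    blocks: List[List[Number]] = [[] for _ in range(len(boundaries) + 1)]
    for value in values:
        # bisect_right sends a value equal to a boundary to the block on its
        # right, so block i covers [boundaries[i - 1], boundaries[i]).
        blocks[bisect_right(boundaries, value)].append(value)
    # Sort within each block so that concatenating the blocks is globally sorted.
    return [sorted(block) for block in blocks]


# Same data and boundaries as the docstring example, fed in reverse order.
blocks = partition_by_boundaries(list(range(14, -1, -1)), [5, 10])
for block in blocks:
    print(block)
```

With the values 0 through 14 and boundaries `[5, 10]`, the sketch yields three blocks of five values each, which matches the three DataFrames printed in the docstring example. Descending order and multi-column keys are deliberately left out of the sketch.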
@AllToAllAPI
@PublicAPI(api_group=SSR_API_GROUP)
def sort(
    self,
    key: Union[str, List[str]],
    descending: Union[bool, List[bool]] = False,
    boundaries: List[Union[int, float]] = None,
) -> "Dataset":
    """Sort the dataset by the specified key column or key function.
    The `key` parameter must be specified (i.e., it cannot be `None`).

    .. note::
        If provided, the `boundaries` parameter can only be used to partition
        the first sort key.

    Examples:
        >>> import ray
        >>> ds = ray.data.range(15)
        >>> ds = ds.sort("id", descending=False, boundaries=[5, 10])
        >>> for df in ray.get(ds.to_pandas_refs()):
        ...     print(df)
           id
        0   0
        1   1
        2   2
        3   3
        4   4
           id
        0   5
        1   6
        2   7
        3   8
        4   9
           id
        0  10
        1  11
        2  12
        3  13
        4  14

    Time complexity: O(dataset size * log(dataset size / parallelism))

    Args:
        key: The column or a list of columns to sort by.
        descending: Whether to sort in descending order. Must be a boolean or a list
            of booleans matching the number of the columns.
        boundaries: The list of values based on which to repartition the dataset.
            For example, if the input boundary is [10,20], rows with values less
            than 10 will be divided into the first block, rows with values greater
            than or equal to 10 and less than 20 will be divided into the
            second block, and rows with values greater than or equal to 20
            will be divided into the third block. If not provided, the
            boundaries will be sampled from the input blocks. This feature
            only supports numeric columns right now.

    Returns:
        A new, sorted :class:`Dataset`.

    Raises:
        ``ValueError``: if the sort key is None.