hub / github.com/huggingface/datasets / sort

Method sort

src/datasets/arrow_dataset.py:4706–4830

Create a new dataset sorted according to a single or multiple columns.

(
        self,
        column_names: Union[str, Sequence_[str]],
        reverse: Union[bool, Sequence_[bool]] = False,
        null_placement: str = "at_end",
        keep_in_memory: bool = False,
        load_from_cache_file: Optional[bool] = None,
        indices_cache_file_name: Optional[str] = None,
        writer_batch_size: Optional[int] = 1000,
        new_fingerprint: Optional[str] = None,
    )

Source from the content-addressed store, hash-verified

@transmit_format
@fingerprint_transform(inplace=False, ignore_kwargs=["load_from_cache_file", "indices_cache_file_name"])
def sort(
    self,
    column_names: Union[str, Sequence_[str]],
    reverse: Union[bool, Sequence_[bool]] = False,
    null_placement: str = "at_end",
    keep_in_memory: bool = False,
    load_from_cache_file: Optional[bool] = None,
    indices_cache_file_name: Optional[str] = None,
    writer_batch_size: Optional[int] = 1000,
    new_fingerprint: Optional[str] = None,
) -> "Dataset":
    """Create a new dataset sorted according to a single or multiple columns.

    Args:
        column_names (`Union[str, Sequence[str]]`):
            Column name(s) to sort by.
        reverse (`Union[bool, Sequence[bool]]`, defaults to `False`):
            If `True`, sort by descending order rather than ascending. If a single bool is provided,
            the value is applied to the sorting of all column names. Otherwise a list of bools with the
            same length and order as column_names must be provided.
        null_placement (`str`, defaults to `at_end`):
            Put `None` values at the beginning if `at_start` or `first` or at the end if `at_end` or `last`

            <Added version="1.14.2"/>
        keep_in_memory (`bool`, defaults to `False`):
            Keep the sorted indices in memory instead of writing it to a cache file.
        load_from_cache_file (`Optional[bool]`, defaults to `True` if caching is enabled):
            If a cache file storing the sorted indices
            can be identified, use it instead of recomputing.
        indices_cache_file_name (`str`, optional, defaults to `None`):
            Provide the name of a path for the cache file. It is used to store the
            sorted indices instead of the automatically generated cache file name.
        writer_batch_size (`int`, defaults to `1000`):
            Number of rows per write operation for the cache file writer.
            Higher value gives smaller cache files, lower value consume less temporary memory.
        new_fingerprint (`str`, optional, defaults to `None`):
            The new fingerprint of the dataset after transform.
            If `None`, the new fingerprint is computed using a hash of the previous fingerprint, and the transform arguments

    Example:

    ```py
    >>> from datasets import load_dataset
    >>> ds = load_dataset('cornell-movie-review-data/rotten_tomatoes', split='validation')
    >>> ds['label'][:10]
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    >>> sorted_ds = ds.sort('label')
    >>> sorted_ds['label'][:10]
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    >>> another_sorted_ds = ds.sort(['label', 'text'], reverse=[True, False])
    >>> another_sorted_ds['label'][:10]
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    ```
    """
    if len(self.list_indexes()) > 0:
        raise DatasetTransformationNotAllowedError(
            "Using `.sort` on a dataset with attached indexes is not allowed. You can first run `.drop_index() to remove your index and then re-add it."
        )
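
The listing stops before the sorting logic, so the rules the docstring gives for `reverse` and `null_placement` are easy to misread. The following is a minimal pure-Python sketch of those documented rules, not the library's implementation. `sort_rows` and the sample `rows` are hypothetical names introduced for illustration, and the choice that `None` placement is unaffected by `reverse` is an assumption.

```python
from functools import cmp_to_key


def sort_rows(rows, column_names, reverse=False, null_placement="at_end"):
    """Model of the documented ordering rules; returns a new list of row dicts."""
    # A single column name or a single bool is expanded to list form.
    if isinstance(column_names, str):
        column_names = [column_names]
    if isinstance(reverse, bool):
        reverse = [reverse] * len(column_names)
    if len(reverse) != len(column_names):
        raise ValueError("reverse must have the same length as column_names")

    # The docstring accepts `first`/`last` as aliases of `at_start`/`at_end`.
    aliases = {"first": "at_start", "last": "at_end"}
    null_placement = aliases.get(null_placement, null_placement)
    if null_placement not in ("at_start", "at_end"):
        raise ValueError(f"unsupported null_placement: {null_placement!r}")
    nulls_first = null_placement == "at_start"

    def compare(a, b):
        # Compare column by column; later columns only break ties.
        for name, descending in zip(column_names, reverse):
            x, y = a[name], b[name]
            if x is None or y is None:
                if x is None and y is None:
                    continue
                # Assumption: None placement does not depend on `reverse`.
                x_first = (x is None) == nulls_first
                return -1 if x_first else 1
            if x != y:
                result = -1 if x < y else 1
                return -result if descending else result
        return 0

    return sorted(rows, key=cmp_to_key(compare))


rows = [
    {"label": 1, "text": "b"},
    {"label": 0, "text": "a"},
    {"label": None, "text": "c"},
    {"label": 1, "text": "a"},
]

# Descending by label, ties broken by ascending text, None last by default.
ordered = sort_rows(rows, ["label", "text"], reverse=[True, False])
print([(r["label"], r["text"]) for r in ordered])
# [(1, 'a'), (1, 'b'), (0, 'a'), (None, 'c')]
```

Passing `null_placement="first"` to the same call moves the `None` row to the front and leaves the relative order of the other rows unchanged.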

Callers 9

_other_versions_on_disk · Method · 0.45

approximate_mode · Function · 0.45

_iter_samples_from_log_files · Method · 0.45

test_sort · Method · 0.45

test_sort_with_none · Function · 0.45

test_sort · Method · 0.45

ls · Method · 0.45

sort · Function · 0.45

Calls 9

_get_cache_file_path · Method · 0.95

_new_dataset_with_indices · Method · 0.95

select · Method · 0.95

DatasetTransformationNotAllowedError · Class · 0.85

is_caching_enabled · Function · 0.85

query_table · Function · 0.85

list_indexes · Method · 0.80

exists · Method · 0.80

info · Method · 0.45

Tested by 5

test_sort · Method · 0.36

test_sort_with_none · Function · 0.36

test_sort · Method · 0.36

ls · Method · 0.36