Function shuffle

dask/array/_shuffle.py:20–92 · view source on GitHub ↗

Reorders one dimensions of a Dask Array based on an indexer. The indexer defines a list of positional groups that will end up in the same chunk together. A single group is in at most one chunk on this dimension, but a chunk might contain multiple groups to avoid fragmentation of th

(x, indexer: list[list[int]], axis: int, chunks: Literal["auto"] = "auto")

Source from the content-addressed store, hash-verified

18
19
20	def shuffle(x, indexer: list[list[int]], axis: int, chunks: Literal["auto"] = "auto"):
21	"""
22	Reorders one dimensions of a Dask Array based on an indexer.
23
24	The indexer defines a list of positional groups that will end up in the same chunk
25	together. A single group is in at most one chunk on this dimension, but a chunk
26	might contain multiple groups to avoid fragmentation of the array.
27
28	The algorithm tries to balance the chunksizes as much as possible to ideally keep the
29	number of chunks consistent or at least manageable.
30
31	Parameters
32	----------
33	x: dask array
34	Array to be shuffled.
35	indexer: list[list[int]]
36	The indexer that determines which elements along the dimension will end up in the
37	same chunk. Multiple groups can be in the same chunk to avoid fragmentation, but
38	each group will end up in exactly one chunk.
39	axis: int
40	The axis to shuffle along.
41	chunks: "auto"
42	Hint on how to rechunk if single groups are becoming too large. The default is
43	to split chunks along the other dimensions evenly to keep the chunksize
44	consistent. The rechunking is done in a way that ensures that non all-to-all
45	network communication is necessary, chunks are only split and not combined with
46	other chunks.
47
48	Examples
49	--------
50	>>> import dask.array as da
51	>>> import numpy as np
52	>>> arr = np.array([[1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11, 12, 13, 14, 15, 16]])
53	>>> x = da.from_array(arr, chunks=(2, 4))
54
55	Separate the elements in different groups.
56
57	>>> y = x.shuffle([[6, 5, 2], [4, 1], [3, 0, 7]], axis=1)
58
59	The shuffle algorithm will combine the first 2 groups into a single chunk to keep
60	the number of chunks small.
61
62	The tolerance of increasing the chunk size is controlled by the configuration
63	"array.chunk-size-tolerance". The default value is 1.25.
64
65	>>> y.chunks
66	((2,), (5, 3))
67
68	The array was reordered along axis 1 according to the positional indexer that was given.
69
70	>>> y.compute()
71	array([[ 7, 6, 3, 5, 2, 4, 1, 8],
72	[15, 14, 11, 13, 10, 12, 9, 16]])
73	"""
74	if np.isnan(x.shape).any():
75	raise ValueError(
76	f"Shuffling only allowed with known chunk sizes. {unknown_chunk_message}"
77	)

Callers 9

shuffleMethod · 0.90

test_shuffle_larger_arrayFunction · 0.90

test_default_partitionsFunction · 0.85

test_shuffle_npartitionsFunction · 0.85

test_shuffle_npartitions_lt_input_partitionsFunction · 0.85

test_index_with_non_seriesFunction · 0.85

test_index_with_dataframeFunction · 0.85

test_shuffle_from_one_partition_to_one_otherFunction · 0.85

test_shuffle_empty_partitionsFunction · 0.85

Calls 8

ArrayClass · 0.90

_rechunk_other_dimensionsFunction · 0.85

maxFunction · 0.85

from_collectionsMethod · 0.80

_validate_indexerFunction · 0.70

_shuffleFunction · 0.70

tokenizeFunction · 0.50

anyMethod · 0.45

Tested by 8

test_shuffle_larger_arrayFunction · 0.72

test_default_partitionsFunction · 0.68

test_shuffle_npartitionsFunction · 0.68

test_shuffle_npartitions_lt_input_partitionsFunction · 0.68

test_index_with_non_seriesFunction · 0.68

test_index_with_dataframeFunction · 0.68

test_shuffle_from_one_partition_to_one_otherFunction · 0.68

test_shuffle_empty_partitionsFunction · 0.68

Used in the wild real call sites across dependent graphs

searching dependent graphs…