
Function shuffle

dask/array/_shuffle.py:20–92

Reorders one dimension of a Dask Array based on an indexer. The indexer defines a list of positional groups that will end up in the same chunk together. A single group is in at most one chunk on this dimension, but a chunk might contain multiple groups to avoid fragmentation of the array.
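
To make the grouping rule concrete, here is a minimal sketch of one greedy packing strategy that is consistent with the description above. It is not Dask's implementation: the helper name `pack_groups`, the `max_chunk` argument, and the exact merge rule are assumptions introduced for illustration. Only the tolerance value 1.25 comes from the documented default of "array.chunk-size-tolerance".

```python
def pack_groups(group_sizes, max_chunk, tolerance=1.25):
    """Greedily merge consecutive groups into chunks (illustrative only).

    A group is never split here. It joins the current chunk while the merged
    size stays within max_chunk * tolerance; otherwise it starts a new chunk.
    """
    limit = max_chunk * tolerance
    chunks = []
    current = 0
    for size in group_sizes:
        # Close the current chunk if adding this group would exceed the limit.
        if current and current + size > limit:
            chunks.append(current)
            current = 0
        current += size
    if current:
        chunks.append(current)
    return tuple(chunks)


# Groups of sizes 3, 2 and 3, with an input chunk size of 4 along the axis.
packed = pack_groups([3, 2, 3], max_chunk=4)
print(packed)  # (5, 3)
```

With these inputs the first two groups fit within the limit of 5 and share a chunk, while the third group starts a new one, which matches the `y.chunks` result shown in the docstring example.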

(x, indexer: list[list[int]], axis: int, chunks: Literal["auto"] = "auto")
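
As a rough mental model for `indexer` and `axis`, the element order of the result matches flattening the groups and taking those positions along the chosen axis. The NumPy-only sketch below assumes that equivalence (it reproduces the values in the docstring example) and ignores chunking entirely; `flat_order` is a name introduced here for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11, 12, 13, 14, 15, 16]])
indexer = [[6, 5, 2], [4, 1], [3, 0, 7]]

# Concatenate the groups in order: one positional index per output column.
flat_order = [position for group in indexer for position in group]

# Reorder along axis 1; axis 0 is left untouched.
reordered = np.take(arr, flat_order, axis=1)
print(reordered)
```

What the sketch leaves out is the part that `shuffle` adds on top of a plain take: the guarantee that every position in a group lands in the same output chunk.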


def shuffle(x, indexer: list[list[int]], axis: int, chunks: Literal["auto"] = "auto"):
    """
    Reorders one dimension of a Dask Array based on an indexer.

    The indexer defines a list of positional groups that will end up in the same chunk
    together. A single group is in at most one chunk on this dimension, but a chunk
    might contain multiple groups to avoid fragmentation of the array.

    The algorithm tries to balance the chunksizes as much as possible to ideally keep the
    number of chunks consistent or at least manageable.

    Parameters
    ----------
    x: dask array
        Array to be shuffled.
    indexer: list[list[int]]
        The indexer that determines which elements along the dimension will end up in the
        same chunk. Multiple groups can be in the same chunk to avoid fragmentation, but
        each group will end up in exactly one chunk.
    axis: int
        The axis to shuffle along.
    chunks: "auto"
        Hint on how to rechunk if single groups are becoming too large. The default is
        to split chunks along the other dimensions evenly to keep the chunksize
        consistent. The rechunking is done in a way that ensures that no all-to-all
        network communication is necessary: chunks are only split and not combined with
        other chunks.

    Examples
    --------
    >>> import dask.array as da
    >>> import numpy as np
    >>> arr = np.array([[1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11, 12, 13, 14, 15, 16]])
    >>> x = da.from_array(arr, chunks=(2, 4))

    Separate the elements in different groups.

    >>> y = x.shuffle([[6, 5, 2], [4, 1], [3, 0, 7]], axis=1)

    The shuffle algorithm will combine the first 2 groups into a single chunk to keep
    the number of chunks small.

    The tolerance of increasing the chunk size is controlled by the configuration
    "array.chunk-size-tolerance". The default value is 1.25.

    >>> y.chunks
    ((2,), (5, 3))

    The array was reordered along axis 1 according to the positional indexer that was given.

    >>> y.compute()
    array([[ 7,  6,  3,  5,  2,  4,  1,  8],
           [15, 14, 11, 13, 10, 12,  9, 16]])
    """
    if np.isnan(x.shape).any():
        raise ValueError(
            f"Shuffling only allowed with known chunk sizes. {unknown_chunk_message}"
        )
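
The guard at the end of the excerpt relies on Dask reporting dimensions of unknown length as NaN in `x.shape`. The sketch below mimics that check on plain tuples so that it runs without Dask; `has_unknown_dims` is a hypothetical helper, not part of the library.

```python
import numpy as np


def has_unknown_dims(shape):
    """Return True if any dimension length is NaN, i.e. unknown."""
    # np.isnan accepts a tuple of numbers and returns a boolean array.
    return bool(np.isnan(shape).any())


print(has_unknown_dims((2, 8)))
print(has_unknown_dims((2, np.nan)))
```

A shape with a NaN entry would take the `ValueError` branch in `shuffle`. In Dask, calling `compute_chunk_sizes()` on the array is the usual way to resolve unknown sizes before attempting the shuffle.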

Calls 8

Array (Class) · 0.90
max (Function) · 0.85
from_collections (Method) · 0.80
_validate_indexer (Function) · 0.70
_shuffle (Function) · 0.70
tokenize (Function) · 0.50
any (Method) · 0.45
