Reorders one dimensions of a Dask Array based on an indexer. The indexer defines a list of positional groups that will end up in the same chunk together. A single group is in at most one chunk on this dimension, but a chunk might contain multiple groups to avoid fragmentation of th
(x, indexer: list[list[int]], axis: int, chunks: Literal["auto"] = "auto")
| 18 | |
| 19 | |
| 20 | def shuffle(x, indexer: list[list[int]], axis: int, chunks: Literal["auto"] = "auto"): |
| 21 | """ |
| 22 | Reorders one dimensions of a Dask Array based on an indexer. |
| 23 | |
| 24 | The indexer defines a list of positional groups that will end up in the same chunk |
| 25 | together. A single group is in at most one chunk on this dimension, but a chunk |
| 26 | might contain multiple groups to avoid fragmentation of the array. |
| 27 | |
| 28 | The algorithm tries to balance the chunksizes as much as possible to ideally keep the |
| 29 | number of chunks consistent or at least manageable. |
| 30 | |
| 31 | Parameters |
| 32 | ---------- |
| 33 | x: dask array |
| 34 | Array to be shuffled. |
| 35 | indexer: list[list[int]] |
| 36 | The indexer that determines which elements along the dimension will end up in the |
| 37 | same chunk. Multiple groups can be in the same chunk to avoid fragmentation, but |
| 38 | each group will end up in exactly one chunk. |
| 39 | axis: int |
| 40 | The axis to shuffle along. |
| 41 | chunks: "auto" |
| 42 | Hint on how to rechunk if single groups are becoming too large. The default is |
| 43 | to split chunks along the other dimensions evenly to keep the chunksize |
| 44 | consistent. The rechunking is done in a way that ensures that non all-to-all |
| 45 | network communication is necessary, chunks are only split and not combined with |
| 46 | other chunks. |
| 47 | |
| 48 | Examples |
| 49 | -------- |
| 50 | >>> import dask.array as da |
| 51 | >>> import numpy as np |
| 52 | >>> arr = np.array([[1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11, 12, 13, 14, 15, 16]]) |
| 53 | >>> x = da.from_array(arr, chunks=(2, 4)) |
| 54 | |
| 55 | Separate the elements in different groups. |
| 56 | |
| 57 | >>> y = x.shuffle([[6, 5, 2], [4, 1], [3, 0, 7]], axis=1) |
| 58 | |
| 59 | The shuffle algorithm will combine the first 2 groups into a single chunk to keep |
| 60 | the number of chunks small. |
| 61 | |
| 62 | The tolerance of increasing the chunk size is controlled by the configuration |
| 63 | "array.chunk-size-tolerance". The default value is 1.25. |
| 64 | |
| 65 | >>> y.chunks |
| 66 | ((2,), (5, 3)) |
| 67 | |
| 68 | The array was reordered along axis 1 according to the positional indexer that was given. |
| 69 | |
| 70 | >>> y.compute() |
| 71 | array([[ 7, 6, 3, 5, 2, 4, 1, 8], |
| 72 | [15, 14, 11, 13, 10, 12, 9, 16]]) |
| 73 | """ |
| 74 | if np.isnan(x.shape).any(): |
| 75 | raise ValueError( |
| 76 | f"Shuffling only allowed with known chunk sizes. {unknown_chunk_message}" |
| 77 | ) |
searching dependent graphs…