github.com/dask/dask

Function _balance_chunksizes

dask/array/rechunk.py:836–870

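Balance the chunk sizes along one dimension of a Dask array. Parameter: chunks (tuple[int, ...]), the chunk sizes for the Dask array. Returns: new_chunks (tuple[int, ...]), new chunks with the same total length and more evenly balanced sizes. The function tries every integer chunk length from median - median // 2 to median + median // 2 (where median is the median input chunk length). It keeps only the candidates that produce the target number of chunks, which is the input count, or one fewer when the smallest chunk is at most half the largest. From those it returns the candidate with the smallest gap between its largest and smallest chunk. If no candidate has the target count, it emits a warning and returns the input chunks unchanged.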
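As a hand-traced illustration of that search, take the illustrative input chunks = (10, 10, 10, 10, 3). This assumes the private helper _get_chunks(n, L) splits a length n into full chunks of length L plus one leftover chunk, so it yields ⌈n/L⌉ chunks; that helper is not shown on this page.

$$
\begin{aligned}
N &= 10 + 10 + 10 + 10 + 3 = 43, \quad m = \operatorname{median}(\text{chunks}) = 10, \quad \varepsilon = \lfloor m/2 \rfloor = 5 \\
\min(\text{chunks}) = 3 &\le 0.5 \cdot \max(\text{chunks}) = 5 \;\Rightarrow\; n_{\text{target}} = 5 - 1 = 4 \\
L \in [m - \varepsilon,\ m + \varepsilon] = [5, 15],\ \lceil N/L \rceil = 4 &\iff L \in \{11, 12, 13, 14\} \\
(11,11,11,10),\ (12,12,12,7),\ (13,13,13,4),\ (14,14,14,1) &\;\Rightarrow\; \max - \min = 1,\ 5,\ 9,\ 13
\end{aligned}
$$

The candidate with the smallest spread wins, so the function would return (11, 11, 11, 10).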

(chunks: tuple[int, ...]) -> tuple[int, ...]


```python
from warnings import warn

import numpy as np

# _get_chunks(n, chunksize) is a private dask helper (not shown on this page)
# that splits a total length n into chunks of length chunksize.


def _balance_chunksizes(chunks: tuple[int, ...]) -> tuple[int, ...]:
    """
    Balance the chunk sizes

    Parameters
    ----------
    chunks : tuple[int, ...]
        Chunk sizes for Dask array.

    Returns
    -------
    new_chunks : tuple[int, ...]
        New chunks for Dask array with balanced sizes.
    """
    # Search chunk lengths in a window of +/- half the median around the median.
    median_len = np.median(chunks).astype(int)
    n_chunks = len(chunks)
    eps = median_len // 2
    # If the smallest chunk is at most half the largest, target one fewer chunk
    # so its length is spread across the remaining chunks.
    if min(chunks) <= 0.5 * max(chunks):
        n_chunks -= 1

    # One candidate chunking of the same total length per trial chunk length.
    new_chunks = [
        _get_chunks(sum(chunks), chunk_len)
        for chunk_len in range(median_len - eps, median_len + eps + 1)
    ]
    # Keep only candidates that produce the target number of chunks.
    possible_chunks = [c for c in new_chunks if len(c) == n_chunks]
    if not possible_chunks:
        warn(
            "chunk size balancing not possible with given chunks. "
            "Try increasing the chunk size."
        )
        return chunks

    # Pick the candidate with the smallest spread between largest and smallest chunk.
    diffs = [max(c) - min(c) for c in possible_chunks]
    best_chunk_size = np.argmin(diffs)
    return possible_chunks[best_chunk_size]
```
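Because _get_chunks is private and the function above depends on NumPy, here is a sketch that re-states the same search with only the standard library so it runs on its own. It rests on a few assumptions. The names get_chunks and balance_chunksizes are hypothetical. get_chunks is assumed to behave like dask's _get_chunks (full chunks followed by one leftover chunk). statistics.median and an index-based min() stand in for np.median and np.argmin. The inputs are illustrative, not taken from dask's tests.

```python
import statistics
import warnings


def get_chunks(n: int, chunksize: int) -> tuple[int, ...]:
    """Hypothetical stand-in for dask's private _get_chunks (assumed behaviour)."""
    # Full chunks of `chunksize`, then one leftover chunk if n is not a multiple.
    n_full, leftover = divmod(n, chunksize)
    chunks = [chunksize] * n_full
    if leftover:
        chunks.append(leftover)
    return tuple(chunks)


def balance_chunksizes(chunks: tuple[int, ...]) -> tuple[int, ...]:
    """Hypothetical standard-library re-statement of _balance_chunksizes."""
    # int() truncates toward zero, like numpy's .astype(int) on the median.
    median_len = int(statistics.median(chunks))
    eps = median_len // 2
    n_chunks = len(chunks)
    # An undersized chunk (at most half the largest) means: aim for one fewer chunk.
    if min(chunks) <= 0.5 * max(chunks):
        n_chunks -= 1

    total = sum(chunks)
    candidates = [
        get_chunks(total, chunk_len)
        for chunk_len in range(median_len - eps, median_len + eps + 1)
    ]
    possible = [c for c in candidates if len(c) == n_chunks]
    if not possible:
        warnings.warn(
            "chunk size balancing not possible with given chunks. "
            "Try increasing the chunk size."
        )
        return chunks

    diffs = [max(c) - min(c) for c in possible]
    # Like np.argmin, this returns the first index holding the smallest spread.
    best = min(range(len(diffs)), key=diffs.__getitem__)
    return possible[best]


print(balance_chunksizes((100, 100, 100, 30)))  # (110, 110, 110)
print(balance_chunksizes((10, 10, 10, 10, 3)))  # (11, 11, 11, 10)

# No trial length yields the target count here, so the input comes back unchanged.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(balance_chunksizes((2, 1)))  # (2, 1)
print(len(caught))  # 1
```

Since candidates are generated in order of increasing chunk length and the first minimum wins, ties in spread are resolved in favour of the smaller chunk length.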

Callers 2

chunks · Method · 0.90
rechunk · Function · 0.85

Calls 8

min · Function · 0.85
max · Function · 0.85
_get_chunks · Function · 0.85
warn · Function · 0.85
argmin · Method · 0.80
sum · Function · 0.70
astype · Method · 0.45
median · Method · 0.45

Tested by

no test coverage detected
