hub / github.com/dask/dask / test_hash_split_unique

Function test_hash_split_unique

dask/dataframe/tests/test_dataframe.py:4001–4015
(npartitions, split_every, split_out)


@pytest.mark.parametrize("split_every", [2, 5])
@pytest.mark.parametrize("split_out", [1, 5, 20])
def test_hash_split_unique(npartitions, split_every, split_out):
    from string import ascii_lowercase

    s = pd.Series(np.random.choice(list(ascii_lowercase), 1000, replace=True))
    ds = dd.from_pandas(s, npartitions=npartitions)

    dropped = ds.unique(split_every=split_every, split_out=split_out)

    dsk = dropped.__dask_optimize__(dropped.dask, dropped.__dask_keys__())
    from dask.core import get_deps

    dependencies, dependents = get_deps(dsk)

    assert dropped.npartitions == (split_out or 1)
    assert sorted(dropped.compute(scheduler="sync")) == sorted(s.unique())
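
The test checks two properties of `unique` when `split_out` is given: the result has `split_out` partitions, and the values across those partitions are exactly the distinct values of the input. A minimal pure-Python sketch of that idea follows. It is not dask's implementation: the helper name `hash_split_unique` is hypothetical, `zlib.crc32` is an assumed stand-in for the hash function, and the tree reduction controlled by `split_every` is left out.

```python
import random
import zlib
from string import ascii_lowercase


def hash_split_unique(partitions, split_out=1):
    """Deduplicate values across partitions into `split_out` output buckets.

    Hypothetical helper for illustration only; not part of dask.
    """
    buckets = [set() for _ in range(split_out)]
    for part in partitions:
        # Deduplicate within the partition first to shrink the data moved.
        for value in set(part):
            # A deterministic hash sends equal values to the same bucket,
            # so no value can appear in two output partitions.
            idx = zlib.crc32(value.encode("utf-8")) % split_out
            buckets[idx].add(value)
    return [sorted(b) for b in buckets]


random.seed(0)
data = random.choices(ascii_lowercase, k=1000)
# Split the input into 4 interleaved partitions (an arbitrary choice).
partitions = [data[i::4] for i in range(4)]

result = hash_split_unique(partitions, split_out=5)
flattened = [v for bucket in result for v in bucket]

# Mirrors the two assertions in the test: partition count and contents.
assert len(result) == 5
assert sorted(flattened) == sorted(set(data))
```

Because equal values always hash to the same bucket, concatenating the output buckets needs no further deduplication, which is why the test can compare the computed result directly against `s.unique()`.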

Callers

nothing calls this directly

Calls 6

unique · Method · 0.95
get_deps · Function · 0.90
__dask_optimize__ · Method · 0.80
choice · Method · 0.45
__dask_keys__ · Method · 0.45
compute · Method · 0.45

Tested by

no test coverage detected
