Function test_hash_split_unique

dask/dataframe/tests/test_dataframe.py:4001–4015 (github.com/dask/dask)

(npartitions, split_every, split_out)


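# Module-level imports used below (assumed to match the top of test_dataframe.py):
import numpy as np
import pandas as pd
import pytest

import dask.dataframe as dd


# `npartitions` must be supplied by another parametrize decorator above line 3999 (not shown in this excerpt).
@pytest.mark.parametrize("split_every", [2, 5])
@pytest.mark.parametrize("split_out", [1, 5, 20])
def test_hash_split_unique(npartitions, split_every, split_out):
    from string import ascii_lowercase

    # 1000 random lowercase letters, so at most 26 distinct values.
    s = pd.Series(np.random.choice(list(ascii_lowercase), 1000, replace=True))
    ds = dd.from_pandas(s, npartitions=npartitions)

    # Tree-reduce up to split_every pieces per step, hash-partitioning the
    # distinct values across split_out output partitions.
    dropped = ds.unique(split_every=split_every, split_out=split_out)

    # Optimize the graph and derive per-key dependencies and dependents
    # (computed here but not asserted on in this excerpt).
    dsk = dropped.__dask_optimize__(dropped.dask, dropped.__dask_keys__())
    from dask.core import get_deps

    dependencies, dependents = get_deps(dsk)

    # One output partition per split_out (None would mean 1), and the same
    # distinct values as pandas.
    assert dropped.npartitions == (split_out or 1)
    assert sorted(dropped.compute(scheduler="sync")) == sorted(s.unique())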
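To make the mechanics the test exercises more concrete, here is a minimal pure-Python sketch of a hash-split unique. Each input partition is deduplicated and its values are routed to one of `split_out` buckets by `hash(value) % split_out`; each output bucket is then combined with a tree reduction that unions at most `split_every` pieces per step. The function names (`partition_unique`, `tree_combine`, `hash_split_unique`) are hypothetical, and the sketch assumes in-memory lists instead of dask partitions. It illustrates the idea only and is not dask's implementation.

```python
import random
from string import ascii_lowercase
from typing import Hashable, Iterable, List, Sequence, Set


def partition_unique(part: Iterable[Hashable], split_out: int) -> List[Set[Hashable]]:
    """Map step (hypothetical): dedupe one input partition and route each
    value to output bucket hash(value) % split_out."""
    buckets: List[Set[Hashable]] = [set() for _ in range(split_out)]
    for value in part:
        buckets[hash(value) % split_out].add(value)
    return buckets


def tree_combine(pieces: Sequence[Set[Hashable]], split_every: int) -> Set[Hashable]:
    """Reduce step (hypothetical): union at most `split_every` pieces per
    step until a single set remains."""
    level = list(pieces)
    while len(level) > 1:
        level = [
            set().union(*level[i:i + split_every])
            for i in range(0, len(level), split_every)
        ]
    return level[0] if level else set()


def hash_split_unique(
    partitions: Sequence[Iterable[Hashable]], split_every: int = 2, split_out: int = 1
) -> List[Set[Hashable]]:
    """Distinct values of all partitions, spread over `split_out` outputs."""
    if split_every < 2:
        raise ValueError("split_every must be >= 2 for the tree to shrink")
    if split_out < 1:
        raise ValueError("split_out must be >= 1")
    mapped = [partition_unique(p, split_out) for p in partitions]
    # Output partition j gathers bucket j from every input partition.
    return [tree_combine([m[j] for m in mapped], split_every) for j in range(split_out)]


# Usage mirroring the test: 1000 random letters in 20 input partitions.
random.seed(0)
values = [random.choice(ascii_lowercase) for _ in range(1000)]
parts = [values[i:i + 50] for i in range(0, len(values), 50)]
out = hash_split_unique(parts, split_every=5, split_out=5)
assert len(out) == 5  # one set per output partition
assert sorted(v for p in out for v in p) == sorted(set(values))
```

Because a given value always hashes to the same bucket within one process, no value can land in two output partitions. That is why concatenating the outputs and sorting them can be compared directly against the sorted distinct values, as the test does with `sorted(s.unique())`.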

Callers: none (nothing calls this directly)

Calls (confidence score as reported):
unique (method) · 0.95
get_deps (function) · 0.90
__dask_optimize__ (method) · 0.80
choice (method) · 0.45
__dask_keys__ (method) · 0.45
compute (method) · 0.45
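The test derives `dependencies` and `dependents` with `dask.core.get_deps`, which returns two dicts: for each key, the keys it reads from and the keys that read from it. Below is a simplified, hypothetical re-implementation (`referenced_keys` and `get_deps_sketch` are made-up names). It assumes the classic dict graph format, in which a task is a tuple whose first element is a callable and whose remaining elements may be other keys; it ignores newer task representations and is not dask's code.

```python
from operator import add
from typing import Any, Dict, Hashable, Set, Tuple

Graph = Dict[Hashable, Any]


def referenced_keys(task: Any, graph: Graph) -> Set[Hashable]:
    """Return every graph key that appears anywhere inside `task`."""
    found: Set[Hashable] = set()
    stack = [task]
    while stack:
        item = stack.pop()
        try:
            if item in graph:  # a bare key (or alias) is a dependency
                found.add(item)
                continue
        except TypeError:  # unhashable items such as lists cannot be keys
            pass
        if isinstance(item, tuple) and item and callable(item[0]):
            stack.extend(item[1:])  # task tuple: skip the function, scan its arguments
        elif isinstance(item, (list, tuple)):
            stack.extend(item)  # plain container: scan its elements
    return found


def get_deps_sketch(
    graph: Graph,
) -> Tuple[Dict[Hashable, Set[Hashable]], Dict[Hashable, Set[Hashable]]]:
    """Hypothetical stand-in for dask.core.get_deps: (dependencies, dependents)."""
    dependencies = {key: referenced_keys(task, graph) for key, task in graph.items()}
    dependents: Dict[Hashable, Set[Hashable]] = {key: set() for key in graph}
    for key, deps in dependencies.items():
        for dep in deps:
            dependents[dep].add(key)
    return dependencies, dependents


# A tiny tree reduction over three input "partitions".
graph: Graph = {
    ("x", 0): 1,
    ("x", 1): 2,
    ("x", 2): 3,
    ("sum", 0): (add, ("x", 0), ("x", 1)),
    ("sum", 1): (add, ("sum", 0), ("x", 2)),
}
dependencies, dependents = get_deps_sketch(graph)
roots = [k for k, v in dependencies.items() if not v]  # tasks with no inputs
print(roots)  # [('x', 0), ('x', 1), ('x', 2)]
```

Keys with an empty dependency set are the graph's roots. In a dataframe graph these would typically be the tasks that produce the input partitions, so counting them is one way such a graph could be inspected; the excerpt above computes the dicts but does not assert on them.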
