MCPcopy
hub / github.com/dask/dask / unique

Function unique

dask/array/routines.py:1769–1880  ·  view source on GitHub ↗
(ar, return_index=False, return_inverse=False, return_counts=False)

Source from the content-addressed store, hash-verified

1767
1768@derived_from(np)
1769def unique(ar, return_index=False, return_inverse=False, return_counts=False):
1770 # Test whether the downstream library supports structured arrays. If the
1771 # `np.empty_like` call raises a `TypeError`, the downstream library (e.g.,
1772 # CuPy) doesn't support it. In that case we return the
1773 # `unique_no_structured_arr` implementation, otherwise (e.g., NumPy) just
1774 # continue as normal.
1775 try:
1776 meta = meta_from_array(ar)
1777 np.empty_like(meta, dtype=[("a", int), ("b", float)])
1778 except TypeError:
1779 return unique_no_structured_arr(
1780 ar,
1781 return_index=return_index,
1782 return_inverse=return_inverse,
1783 return_counts=return_counts,
1784 )
1785
1786 orig_shape = ar.shape
1787 ar = ar.ravel()
1788
1789 # Run unique on each chunk and collect results in a Dask Array of
1790 # unknown size.
1791
1792 args = [ar, "i"]
1793 out_dtype = [("values", ar.dtype)]
1794 if return_index:
1795 args.extend([arange(ar.shape[0], dtype=np.intp, chunks=ar.chunks[0]), "i"])
1796 out_dtype.append(("indices", np.intp))
1797 else:
1798 args.extend([None, None])
1799 if return_counts:
1800 args.extend([ones((ar.shape[0],), dtype=np.intp, chunks=ar.chunks[0]), "i"])
1801 out_dtype.append(("counts", np.intp))
1802 else:
1803 args.extend([None, None])
1804
1805 out = blockwise(_unique_internal, "i", *args, dtype=out_dtype, return_inverse=False)
1806 out._chunks = tuple((np.nan,) * len(c) for c in out.chunks)
1807
1808 # Take the results from the unique chunks and do the following.
1809 #
1810 # 1. Collect all results as arguments.
1811 # 2. Concatenate each result into one big array.
1812 # 3. Pass all results as arguments to the internal unique again.
1813 #
1814 # TODO: This should be replaced with a tree reduction using this strategy.
1815 # xref: https://github.com/dask/dask/issues/2851
1816
1817 out_parts = [out["values"]]
1818 if return_index:
1819 out_parts.append(out["indices"])
1820 else:
1821 out_parts.append(None)
1822 if return_counts:
1823 out_parts.append(out["counts"])
1824 else:
1825 out_parts.append(None)
1826

Callers 5

apply_gufuncFunction · 0.70
union1dFunction · 0.70
chunk_distinctFunction · 0.50
test_distinct_with_keyFunction · 0.50
apply_gufuncFunction · 0.50

Calls 11

meta_from_arrayFunction · 0.90
arangeFunction · 0.90
ArrayClass · 0.90
unique_no_structured_arrFunction · 0.85
from_collectionsMethod · 0.80
reshapeMethod · 0.80
blockwiseFunction · 0.70
ravelMethod · 0.45
__dask_keys__Method · 0.45
astypeMethod · 0.45
sumMethod · 0.45

Tested by 1

test_distinct_with_keyFunction · 0.40

Used in the wild real call sites across dependent graphs

searching dependent graphs…