hub / github.com/dask/dask / _unique_internal

Function _unique_internal

dask/array/routines.py:1649–1707

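Helper/wrapper function for :func:`numpy.unique`. Uses :func:`numpy.unique` to find the unique values for the array chunk. Because this chunk may not represent the whole array, the function also takes the ``indices`` and ``counts`` that are in 1-to-1 correspondence to ``ar`` and reduces them in the same fashion as ``ar`` is reduced: counts that correspond to the same value are summed, and the smallest index that corresponds to the same value is kept.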

(ar, indices, counts, return_inverse=False)
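The reduction rule described above can be illustrated on a single chunk. This is a minimal sketch, not the Dask implementation: `reduce_chunk` is a hypothetical helper, and it uses `np.minimum.at` and `np.add.at` as a vectorized stand-in for the per-value loop in the source.

```python
import numpy as np


def reduce_chunk(ar, indices, counts):
    """Hypothetical helper: collapse duplicate values in one chunk.

    Keeps the smallest index and the summed count for each unique value.
    """
    values, inverse = np.unique(ar, return_inverse=True)
    inverse = inverse.ravel()

    # Start from the largest possible index so any real index replaces it.
    min_index = np.full(values.shape, np.iinfo(np.intp).max, dtype=np.intp)
    np.minimum.at(min_index, inverse, indices)

    # Accumulate the counts of all entries that share a value.
    total = np.zeros(values.shape, dtype=np.intp)
    np.add.at(total, inverse, counts)
    return values, min_index, total


ar = np.array([3, 1, 3, 1, 2])
indices = np.array([10, 11, 12, 13, 14], dtype=np.intp)  # assumed global positions
counts = np.ones(5, dtype=np.intp)                       # each element seen once

values, min_index, total = reduce_chunk(ar, indices, counts)
print(values, min_index, total)
```

Here the value 1 appears at positions 11 and 13, so it keeps index 11 and a count of 2.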

Source from the content-addressed store, hash-verified

import numpy as np


def _unique_internal(ar, indices, counts, return_inverse=False):
    """
    Helper/wrapper function for :func:`numpy.unique`.

    Uses :func:`numpy.unique` to find the unique values for the array chunk.
    Given this chunk may not represent the whole array, also take the
    ``indices`` and ``counts`` that are in 1-to-1 correspondence to ``ar``
    and reduce them in the same fashion as ``ar`` is reduced. Namely sum
    any counts that correspond to the same value and take the smallest
    index that corresponds to the same value.

    To handle the inverse mapping from the unique values to the original
    array, simply return a NumPy array created with ``arange`` with enough
    values to correspond 1-to-1 to the unique values. While there is more
    work needed to be done to create the full inverse mapping for the
    original array, this provides enough information to generate the
    inverse mapping in Dask.

    Given Dask likes to have one array returned from functions like
    ``blockwise``, some formatting is done to stuff all of the resulting arrays
    into one big NumPy structured array. Dask is then able to handle this
    object and can split it apart into the separate results on the Dask side,
    which then can be passed back to this function in concatenated chunks for
    further reduction or can be returned to the user to perform other forms of
    analysis.

    By handling the problem in this way, it does not matter where a chunk
    is in a larger array or how big it is. The chunk can still be computed
    in the same way. Also it does not matter if the chunk is the result of
    other chunks being run through this function multiple times. The end
    result will still be just as accurate using this strategy.
    """

    return_index = indices is not None
    return_counts = counts is not None

    u = np.unique(ar)

    # Build the structured dtype with only the requested fields.
    dt = [("values", u.dtype)]
    if return_index:
        dt.append(("indices", np.intp))
    if return_inverse:
        dt.append(("inverse", np.intp))
    if return_counts:
        dt.append(("counts", np.intp))

    r = np.empty(u.shape, dtype=dt)
    r["values"] = u
    if return_inverse:
        r["inverse"] = np.arange(len(r), dtype=np.intp)
    if return_index or return_counts:
        for i, v in enumerate(r["values"]):
            m = ar == v
            if return_index:
                # Smallest index among all entries equal to this value.
                indices[m].min(keepdims=True, out=r["indices"][i : i + 1])
            if return_counts:
                # Total count across all entries equal to this value.
                counts[m].sum(keepdims=True, out=r["counts"][i : i + 1])

    return r
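The docstring's claim that chunks can be reduced repeatedly without losing accuracy can be checked with a small two-stage run. The sketch below is an illustration under stated assumptions: `unique_chunk` is a hypothetical stand-in that mirrors the listing above for the index and counts fields only, and the chunk boundaries are chosen arbitrarily.

```python
import numpy as np


def unique_chunk(ar, indices, counts):
    """Hypothetical stand-in for the listing above (indices and counts only)."""
    u = np.unique(ar)
    dt = [("values", u.dtype), ("indices", np.intp), ("counts", np.intp)]
    r = np.empty(u.shape, dtype=dt)
    r["values"] = u
    for i, v in enumerate(u):
        m = ar == v
        r["indices"][i] = indices[m].min()  # smallest index for this value
        r["counts"][i] = counts[m].sum()    # summed count for this value
    return r


x = np.array([4, 2, 4, 7, 2, 2, 7, 9])
pos = np.arange(x.size, dtype=np.intp)   # global position of each element
ones = np.ones(x.size, dtype=np.intp)    # every element counts once

# Stage 1: reduce two chunks independently.
chunks = (slice(0, 3), slice(3, 8))
parts = [unique_chunk(x[s], pos[s], ones[s]) for s in chunks]

# Stage 2: concatenate the structured results and reduce them again.
merged = np.concatenate(parts)
final = unique_chunk(merged["values"], merged["indices"], merged["counts"])

# Reference: NumPy applied to the whole array at once.
ref_values, ref_index, ref_counts = np.unique(
    x, return_index=True, return_counts=True
)
print(final)
```

Because each stage only sums counts and takes minimum indices, the second pass over the concatenated partial results agrees with `np.unique` on the full array.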

Callers

nothing calls this directly

Calls 5

unique · Method · 0.45
empty · Method · 0.45
arange · Method · 0.45
min · Method · 0.45
sum · Method · 0.45

Tested by

no test coverage detected
