hub / github.com/dask/dask / compute_hll_array

Function compute_hll_array

dask/dataframe/hyperloglog.py:28–52
(obj, b)

Source from the content-addressed store, hash-verified

def compute_hll_array(obj, b):
    # b is the number of bits

    if not 8 <= b <= 16:
        raise ValueError("b should be between 8 and 16")
    num_bits_discarded = 32 - b
    m = 1 << b

    # Get an array of the hashes
    hashes = hash_pandas_object(obj, index=False)
    if isinstance(hashes, pd.Series):
        hashes = hashes._values
    hashes = hashes.astype(np.uint32)

    # Of the first b bits, which is the first nonzero?
    j = hashes >> num_bits_discarded
    first_bit = compute_first_bit(hashes)

    # Pandas can do the max aggregation
    df = pd.DataFrame({"j": j, "first_bit": first_bit})
    series = df.groupby("j").max()["first_bit"]

    # Return a dense array so we can concat them and get a result
    # that is easy to deal with
    return series.reindex(np.arange(m), fill_value=0).values.astype(np.uint8)
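
The bucketing and max-aggregation steps above can be restated in plain Python, which may make the role of `j` and `first_bit` easier to see. This is only a sketch under stated assumptions: `first_set_bit` and `hll_registers` are hypothetical names that do not exist in dask, the input is an iterable of integer hashes rather than a pandas object, and `first_set_bit` assumes that `compute_first_bit` yields the 1-based position of the lowest set bit (33 for a zero hash), which the excerpt itself does not confirm.

```python
def first_set_bit(h):
    """Hypothetical stand-in for compute_first_bit (assumed behavior):
    1-based position of the lowest set bit of a 32-bit value, 33 if zero."""
    if h == 0:
        return 33
    # h & -h isolates the lowest set bit; bit_length gives its 1-based position
    return (h & -h).bit_length()


def hll_registers(hashes, b):
    """Build the 2**b registers from an iterable of integer hash values."""
    if not 8 <= b <= 16:
        raise ValueError("b should be between 8 and 16")
    m = 1 << b
    registers = [0] * m              # dense result, unseen registers stay 0
    for h in hashes:
        h &= 0xFFFFFFFF              # keep only the low 32 bits
        j = h >> (32 - b)            # the top b bits select the register
        # same effect as groupby("j").max() on the first_bit column
        registers[j] = max(registers[j], first_set_bit(h))
    return registers


# Two hashes fall into register 0, one into register 255.
regs = hll_registers([0x00000001, 0x00000008, 0xFF000000], b=8)
```

With `b=8` the list has 256 entries. Register 0 keeps the larger of the two values that landed in it, and every register that received no hash stays at 0, mirroring the `reindex(..., fill_value=0)` step.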

Callers

nothing calls this directly

Calls 5

groupby · Method · 0.95
compute_first_bit · Function · 0.85
astype · Method · 0.45
max · Method · 0.45
arange · Method · 0.45

Tested by

no test coverage detected
