hub / github.com/dask/dask / estimate_count

Function estimate_count

dask/dataframe/hyperloglog.py:64–83
(Ms, b)

Source from the content-addressed store, hash-verified

def estimate_count(Ms, b):
    m = 1 << b

    # Combine one last time
    M = reduce_state(Ms, b)

    # Estimate cardinality, no adjustments
    alpha = 0.7213 / (1 + 1.079 / m)
    E = alpha * m / (2.0 ** -(M.astype("f8"))).sum() * m
    #                           ^^^^ starts as unsigned, need a signed type for
    #                                negation operator to do something useful

    # Apply adjustments for small / big cardinalities, if applicable
    if E < 2.5 * m:
        V = (M == 0).sum()
        if V:
            return m * np.log(m / V)
    if E > 2**32 / 30.0:
        return -(2**32) * np.log1p(-E / 2**32)
    return E
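
To make the three branches above easier to follow, here is a minimal standalone sketch of the same arithmetic. It is not the dask implementation: `estimate_from_registers` is a hypothetical name, and it assumes the per-partition states have already been combined into a single register array `M` of length `2**b` (the step that `reduce_state` performs in the source, which is not shown on this page).

```python
import numpy as np


def estimate_from_registers(M, b):
    """Hypothetical helper: HyperLogLog-style estimate from one register array.

    Assumes M is already the combined state (length 2**b); the combining
    step done by reduce_state in the source is deliberately left out.
    """
    m = 1 << b  # number of registers
    M = np.asarray(M)
    if M.shape != (m,):
        raise ValueError(f"expected {m} registers, got shape {M.shape}")

    # Raw estimate: alpha * m**2 / sum(2**-M[j]).
    # Cast to float64 first: registers are typically unsigned, and negating
    # an unsigned array wraps around instead of producing negative values.
    alpha = 0.7213 / (1 + 1.079 / m)
    E = alpha * m * m / (2.0 ** -M.astype("f8")).sum()

    # Small-range correction: if some registers are still empty,
    # fall back to linear counting on the number of empty registers V.
    if E < 2.5 * m:
        V = int((M == 0).sum())
        if V:
            return float(m * np.log(m / V))

    # Large-range correction for estimates approaching the 32-bit hash space.
    if E > 2**32 / 30.0:
        return float(-(2**32) * np.log1p(-E / 2**32))

    return float(E)


# Illustrative inputs with b = 4, i.e. m = 16 registers.
empty = estimate_from_registers(np.zeros(16, dtype="u1"), 4)
half = estimate_from_registers(np.array([0] * 8 + [1] * 8, dtype="u1"), 4)
full = estimate_from_registers(np.full(16, 3, dtype="u1"), 4)
```

With these inputs, `empty` and `half` take the linear-counting branch (all or half of the registers are still zero), while `full` has no empty registers and returns the raw estimate unchanged.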

Callers

nothing calls this directly

Calls 3

reduce_state · Function · 0.85
sum · Method · 0.45
astype · Method · 0.45

Tested by

no test coverage detected
