hub / github.com/pathwaycom/pathway / test_lsh

Function test_lsh

python/pathway/stdlib/ml/classifiers/test_lsh.py:40–56

Verifies that nearby points are mapped to the same bucket and distant points to different buckets.

test_lsh()
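The property under test can be illustrated with a small standalone sketch of Euclidean locality-sensitive hashing. This is not Pathway's implementation: `make_euclidean_bucketer` and `share_a_bucket` are hypothetical helpers, and the reading of the parameters (`M` hashes ANDed inside a band, `L` bands ORed together, `A` as the bucket width) is an assumption; the source only states that `L` is the number of ORs.

```python
import math
import random


def make_euclidean_bucketer(d, M, L, A, seed=0):
    """Return a function mapping a d-dimensional point to L band keys.

    Hypothetical helper, not Pathway's generate_euclidean_lsh_bucketer.
    Assumption: M hashes are ANDed inside a band, L bands are ORed,
    and A is the width of one bucket along each random direction.
    """
    rng = random.Random(seed)
    # One random Gaussian direction and one random grid offset per hash.
    directions = [
        [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(M)]
        for _ in range(L)
    ]
    offsets = [[rng.uniform(0.0, A) for _ in range(M)] for _ in range(L)]

    def bucketer(point):
        keys = []
        for band_dirs, band_offs in zip(directions, offsets):
            # Project the point, shift it, and cut the line into cells of width A.
            key = tuple(
                math.floor((sum(a * x for a, x in zip(direction, point)) + b) / A)
                for direction, b in zip(band_dirs, band_offs)
            )
            keys.append(key)  # the whole tuple must match: the AND part
        return keys

    return bucketer


def share_a_bucket(bucketer, p, q):
    """True if p and q have an identical key in at least one band (the OR part)."""
    return any(kp == kq for kp, kq in zip(bucketer(p), bucketer(q)))


bucketer = make_euclidean_bucketer(d=3, M=5, L=3, A=3)
print(share_a_bucket(bucketer, [1, 2, 3], [1.02, 2.01, 3.03]))
print(share_a_bucket(bucketer, [1, 2, 3], [4, 5, 6]))
```

Whether two given points collide depends on the random directions, so the sketch prints the result rather than asserting it. Nearby points are only more likely to share a bucket, and raising `L` increases that chance while raising `M` lowers it.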

Source from the content-addressed store, hash-verified

def test_lsh():
    """Verifies that close points are mapped together and distant ones - apart."""
    L = 3  # number of ORs
    data_df = pd.DataFrame({"data": [[1, 2, 3], [1.02, 2.01, 3.03], [4, 5, 6]]})
    data = T(data_df, format="pandas", unsafe_trusted_ids=True)

    bucketer = generate_euclidean_lsh_bucketer(d=3, M=5, L=L, A=3)
    flat_data = lsh(data, bucketer, origin_id="data_id")
    result = flat_data.groupby(flat_data.bucketing, flat_data.band).reduce(
        data_ids=reducers.sorted_tuple(apply(int, flat_data.data_id))
    )
    # TODO change app apply_with_type(int, int, ...) to cast(int, ...) once
    # we have cast from Pointer to int
    res_pd = table_to_pandas(result)
    assert np.array_equal(
        np.unique(res_pd["data_ids"]), np.array([(0, 1), (2,)], dtype=object)
    )  # point 0 and 1 are close together, point 2 is further away
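The grouping and the final assertion can be mimicked in plain Python to show what the test expects. This is a minimal sketch under stated assumptions: the `rows` below are invented bucket values rather than output of `lsh`, and `group_ids` is a hypothetical helper standing in for the `groupby(...).reduce(...)` step.

```python
from collections import defaultdict

# Hypothetical flattened rows: (data_id, band, bucket). The bucket numbers
# are made up; only which ids share a (bucket, band) pair matters here.
rows = [
    (0, 0, 17), (1, 0, 17), (2, 0, 42),
    (0, 1, 5), (1, 1, 5), (2, 1, 9),
    (0, 2, 3), (1, 2, 3), (2, 2, 8),
]


def group_ids(rows):
    """Group ids by (bucket, band) and reduce each group to a sorted tuple."""
    groups = defaultdict(list)
    for data_id, band, bucket in rows:
        groups[(bucket, band)].append(data_id)
    return {key: tuple(sorted(ids)) for key, ids in groups.items()}


grouped = group_ids(rows)
# Collapsing to distinct tuples plays the role of np.unique in the test:
# every group must be either points 0 and 1 together, or point 2 alone.
distinct = sorted(set(grouped.values()))
assert distinct == [(0, 1), (2,)]
```

Read this way, the assertion in the test requires points 0 and 1 to fall in the same bucket in every band and point 2 never to join them, because any other grouping would add a tuple such as `(0,)` or `(0, 1, 2)` to the distinct values.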

Callers

nothing calls this directly

Calls 7

T · Function · 0.90
apply · Function · 0.90
table_to_pandas · Function · 0.90
lsh · Function · 0.85
reduce · Method · 0.45
groupby · Method · 0.45

Tested by

no test coverage detected