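The docstring below describes a nearest-vector lookup with three input checks. As a rough illustration of that behaviour, here is a minimal sketch. The name `nearest_vectors` is hypothetical, the Euclidean metric is an assumption (the docstring does not name the metric), and the extra 2D guard plus the dtype error's type and message are illustrative rather than taken from the source.

```python
import numpy as np


def nearest_vectors(dataset: np.ndarray, value_array: np.ndarray) -> list:
    """Hypothetical sketch of a brute-force nearest-vector search."""
    # Check 1: both inputs must have the same number of dimensions.
    if dataset.ndim != value_array.ndim:
        raise ValueError(
            "Wrong input data's dimensions... "
            f"dataset : {dataset.ndim}, value_array : {value_array.ndim}"
        )
    # Extra guard (assumption): this sketch only handles 2D inputs.
    if dataset.ndim != 2:
        raise ValueError("Expected 2D arrays")
    # Check 2: the vector length (second axis) must match.
    if dataset.shape[1] != value_array.shape[1]:
        raise ValueError(
            "Wrong input data's shape... "
            f"dataset : {dataset.shape[1]}, value_array : {value_array.shape[1]}"
        )
    # Check 3: dtypes must match. Exception type and message are assumptions.
    if dataset.dtype != value_array.dtype:
        raise TypeError(f"dtype mismatch: {dataset.dtype} vs {value_array.dtype}")

    results = []
    for query in value_array:
        # Assumed metric: Euclidean distance from the query to every row.
        distances = np.sqrt(((dataset - query) ** 2).sum(axis=1))
        best = int(np.argmin(distances))  # first index wins on ties
        results.append([dataset[best].tolist(), float(distances[best])])
    return results
```

Under these assumptions, `nearest_vectors(np.array([[0, 0], [1, 1], [2, 2]]), np.array([[0, 1]]))` returns `[[[0, 0], 1.0]]`, which matches the second doctest. A linear scan like this costs one pass over the dataset per query vector.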
| 35 | |
| 36 | |
| 37 | def similarity_search( |
| 38 | dataset: np.ndarray, value_array: np.ndarray |
| 39 | ) -> list[list[list[float] | float]]: |
| 40 | """ |
| 41 | :param dataset: 2D ndarray holding the vectors to search. |
| 42 | :param value_array: Query vector(s) to match against dataset. Should be ndarray. |
| 43 | :return: A list with one entry per query vector, each containing |
| 44 | 1. the nearest vector from dataset |
| 45 | 2. the distance between that vector and the query vector |
| 46 | |
| 47 | >>> dataset = np.array([[0], [1], [2]]) |
| 48 | >>> value_array = np.array([[0]]) |
| 49 | >>> similarity_search(dataset, value_array) |
| 50 | [[[0], 0.0]] |
| 51 | |
| 52 | >>> dataset = np.array([[0, 0], [1, 1], [2, 2]]) |
| 53 | >>> value_array = np.array([[0, 1]]) |
| 54 | >>> similarity_search(dataset, value_array) |
| 55 | [[[0, 0], 1.0]] |
| 56 | |
| 57 | >>> dataset = np.array([[0, 0, 0], [1, 1, 1], [2, 2, 2]]) |
| 58 | >>> value_array = np.array([[0, 0, 1]]) |
| 59 | >>> similarity_search(dataset, value_array) |
| 60 | [[[0, 0, 0], 1.0]] |
| 61 | |
| 62 | >>> dataset = np.array([[0, 0, 0], [1, 1, 1], [2, 2, 2]]) |
| 63 | >>> value_array = np.array([[0, 0, 0], [0, 0, 1]]) |
| 64 | >>> similarity_search(dataset, value_array) |
| 65 | [[[0, 0, 0], 0.0], [[0, 0, 0], 1.0]] |
| 66 | |
| 67 | These are the errors that might occur: |
| 68 | |
| 69 | 1. If the numbers of dimensions differ. |
| 70 | For example, dataset is a 2D array and value_array is a 1D array: |
| 71 | >>> dataset = np.array([[1]]) |
| 72 | >>> value_array = np.array([1]) |
| 73 | >>> similarity_search(dataset, value_array) |
| 74 | Traceback (most recent call last): |
| 75 | ... |
| 76 | ValueError: Wrong input data's dimensions... dataset : 2, value_array : 1 |
| 77 | |
| 78 | 2. If the vector lengths differ. |
| 79 | For example, dataset has shape (3, 2) and value_array has shape (2, 3). |
| 80 | Both arrays must have the same number of columns, so this raises an error. |
| 81 | >>> dataset = np.array([[0, 0], [1, 1], [2, 2]]) |
| 82 | >>> value_array = np.array([[0, 0, 0], [0, 0, 1]]) |
| 83 | >>> similarity_search(dataset, value_array) |
| 84 | Traceback (most recent call last): |
| 85 | ... |
| 86 | ValueError: Wrong input data's shape... dataset : 2, value_array : 3 |
| 87 | |
| 88 | 3. If the data types differ. |
| 89 | Both arrays must have the same dtype for the comparison. |
| 90 | Otherwise an error is raised. |
| 91 | >>> dataset = np.array([[0, 0], [1, 1], [2, 2]], dtype=np.float32) |
| 92 | >>> value_array = np.array([[0, 0], [0, 1]], dtype=np.int32) |
| 93 | >>> similarity_search(dataset, value_array) # doctest: +NORMALIZE_WHITESPACE |
| 94 | Traceback (most recent call last): |