Get the inner product(s) between real vectors / corpora X and Y. Return the inner product(s) between real vectors / corpora X and Y expressed in a non-orthogonal normalized basis, where the dot product between the basis vectors is given by the sparse term similarity matrix.
(self, X, Y, normalized=(False, False))
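As a rough illustration of the idea in the summary above, the inner product in a non-orthogonal basis can be written as the sum of `x_i * s_ij * y_j` over all term pairs, where `s_ij` is the similarity between terms `i` and `j`. The following is a minimal pure-Python sketch, not the library's implementation: the helper names `soft_inner_product` and `soft_cosine`, and the dictionary-based similarity lookup, are assumptions made for this example only.

```python
from math import sqrt


def soft_inner_product(x, y, sim):
    """Inner product of two bag-of-words vectors in a non-orthogonal basis.

    x, y : list of (term_id, weight) pairs (sparse bag-of-words format).
    sim  : dict mapping (term_id, term_id) -> similarity (hypothetical layout);
           missing pairs count as 0, and every term has similarity 1 with itself.
    """
    total = 0.0
    for i, x_i in x:
        for j, y_j in y:
            if i == j:
                s_ij = 1.0
            else:
                # Treat the similarity as symmetric: look up both orderings.
                s_ij = sim.get((i, j), sim.get((j, i), 0.0))
            total += x_i * s_ij * y_j
    return total


def soft_cosine(x, y, sim):
    """Inner product after L2-normalizing both vectors in the same basis."""
    norm_x = sqrt(soft_inner_product(x, x, sim))
    norm_y = sqrt(soft_inner_product(y, y, sim))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return soft_inner_product(x, y, sim) / (norm_x * norm_y)


query = [(0, 1.0)]        # contains only term 0
document = [(1, 1.0)]     # contains only term 1
similarities = {(0, 1): 0.5}

print(soft_inner_product(query, document, similarities))  # 0.5
print(soft_inner_product(query, document, {}))            # 0.0 (orthogonal basis)
print(soft_cosine(query, document, similarities))         # 0.5
```

With an empty similarity dictionary the basis is orthogonal and the result reduces to the ordinary dot product, which is why two vectors with no shared terms score 0.0 in that case.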
| 516 | self.matrix = source.tocsc() |
| 517 | |
| 518 | def inner_product(self, X, Y, normalized=(False, False)): |
| 519 | """Get the inner product(s) between real vectors / corpora X and Y. |
| 520 | |
| 521 | Return the inner product(s) between real vectors / corpora X and Y expressed in a |
| 522 | non-orthogonal normalized basis, where the dot product between the basis vectors is given by |
| 523 | the sparse term similarity matrix. |
| 524 | |
| 525 | Parameters |
| 526 | ---------- |
| 527 | X : list of (int, float) or iterable of list of (int, float) |
| 528 | A query vector / corpus in the sparse bag-of-words format. |
| 529 | Y : list of (int, float) or iterable of list of (int, float) |
| 530 | A document vector / corpus in the sparse bag-of-words format. |
| 531 | normalized : tuple of {True, False, 'maintain'}, optional |
| 532 | First/second value specifies whether the query/document vectors in the inner product |
| 533 | will be L2-normalized (True; corresponds to the soft cosine measure), maintain their |
| 534 | L2-norm during change of basis ('maintain'; corresponds to query expansion with partial |
| 535 | membership), or kept as-is (False; corresponds to query expansion; default). |
| 536 | |
| 537 | Returns |
| 538 | ------- |
| 539 | `self.matrix.dtype`, `scipy.sparse.csr_matrix`, or :class:`numpy.matrix` |
| 540 | The inner product(s) between `X` and `Y`. |
| 541 | |
| 542 | References |
| 543 | ---------- |
| 544 | The soft cosine measure was perhaps first described by [sidorovetal14]_. |
| 545 | Further notes on the efficient implementation of the soft cosine measure are described by |
| 546 | [novotny18]_. |
| 547 | |
| 548 | .. [sidorovetal14] Grigori Sidorov et al., "Soft Similarity and Soft Cosine Measure: Similarity |
| 549 | of Features in Vector Space Model", 2014, http://www.cys.cic.ipn.mx/ojs/index.php/CyS/article/view/2043/1921. |
| 550 | |
| 551 | .. [novotny18] Vít Novotný, "Implementation Notes for the Soft Cosine Measure", 2018, |
| 552 | http://dx.doi.org/10.1145/3269206.3269317. |
| 553 | |
| 554 | """ |
| 555 | if not X or not Y: |
| 556 | return self.matrix.dtype.type(0.0) |
| 557 | |
| 558 | normalized_X, normalized_Y = normalized |
| 559 | valid_normalized_values = (True, False, 'maintain') |
| 560 | |
| 561 | if normalized_X not in valid_normalized_values: |
| 562 | raise ValueError('{} is not a valid value of normalized'.format(normalized_X)) |
| 563 | if normalized_Y not in valid_normalized_values: |
| 564 | raise ValueError('{} is not a valid value of normalized'.format(normalized_Y)) |
| 565 | |
| 566 | is_corpus_X, X = is_corpus(X) |
| 567 | is_corpus_Y, Y = is_corpus(Y) |
| 568 | |
| 569 | if not is_corpus_X and not is_corpus_Y: |
| 570 | X = dict(X) |
| 571 | Y = dict(Y) |
| 572 | word_indices = np.array(sorted(set(chain(X, Y)))) |
| 573 | dtype = self.matrix.dtype |
| 574 | X = np.array([X[i] if i in X else 0 for i in word_indices], dtype=dtype) |
| 575 | Y = np.array([Y[i] if i in Y else 0 for i in word_indices], dtype=dtype) |