

class Model2VecBackend(BaseEmbedder):
    """Model2Vec embedding model.

    Arguments:
        embedding_model: Either a model2vec model or a
                         string pointing to a model2vec model
        distill: Indicates whether to distill a sentence-transformers compatible model.
                 The distillation will happen during fitting of the topic model.
                 NOTE: Only works if `embedding_model` is a string.
        distill_kwargs: Keyword arguments to pass to the distillation process
                        of `model2vec.distill.distill`
        distill_vectorizer: A CountVectorizer used for creating a custom vocabulary
                            based on the same documents used for topic modeling.
                            NOTE: If "vocabulary" is in `distill_kwargs`, this will be ignored.

    Examples:
    To create a model, pass a string pointing to a
    model2vec model:

    ```python
    from bertopic.backend import Model2VecBackend

    sentence_model = Model2VecBackend("minishlab/potion-base-8M")
    ```

    or instantiate a model yourself:

    ```python
    from bertopic.backend import Model2VecBackend
    from model2vec import StaticModel

    embedding_model = StaticModel.from_pretrained("minishlab/potion-base-8M")
    sentence_model = Model2VecBackend(embedding_model)
    ```

    To distill a sentence-transformers model with the vocabulary of the documents,
    run the following:

    ```python
    from bertopic.backend import Model2VecBackend

    sentence_model = Model2VecBackend("sentence-transformers/all-MiniLM-L6-v2", distill=True)
    ```
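
    As a rough sketch, distillation can be tuned further through `distill_kwargs`
    and `distill_vectorizer`. The `pca_dims` value and the `CountVectorizer`
    settings below are illustrative assumptions, not recommended defaults:

    ```python
    from sklearn.feature_extraction.text import CountVectorizer

    from bertopic.backend import Model2VecBackend

    # Illustrative settings only: the vectorizer builds the vocabulary from the
    # documents, and `pca_dims` is forwarded to `model2vec.distill.distill`
    sentence_model = Model2VecBackend(
        "sentence-transformers/all-MiniLM-L6-v2",
        distill=True,
        distill_kwargs={"pca_dims": 256},
        distill_vectorizer=CountVectorizer(ngram_range=(1, 2), stop_words="english"),
    )
    ```

    If `distill_kwargs` contains a "vocabulary" entry, the vectorizer is ignored,
    as noted under Arguments.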
    """

    def __init__(
        self,
        embedding_model: Union[str, StaticModel],
        distill: bool = False,
        distill_kwargs: Union[dict, None] = None,
        distill_vectorizer: Union["CountVectorizer", None] = None,
    ):
        super().__init__()

        self.distill = distill
        # Build a fresh dict per instance so a shared mutable default is never reused
        self.distill_kwargs = distill_kwargs if distill_kwargs is not None else {}
        self.distill_vectorizer = distill_vectorizer
        self._has_distilled = False