hub / github.com/MaartenGr/BERTopic / Model2VecBackend

Class Model2VecBackend

bertopic/backend/_model2vec.py:9–129

Model2Vec embedding model. Arguments: `embedding_model`: either a model2vec model or a string pointing to a model2vec model; `distill`: whether to distill a sentence-transformers compatible model. The distillation happens during fitting of the topic model.

Source from the content-addressed store, hash-verified

class Model2VecBackend(BaseEmbedder):
    """Model2Vec embedding model.

    Arguments:
        embedding_model: Either a model2vec model or a
                         string pointing to a model2vec model
        distill: Indicates whether to distill a sentence-transformers compatible model.
                 The distillation will happen during fitting of the topic model.
                 NOTE: Only works if `embedding_model` is a string.
        distill_kwargs: Keyword arguments to pass to the distillation process
                        of `model2vec.distill.distill`
        distill_vectorizer: A CountVectorizer used for creating a custom vocabulary
                            based on the same documents used for topic modeling.
                            NOTE: If "vocabulary" is in `distill_kwargs`, this will be ignored.

    Examples:
    To create a model, you can load in a string pointing to a
    model2vec model:

    ```python
    from bertopic.backend import Model2VecBackend

    sentence_model = Model2VecBackend("minishlab/potion-base-8M")
    ```

    or you can instantiate a model yourself:

    ```python
    from bertopic.backend import Model2VecBackend
    from model2vec import StaticModel

    embedding_model = StaticModel.from_pretrained("minishlab/potion-base-8M")
    sentence_model = Model2VecBackend(embedding_model)
    ```

    If you want to distill a sentence-transformers model with the vocabulary of the documents,
    run the following:

    ```python
    from bertopic.backend import Model2VecBackend

    sentence_model = Model2VecBackend("sentence-transformers/all-MiniLM-L6-v2", distill=True)
    ```
    """

    def __init__(
        self,
        embedding_model: Union[str, StaticModel],
        distill: bool = False,
        distill_kwargs: dict = {},
        distill_vectorizer: str | None = None,
    ):
        super().__init__()

        self.distill = distill
        self.distill_kwargs = distill_kwargs
        self.distill_vectorizer = distill_vectorizer
        self._has_distilled = False
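
The docstring describes two behaviours that the excerpt above only hints at: distillation is deferred until documents are available, and an explicit "vocabulary" in `distill_kwargs` takes precedence over `distill_vectorizer`. The following is a minimal, dependency-free sketch of that pattern, not the BERTopic implementation. `LazyDistillBackend`, `unique_tokens`, `build_vocabulary`, `vocabulary_used`, and the placeholder embedding are hypothetical names introduced for illustration only; no model2vec call is made.

```python
from typing import Callable, List, Optional, Sequence


def unique_tokens(documents: Sequence[str]) -> List[str]:
    """Hypothetical vocabulary builder: sorted unique lowercase tokens."""
    return sorted({token for doc in documents for token in doc.lower().split()})


class LazyDistillBackend:
    """Hypothetical stand-in mirroring the flags set in __init__ above."""

    def __init__(
        self,
        embedding_model: str,
        distill: bool = False,
        distill_kwargs: Optional[dict] = None,
        build_vocabulary: Optional[Callable[[Sequence[str]], List[str]]] = None,
    ):
        self.embedding_model = embedding_model
        self.distill = distill
        # Copy the dict so instances never share or mutate the caller's object.
        self.distill_kwargs = dict(distill_kwargs or {})
        # Assumption: plays the role of `distill_vectorizer` in the docstring.
        self.build_vocabulary = build_vocabulary
        self._has_distilled = False
        self.vocabulary_used: Optional[List[str]] = None

    def _resolve_vocabulary(self, documents: Sequence[str]) -> Optional[List[str]]:
        # An explicit "vocabulary" wins; the vocabulary builder is then ignored.
        if "vocabulary" in self.distill_kwargs:
            return list(self.distill_kwargs["vocabulary"])
        if self.build_vocabulary is not None:
            return self.build_vocabulary(documents)
        return None

    def embed(self, documents: Sequence[str]) -> List[List[float]]:
        # Deferred step: runs once, on the first call that sees documents.
        if self.distill and not self._has_distilled:
            self.vocabulary_used = self._resolve_vocabulary(documents)
            self._has_distilled = True
        # Placeholder embedding (document length), standing in for a real model.
        return [[float(len(doc))] for doc in documents]


docs = ["topic models group documents", "static embeddings are fast"]
backend = LazyDistillBackend("some/model", distill=True, build_vocabulary=unique_tokens)
backend.embed(docs)  # the first call triggers the one-time deferred step
```

The sketch copies `distill_kwargs` rather than storing the argument directly. With a mutable default such as `{}`, storing it directly would let every instance created without that argument share one dictionary.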

Callers 1

select_backend · Function · 0.85

Calls

no outgoing calls

Tested by

no test coverage detected