hub / github.com/MaartenGr/BERTopic / merge_models

Method merge_models

bertopic/_bertopic.py:3590–3749

Merge multiple pre-trained BERTopic models into a single model. The models are merged as if they were all saved using pytorch or safetensors, so a minimal version without c-TF-IDF. To do this, we choose the first model in the list of models as a baseline. Then, we c…

(cls, models, min_similarity: float = 0.7, embedding_model=None)
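
The matching rule in the description above can be sketched in plain Python. This is an illustration only, not BERTopic's implementation: `cosine_similarity` and `merge_topic_embeddings` are hypothetical helpers, topic embeddings are modelled as plain lists of floats, and the use of `>=` for the threshold comparison is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_topic_embeddings(baseline, candidate, min_similarity=0.7):
    """Hypothetical helper: fold one model's topic embeddings into a baseline.

    Returns (merged, mapping), where mapping[candidate_topic] is the topic id
    in the merged list. A candidate topic is mapped onto its most similar
    baseline topic when the similarity reaches min_similarity (assumed >=);
    otherwise it is appended as a new topic.
    """
    merged = [list(vec) for vec in baseline]
    mapping = {}
    for cand_id, cand_vec in enumerate(candidate):
        sims = [cosine_similarity(cand_vec, base_vec) for base_vec in baseline]
        best_id = max(range(len(sims)), key=sims.__getitem__) if sims else None
        if best_id is not None and sims[best_id] >= min_similarity:
            mapping[cand_id] = best_id      # similar enough: reuse the baseline topic
        else:
            mapping[cand_id] = len(merged)  # sufficiently "new": add it as a topic
            merged.append(list(cand_vec))
    return merged, mapping


baseline = [[1.0, 0.0], [0.0, 1.0]]
candidate = [[0.9, 0.1], [-1.0, 0.0]]
merged, mapping = merge_topic_embeddings(baseline, candidate)
print(mapping)      # candidate topic 0 maps onto baseline topic 0; topic 1 becomes new topic 2
print(len(merged))  # two baseline topics plus one new topic
```

Under this sketch, raising `min_similarity` makes it harder for a candidate topic to match the baseline, so more topics are added as new; lowering it folds more topics into existing ones.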

Source from the content-addressed store, hash-verified

@classmethod
def merge_models(cls, models, min_similarity: float = 0.7, embedding_model=None):
    """Merge multiple pre-trained BERTopic models into a single model.

    The models are merged as if they were all saved using pytorch or
    safetensors, so a minimal version without c-TF-IDF.

    To do this, we choose the first model in the list of
    models as a baseline. Then, we check each model whether
    they contain topics that are not in the baseline.
    This check is based on the cosine similarity between
    topics embeddings. If topic embeddings between two models
    are similar, then the topic of the second model is re-assigned
    to the first. If they are dissimilar, the topic of the second
    model is assigned to the first.

    In essence, we simply check whether sufficiently "new"
    topics emerge and add them.

    Arguments:
        models: A list of fitted BERTopic models
        min_similarity: The minimum similarity for when topics are merged.
        embedding_model: Additionally load in an embedding model if necessary.

    Returns:
        A new BERTopic model that was created as if you were
        loading a model from the HuggingFace Hub without c-TF-IDF

    Examples:
    ```python
    from bertopic import BERTopic
    from sklearn.datasets import fetch_20newsgroups

    docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']

    # Create three separate models
    topic_model_1 = BERTopic(min_topic_size=5).fit(docs[:4000])
    topic_model_2 = BERTopic(min_topic_size=5).fit(docs[4000:8000])
    topic_model_3 = BERTopic(min_topic_size=5).fit(docs[8000:])

    # Combine all models into one
    merged_model = BERTopic.merge_models([topic_model_1, topic_model_2, topic_model_3])
    ```
    """

    def choose_backend():
        """Choose the backend to use for saving the model."""
        try:
            import torch  # noqa: F401

            return "pytorch"
        except (ModuleNotFoundError, ImportError):
            try:
                import safetensors  # noqa: F401

                return "safetensors"
            except (ModuleNotFoundError, ImportError):
                raise ImportError(
                    "Neither pytorch nor safetensors is installed. "
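
The listing is cut off at this point. The fallback pattern visible in `choose_backend` (prefer one optional dependency, fall back to another, raise if neither is present) can be sketched generically as follows. `first_available` is a hypothetical helper, not part of BERTopic, and it differs from the source in one respect: it uses `importlib.util.find_spec` to check for a module without actually importing it.

```python
import importlib.util


def first_available(candidates):
    """Hypothetical helper: return the label of the first installed module.

    candidates is an ordered list of (top_level_module_name, label) pairs.
    Raises ImportError when none of the modules can be found.
    """
    for module_name, label in candidates:
        # find_spec returns None for a top-level module that is not installed
        if importlib.util.find_spec(module_name) is not None:
            return label
    names = ", ".join(name for name, _ in candidates)
    raise ImportError(f"None of the candidate modules is installed: {names}")


# Same preference order as the listing: torch first, then safetensors.
try:
    backend = first_available([("torch", "pytorch"), ("safetensors", "safetensors")])
except ImportError:
    backend = None  # neither optional dependency is installed
print(backend)
```

Checking with `find_spec` avoids the cost of importing a large package just to test for its presence, at the price of not catching a package that is installed but fails on import, which the `try`/`import` form in the source does catch.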

Callers 1

test_full_model · Function · 0.80

Calls 4

select_backend · Function · 0.90
TopicMapper · Class · 0.85
_create_model_from_files · Function · 0.85
save · Method · 0.80

Tested by 1

test_full_model · Function · 0.64