hub / github.com/MaartenGr/BERTopic / TopicMapper

Class TopicMapper

bertopic/_bertopic.py:4887–5015

Keeps track of topic mappings. The number of topics can be reduced by merging them together. This mapping needs to be tracked in BERTopic because new predictions must be mapped to the new topics. The mappings are tracked in the `self.mappings_` attribute, where each set of topics is stacked horizontally.
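The column-stacking idea can be sketched with plain Python lists. This is a minimal illustration, not the library's code: the `merge` dictionary and the way the new column is appended are assumptions made for the example, and plain lists stand in for the matrix kept in `mappings_`.

```python
# Illustrative only: plain lists stand in for the matrix kept in `mappings_`.
topics = [0, 1, 1, 2, -1, 2]

# Initial state: one row per unique topic, with the topic repeated in two columns.
mappings = [[t, t] for t in sorted(set(topics))]

# Hypothetical merge (not taken from the source): topic 2 is folded into topic 1.
# Appending to every row adds one new column holding the updated state.
merge = {2: 1}
for row in mappings:
    row.append(merge.get(row[-1], row[-1]))

# A mapping is read off by pairing two columns, here the first and the last.
original_to_current = {row[0]: row[-1] for row in mappings}
print(original_to_current)  # {-1: -1, 0: 0, 1: 1, 2: 1}
```

Each row traces one original topic through time, so any two columns can be paired to get a mapping between those two states.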

Source from the content-addressed store, hash-verified

class TopicMapper:
    """Keep track of Topic Mappings.

    The number of topics can be reduced
    by merging them together. This mapping
    needs to be tracked in BERTopic as new
    predictions need to be mapped to the new
    topics.

    These mappings are tracked in the `self.mappings_`
    attribute where each set of topic is stacked horizontally.
    For example, the most recent topics can be found in the
    last column. To get a mapping, simply take two columns
    of topics.

    In other words, it is represented as graph:
    Topic 1 --> Topic 11 --> Topic 4 --> etc.

    Attributes:
        self.mappings_ (np.ndarray) : A matrix indicating the mappings from one topic
            to another. The columns represent a collection of topics
            at any time. The last column represents the current state
            of topics and the first column represents the initial state
            of topics.
    """

    def __init__(self, topics: List[int]):
        """Initialization of Topic Mapper.

        Arguments:
            topics: A list of topics per document
        """
        base_topics = np.array(sorted(set(topics)))
        topics = base_topics.copy().reshape(-1, 1)
        self.mappings_ = np.hstack([topics.copy(), topics.copy()]).tolist()

    def get_mappings(self, original_topics: bool = True) -> Mapping[int, int]:
        """Get mappings from either the original topics or
        the second-most recent topics to the current topics.

        Arguments:
            original_topics: Whether we want to map from the
                original topics to the most recent topics
                or from the second-most recent topics.

        Returns:
            mappings: The mappings from old topics to new topics

        Examples:
        To get mappings, simply call:
        ```python
        mapper = TopicMapper(topics)
        mappings = mapper.get_mappings(original_topics=False)
        ```
        """
        if original_topics:
            mappings = np.array(self.mappings_)[:, [0, -1]]
            mappings = dict(zip(mappings[:, 0], mappings[:, 1]))
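The excerpt stops inside the `original_topics` branch, which pairs the first and last columns of `mappings_`. As a rough sketch of what that pairing yields, the following uses plain lists instead of NumPy. The helper name `first_to_last` and the example rows are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def first_to_last(mappings: List[List[int]]) -> Dict[int, int]:
    """Pair the first column with the last column of a row-per-topic matrix."""
    return {row[0]: row[-1] for row in mappings}


# Hand-built example rows; each row traces one original topic through time.
# The row [1, 11, 4] follows the docstring's "Topic 1 --> Topic 11 --> Topic 4".
example_mappings = [
    [1, 11, 4],
    [2, 2, 4],
    [3, 3, 3],
]

lookup = first_to_last(example_mappings)

# Remap predictions expressed in original topic ids to the current ids.
predictions = [1, 3, 2, 1]
current = [lookup[p] for p in predictions]
print(lookup)   # {1: 4, 2: 4, 3: 3}
print(current)  # [4, 3, 4, 4]
```

Intermediate columns are skipped entirely, so the lookup goes straight from the initial state to the current state regardless of how many merges happened in between.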

Callers 6

partial_fit · Method · 0.85
merge_models · Method · 0.85
_cluster_embeddings · Method · 0.85
_create_model_from_files · Function · 0.85

Calls

no outgoing calls

Tested by

no test coverage detected