Keep track of topic mappings. The number of topics can be reduced by merging them together. This mapping needs to be tracked in BERTopic, as new predictions need to be mapped to the new topics. These mappings are tracked in the `self.mappings_` attribute, where each set of topics is stacked horizontally.
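To make the column layout concrete, here is a minimal sketch in plain Python, assuming a made-up history of two reductions over five original topics (`-1` being BERTopic's outlier label). Each row follows one original topic through every state, so pairing any two columns yields an old-to-new mapping, and reading a row from left to right gives the graph path. The standalone `mappings_` variable and the helpers `columns_to_mapping` and `chain` are hypothetical illustrations, not part of BERTopic.

```python
from typing import Dict, List

# Hypothetical history for five original topics (-1 is the outlier label).
# Column 0 holds the original topics and column 1 starts as an identical copy
# (mirroring TopicMapper.__init__); each later column is the state after one
# made-up topic reduction, so the last column holds the current topics.
mappings_: List[List[int]] = [
    [-1, -1, -1, -1],
    [0, 0, 0, 0],
    [1, 1, 0, 0],  # topic 1 is merged into topic 0 by the first reduction
    [2, 2, 1, 1],
    [3, 3, 2, 1],  # topic 3 is renumbered to 2, then merged into 1
]


def columns_to_mapping(rows: List[List[int]], src: int, dst: int) -> Dict[int, int]:
    """Pair column `src` with column `dst` to obtain an old -> new mapping."""
    return {row[src]: row[dst] for row in rows}


def chain(rows: List[List[int]], topic: int) -> str:
    """Render the row of one original topic as a graph path."""
    for row in rows:
        if row[0] == topic:
            return " --> ".join(f"Topic {t}" for t in row)
    raise KeyError(topic)


print(columns_to_mapping(mappings_, 0, -1))  # {-1: -1, 0: 0, 1: 0, 2: 1, 3: 1}
print(chain(mappings_, 3))  # Topic 3 --> Topic 3 --> Topic 2 --> Topic 1
```

The repeated `Topic 3` at the start of the path comes from the two identical initial columns; every column after that records one reduction.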
````python
from typing import List, Mapping

import numpy as np


class TopicMapper:
    """Keep track of topic mappings.

    The number of topics can be reduced
    by merging them together. This mapping
    needs to be tracked in BERTopic, as new
    predictions need to be mapped to the new
    topics.

    These mappings are tracked in the `self.mappings_`
    attribute, where each set of topics is stacked horizontally.
    For example, the most recent topics can be found in the
    last column. To get a mapping, simply take two columns
    of topics.

    In other words, it is represented as a graph:
    Topic 1 --> Topic 11 --> Topic 4 --> etc.

    Attributes:
        self.mappings_ (List[List[int]]): A matrix, stored as a nested list,
            indicating the mappings from one topic to another. Each column
            represents the collection of topics at one point in time. The
            last column represents the current state of topics and the first
            column represents the initial state of topics.
    """

    def __init__(self, topics: List[int]):
        """Initialization of Topic Mapper.

        Arguments:
            topics: A list of topics per document
        """
        # One row per unique topic, with two identical columns:
        # the initial state and the current state.
        base_topics = np.array(sorted(set(topics)))
        topics = base_topics.copy().reshape(-1, 1)
        self.mappings_ = np.hstack([topics.copy(), topics.copy()]).tolist()

    def get_mappings(self, original_topics: bool = True) -> Mapping[int, int]:
        """Get mappings from either the original topics or
        the second-most recent topics to the current topics.

        Arguments:
            original_topics: Whether we want to map from the
                             original topics to the most recent topics
                             or from the second-most recent topics.

        Returns:
            mappings: The mappings from old topics to new topics

        Examples:
            To get mappings, simply call:
            ```python
            mapper = TopicMapper(topics)
            mappings = mapper.get_mappings(original_topics=False)
            ```
        """
        if original_topics:
            # Pair the first (original) column with the last (current) column.
            mappings = np.array(self.mappings_)[:, [0, -1]]
            mappings = dict(zip(mappings[:, 0], mappings[:, 1]))
````
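The life cycle of such a mapper can be sketched as follows. `SketchTopicMapper` and its `record_reduction` method are hypothetical, written only for illustration: the constructor mirrors `__init__` above, `record_reduction` appends one column per merge step, and `get_mappings` treats column `-2` as the second-most recent state, which is an assumption of this sketch rather than a statement about BERTopic's implementation. The final lines show why the history matters: predictions expressed in original topic IDs are translated to current IDs through the original-to-current mapping.

```python
from typing import Dict, List

import numpy as np


class SketchTopicMapper:
    """Hypothetical stand-in for TopicMapper, for illustration only."""

    def __init__(self, topics: List[int]):
        # Mirrors TopicMapper.__init__: one row per unique topic and two
        # identical columns (initial state, current state).
        base_topics = np.array(sorted(set(topics))).reshape(-1, 1)
        self.mappings_ = np.hstack([base_topics, base_topics]).tolist()

    def record_reduction(self, mapping: Dict[int, int]) -> None:
        """Hypothetical helper: append one column for a single merge step.

        Topics absent from `mapping` keep their current label (an assumption).
        """
        for row in self.mappings_:
            row.append(mapping.get(row[-1], row[-1]))

    def get_mappings(self, original_topics: bool = True) -> Dict[int, int]:
        # Column 0 is the original state; column -2 is taken to be the
        # second-most recent state (an assumption of this sketch).
        src = 0 if original_topics else -2
        return {row[src]: row[-1] for row in self.mappings_}


mapper = SketchTopicMapper([0, 1, 2, 3, 1, -1])
mapper.record_reduction({0: 0, 1: 0, 2: 1, 3: 2})  # merge 1 into 0, renumber
mapper.record_reduction({1: 1, 2: 1})  # merge 2 into 1

to_current = mapper.get_mappings(original_topics=True)
new_predictions = [3, 1, 2]  # made-up predictions in original topic IDs
print([to_current[t] for t in new_predictions])  # [1, 0, 1]
```

Keeping the full history instead of only the latest mapping means any two states can be compared later, at the cost of one extra integer per topic for every reduction.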