hub / github.com/MaartenGr/BERTopic / update_topics

Method update_topics

bertopic/_bertopic.py:1488–1596

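Updates the topic representation by recalculating c-TF-IDF with the new parameters as defined in this function. When you have trained a model and viewed the topics and the words that represent them, you might not be satisfied with the representation. Perhaps you forgot to remove stop_words or you want to try out a different n_gram_range. This function allows you to update the topic representation after they have been formed.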
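To make "recalculating c-TF-IDF" concrete, here is a minimal pure-Python sketch of the class-based TF-IDF weighting, assuming the commonly stated form: L1-normalized term frequency per topic multiplied by log(1 + A / f_t), where A is the average number of words per topic and f_t is the frequency of term t across all topics. The helpers `c_tf_idf` and `top_words` and the demo data are hypothetical and are not BERTopic's implementation, which may differ in details.

```python
import math
from collections import Counter, defaultdict


def c_tf_idf(docs, topics):
    """Toy c-TF-IDF: one bag of words per topic, weighted by log(1 + A / f_t)."""
    # Merge all documents assigned to a topic into one class-level word count.
    per_topic = defaultdict(Counter)
    for doc, topic in zip(docs, topics):
        per_topic[topic].update(doc.lower().split())

    # f_t: frequency of each term across all topics.
    term_freq = Counter()
    for counts in per_topic.values():
        term_freq.update(counts)

    # A: average number of words per topic.
    avg_words = sum(term_freq.values()) / len(per_topic)

    scores = {}
    for topic, counts in per_topic.items():
        total = sum(counts.values())
        scores[topic] = {
            term: (count / total) * math.log(1 + avg_words / term_freq[term])
            for term, count in counts.items()
        }
    return scores


def top_words(scores, top_n_words=10):
    """Keep the highest-scoring terms per topic; ties are broken alphabetically."""
    return {
        topic: [
            term
            for term, _ in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n_words]
        ]
        for topic, weights in scores.items()
    }


# Hypothetical toy corpus with one topic id per document.
demo_docs = ["the cat sat", "the cat purred", "the dog barked", "the dog ran"]
demo_topics = [0, 0, 1, 1]
demo_scores = c_tf_idf(demo_docs, demo_topics)
demo_top = top_words(demo_scores, top_n_words=1)
```

In this toy corpus "the" occurs in both topics, so it is down-weighted relative to "cat" and "dog", which is the effect a changed vectorizer or stop-word list is meant to influence.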

(
        self,
        docs: List[str],
        images: List[str] | None = None,
        topics: List[int] | None = None,
        top_n_words: int = 10,
        n_gram_range: Tuple[int, int] | None = None,
        vectorizer_model: CountVectorizer = None,
        ctfidf_model: ClassTfidfTransformer = None,
        representation_model: BaseRepresentation = None,
    )
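As an illustration of what `n_gram_range` and stop-word removal change, the sketch below is a simplified, pure-Python stand-in for the term extraction step. It is not scikit-learn's `CountVectorizer`; the function `extract_terms`, its tokenization rule, and the one-word stop list are assumptions made for the example.

```python
import re


def extract_terms(text, n_gram_range=(1, 1), stop_words=frozenset()):
    """Return every word n-gram of `text` with n inside n_gram_range (inclusive)."""
    # Lowercase, keep tokens of two or more word characters, then drop stop words.
    tokens = [
        token
        for token in re.findall(r"\b\w\w+\b", text.lower())
        if token not in stop_words
    ]
    min_n, max_n = n_gram_range
    terms = []
    for n in range(min_n, max_n + 1):
        # Slide a window of n tokens over the remaining token sequence.
        for start in range(len(tokens) - n + 1):
            terms.append(" ".join(tokens[start:start + n]))
    return terms


doc = "The topic model found the climate topic"
unigrams = extract_terms(doc)
bigrams_no_stop = extract_terms(doc, n_gram_range=(2, 2), stop_words={"the"})
```

Because stop words are dropped before the windows are formed in this sketch, "found" and "climate" end up adjacent and produce the bigram "found climate".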

Source from the content-addressed store, hash-verified

        return similar_topics, similarity

    def update_topics(
        self,
        docs: List[str],
        images: List[str] | None = None,
        topics: List[int] | None = None,
        top_n_words: int = 10,
        n_gram_range: Tuple[int, int] | None = None,
        vectorizer_model: CountVectorizer = None,
        ctfidf_model: ClassTfidfTransformer = None,
        representation_model: BaseRepresentation = None,
    ):
        """Updates the topic representation by recalculating c-TF-IDF with the new
        parameters as defined in this function.

        When you have trained a model and viewed the topics and the words that represent them,
        you might not be satisfied with the representation. Perhaps you forgot to remove
        stop_words or you want to try out a different n_gram_range. This function allows you
        to update the topic representation after they have been formed.

        Arguments:
            docs: The documents you used when calling either `fit` or `fit_transform`
            images: The images you used when calling either `fit` or `fit_transform`
            topics: A list of topics where each topic is related to a document in `docs`.
                    Use this variable to change or map the topics.
                    NOTE: Using a custom list of topic assignments may lead to errors if
                          topic reduction techniques are used afterwards. Make sure that
                          manually assigning topics is the last step in the pipeline
            top_n_words: The number of words per topic to extract. Setting this
                         too high can negatively impact topic embeddings as topics
                         are typically best represented by at most 10 words.
            n_gram_range: The n-gram range for the CountVectorizer.
            vectorizer_model: Pass in your own CountVectorizer from scikit-learn
            ctfidf_model: Pass in your own c-TF-IDF model to update the representations
            representation_model: Pass in a model that fine-tunes the topic representations
                                  calculated through c-TF-IDF. Models from `bertopic.representation`
                                  are supported.

        Examples:
        In order to update the topic representation, you will need to first fit the topic
        model and extract topics from them. Based on these, you can update the representation:

        ```python
        topic_model.update_topics(docs, n_gram_range=(2, 3))
        ```

        You can also use a custom vectorizer to update the representation:

        ```python
        from sklearn.feature_extraction.text import CountVectorizer
        vectorizer_model = CountVectorizer(ngram_range=(1, 2), stop_words="english")
        topic_model.update_topics(docs, vectorizer_model=vectorizer_model)
        ```

        You can also use this function to change or map the topics to something else.
        You can update them as follows:

        ```python
        topic_model.update_topics(docs, my_updated_topics)
        ```
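The last docstring example passes a custom topic list. Below is a minimal sketch of how such a list might be built, assuming the goal is to fold one topic into another. The assignments and the `mapping` dictionary are hypothetical; in practice the current assignments would come from a fitted model, for example its `topics_` attribute. The final call is left commented out because it needs a fitted `topic_model` and the original `docs`.

```python
# Hypothetical per-document topic assignments (one entry per document in `docs`).
current_topics = [0, 1, 2, -1, 2, 1]

# Hypothetical mapping: fold topic 2 into topic 1; unlisted topics stay unchanged.
mapping = {2: 1}
my_updated_topics = [mapping.get(topic, topic) for topic in current_topics]

# The new list must keep the same length and order as `docs`.
assert len(my_updated_topics) == len(current_topics)

# With a fitted model and the documents used during fitting:
# topic_model.update_topics(docs, topics=my_updated_topics)
```

As the docstring notes, manually assigned topics may lead to errors if topic reduction is applied afterwards, so a remapping like this is best kept as the last step in the pipeline.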

Callers 3

test_full_model · Function · 0.80

test_online_cv · Function · 0.80

test_update_topics · Function · 0.80

Calls 8

_update_topic_size · Method · 0.95

_c_tf_idf · Method · 0.95

_extract_words_per_topic · Method · 0.95

_create_topic_vectors · Method · 0.95

check_documents_type · Function · 0.90

check_is_fitted · Function · 0.90

ClassTfidfTransformer · Class · 0.90

warning · Method · 0.80

Tested by 3

test_full_model · Function · 0.64

test_online_cv · Function · 0.64

test_update_topics · Function · 0.64