hub / github.com/MaartenGr/BERTopic / fit

Method fit

bertopic/_bertopic.py:350–393

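Fit the models on a collection of documents and generate topics.

Arguments:
    documents: A list of documents to fit on.
    embeddings: Pre-trained document embeddings. These can be used instead of the sentence-transformer model.
    images: A list of paths to the images to fit on, or the images themselves.
    y: The target class for (semi-)supervised modeling. Use -1 if no class is specified for a specific instance.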

(
        self,
        documents: List[str],
        embeddings: np.ndarray = None,
        images: List[str] | None = None,
        y: Union[List[int], np.ndarray] = None,
    )
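The `y` argument accepts one class per document, with -1 marking documents that have no class. The sketch below shows one way to prepare such a list. `build_partial_labels` and `fit_semi_supervised` are hypothetical helpers introduced only for illustration and are not part of BERTopic; the only library call used is `BERTopic().fit(docs, y=y)`, which follows the signature above.

```python
from typing import List, Optional

UNLABELED = -1  # per the docstring: use -1 when no class is specified


def build_partial_labels(labels: List[Optional[int]]) -> List[int]:
    """Hypothetical helper: map missing labels (None) to -1."""
    return [UNLABELED if label is None else label for label in labels]


def fit_semi_supervised(docs: List[str], labels: List[Optional[int]]):
    """Hypothetical wrapper: fit BERTopic with partially known classes."""
    # Imported lazily so build_partial_labels works without bertopic installed.
    from bertopic import BERTopic

    y = build_partial_labels(labels)
    if len(y) != len(docs):
        raise ValueError("y must contain one entry per document")
    return BERTopic().fit(docs, y=y)


labels = build_partial_labels([0, None, 1, None])
print(labels)  # [0, -1, 1, -1]
```

Calling `fit_semi_supervised(docs, labels)` requires `bertopic` to be installed and a corpus large enough for the default models to cluster.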

Source from the content-addressed store, hash-verified

348	        return topic_labels
349
350	    def fit(
351	        self,
352	        documents: List[str],
353	        embeddings: np.ndarray = None,
354	        images: List[str] | None = None,
355	        y: Union[List[int], np.ndarray] = None,
356	    ):
357	        """Fit the models on a collection of documents and generate topics.
358
359	        Arguments:
360	            documents: A list of documents to fit on
361	            embeddings: Pre-trained document embeddings. These can be used
362	                        instead of the sentence-transformer model
363	            images: A list of paths to the images to fit on or the images themselves
364	            y: The target class for (semi)-supervised modeling. Use -1 if no class for a
365	               specific instance is specified.
366
367	        Examples:
368	        ```python
369	        from bertopic import BERTopic
370	        from sklearn.datasets import fetch_20newsgroups
371
372	        docs = fetch_20newsgroups(subset='all')['data']
373	        topic_model = BERTopic().fit(docs)
374	        ```
375
376	        If you want to use your own embeddings, use it as follows:
377
378	        ```python
379	        from bertopic import BERTopic
380	        from sklearn.datasets import fetch_20newsgroups
381	        from sentence_transformers import SentenceTransformer
382
383	        # Create embeddings
384	        docs = fetch_20newsgroups(subset='all')['data']
385	        sentence_model = SentenceTransformer("all-MiniLM-L6-v2")
386	        embeddings = sentence_model.encode(docs, show_progress_bar=True)
387
388	        # Create topic model
389	        topic_model = BERTopic().fit(docs, embeddings)
390	        ```
391	        """
392	        self.fit_transform(documents=documents, embeddings=embeddings, y=y, images=images)
393	        return self
394
395	    def fit_transform(
396	        self,
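The body of `fit` is a thin wrapper: it forwards every argument to `fit_transform`, discards the result, and returns `self`, which is what allows `BERTopic().fit(docs)` to be assigned directly in the docstring examples. The sketch below reproduces only that delegation pattern. `TinyTopicModel` is a hypothetical stand-in, and its parity-based "clustering" is a placeholder that has nothing to do with BERTopic's actual pipeline.

```python
from typing import List, Optional


class TinyTopicModel:
    """Stand-in class (not BERTopic) mirroring the fit -> fit_transform delegation."""

    def __init__(self) -> None:
        self.topics_: Optional[List[int]] = None

    def fit_transform(self, documents, embeddings=None, images=None, y=None):
        # Placeholder logic: assign each document a "topic" by length parity.
        self.topics_ = [len(doc) % 2 for doc in documents]
        return self.topics_, None

    def fit(self, documents, embeddings=None, images=None, y=None):
        # Same shape as the listing: delegate, ignore the return value, return self.
        self.fit_transform(documents=documents, embeddings=embeddings, y=y, images=images)
        return self


model = TinyTopicModel().fit(["ab", "abc", "abcd"])
print(model.topics_)  # [0, 1, 0]
```

Returning `self` keeps construction and fitting on one line. Callers that need the per-document assignments immediately can call `fit_transform` directly instead.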

Callers 15

test_no_plotly · Function · 0.95

base_topic_model · Function · 0.95

zeroshot_topic_model · Function · 0.95

cuml_base_topic_model · Function · 0.95

test_extract_incorrect_embeddings · Function · 0.95

custom_topic_model · Function · 0.45

representation_topic_model · Function · 0.45

kmeans_pca_topic_model · Function · 0.45

supervised_topic_model · Function · 0.45

test_ctfidf · Function · 0.45

test_ctfidf_custom_cv · Function · 0.45

_reduce_dimensionality · Method · 0.45

Calls 1

fit_transform · Method · 0.95

Tested by 11

test_no_plotly · Function · 0.76

base_topic_model · Function · 0.76

zeroshot_topic_model · Function · 0.76

cuml_base_topic_model · Function · 0.76

test_extract_incorrect_embeddings · Function · 0.76

custom_topic_model · Function · 0.36

representation_topic_model · Function · 0.36

kmeans_pca_topic_model · Function · 0.36

supervised_topic_model · Function · 0.36

test_ctfidf · Function · 0.36

test_ctfidf_custom_cv · Function · 0.36