hub / github.com/MaartenGr/BERTopic / transform

Method transform

bertopic/_bertopic.py:545–647

After having fit a model, use transform to predict new instances.

Arguments:
documents: A single document or a list of documents to predict on.
embeddings: Pre-trained document embeddings. These can be used instead of the sentence-transformer model.

(
    self,
    documents: Union[str, List[str]],
    embeddings: np.ndarray = None,
    images: List[str] | None = None,
)
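The method returns a pair: a list of integer topic ids and an array of probabilities. As a minimal sketch of how a caller might consume the first value, the snippet below counts documents per topic. The `topics` list is placeholder data rather than real model output, `summarize_topics` is a hypothetical helper that is not part of BERTopic, and the sketch assumes the usual BERTopic convention that topic `-1` collects outlier documents.

```python
from collections import Counter

# Placeholder predictions standing in for the first value returned by
# topic_model.transform(docs); illustrative numbers, not real output.
topics = [0, 2, -1, 0, 1, -1, 0]


def summarize_topics(topics):
    """Count documents per topic and report outliers (topic -1) separately.

    Hypothetical helper for illustration; not a BERTopic function.
    """
    counts = Counter(topics)
    outliers = counts.pop(-1, 0)  # assumed convention: -1 marks outliers
    return dict(counts), outliers


per_topic, n_outliers = summarize_topics(topics)
print(per_topic)
print(n_outliers)
```

Separating the outlier count early keeps later per-topic statistics from being skewed by documents that were not assigned to any topic.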

Source from the content-addressed store, hash-verified

        return predictions, self.probabilities_

    def transform(
        self,
        documents: Union[str, List[str]],
        embeddings: np.ndarray = None,
        images: List[str] | None = None,
    ) -> Tuple[List[int], np.ndarray]:
        """After having fit a model, use transform to predict new instances.

        Arguments:
            documents: A single document or a list of documents to predict on
            embeddings: Pre-trained document embeddings. These can be used
                        instead of the sentence-transformer model.
            images: A list of paths to the images to predict on or the images themselves

        Returns:
            predictions: Topic predictions for each documents
            probabilities: The topic probability distribution which is returned by default.
                           If `calculate_probabilities` in BERTopic is set to False, then the
                           probabilities are not calculated to speed up computation and
                           decrease memory usage.

        Examples:
        ```python
        from bertopic import BERTopic
        from sklearn.datasets import fetch_20newsgroups

        docs = fetch_20newsgroups(subset='all')['data']
        topic_model = BERTopic().fit(docs)
        topics, probs = topic_model.transform(docs)
        ```

        If you want to use your own embeddings:

        ```python
        from bertopic import BERTopic
        from sklearn.datasets import fetch_20newsgroups
        from sentence_transformers import SentenceTransformer

        # Create embeddings
        docs = fetch_20newsgroups(subset='all')['data']
        sentence_model = SentenceTransformer("all-MiniLM-L6-v2")
        embeddings = sentence_model.encode(docs, show_progress_bar=True)

        # Create topic model
        topic_model = BERTopic().fit(docs, embeddings)
        topics, probs = topic_model.transform(docs, embeddings)
        ```
        """
        check_is_fitted(self)
        check_embeddings_shape(embeddings, documents)

        if isinstance(documents, str) or documents is None:
            documents = [documents]

        if embeddings is None:
            embeddings = self._extract_embeddings(documents, images=images, method="document", verbose=self.verbose)

        # Check if an embedding model was found
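The input handling at the top of the method body can be sketched in isolation. This is a simplified illustration, not the library's code: `normalize_documents` and `check_row_count` are hypothetical names, and because the body of `check_embeddings_shape` is not shown in the excerpt, the row-count rule below is an assumption about what such a check would enforce. The sketch also normalizes before checking, whereas the excerpt calls the shape check first.

```python
def normalize_documents(documents):
    """Wrap a single string (or None) in a list, mirroring the isinstance branch.

    Hypothetical helper for illustration; not a BERTopic function.
    """
    if isinstance(documents, str) or documents is None:
        return [documents]
    return documents


def check_row_count(embeddings, documents):
    """Assumed rule: when embeddings are given, expect one row per document."""
    if embeddings is not None and len(embeddings) != len(documents):
        raise ValueError(
            f"Got {len(embeddings)} embedding rows for {len(documents)} documents"
        )


docs = normalize_documents("a single document")
print(docs)

# Two placeholder embedding rows for two documents pass the check.
check_row_count([[0.1, 0.2], [0.3, 0.4]], ["first", "second"])
```

Wrapping a lone string in a list lets the rest of the pipeline assume a list of documents regardless of how the caller passed the input.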

Callers 10

test_full_model · Function · 0.45

test_ctfidf · Function · 0.45

test_ctfidf_custom_cv · Function · 0.45

fit_transform · Method · 0.45

hierarchical_topics · Method · 0.45

approximate_distribution · Method · 0.45

reduce_outliers · Method · 0.45

_reduce_dimensionality · Method · 0.45

_extract_representative_docs · Method · 0.45

_c_tf_idf · Method · 0.45

Calls 8

_extract_embeddings · Method · 0.95

_map_probabilities · Method · 0.95

_map_predictions · Method · 0.95

check_is_fitted · Function · 0.90

check_embeddings_shape · Function · 0.90

is_supported_hdbscan · Function · 0.90

hdbscan_delegator · Function · 0.90

info · Method · 0.80

Tested by 3

test_full_model · Function · 0.36

test_ctfidf · Function · 0.36

test_ctfidf_custom_cv · Function · 0.36