hub / github.com/MaartenGr/BERTopic / visualize_documents

Function visualize_documents

bertopic/plotting/_documents.py:8–263

Visualize documents and their topics in 2D.
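Where the 2D coordinates come from depends on which optional inputs are supplied. Going by the docstring, calling the function with only `docs` re-calculates the embeddings and reduces them to 2D, while `embeddings` or `reduced_embeddings` let a caller hand in precomputed arrays. The sketch below illustrates that fallback order in plain Python. It is not BERTopic's implementation: `resolve_2d_coordinates`, `toy_embed`, and `toy_reduce` are hypothetical names invented for this example, and a real pipeline would use an embedding model and a reducer such as UMAP in their place.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Point = Tuple[float, float]


def resolve_2d_coordinates(
    docs: List[str],
    embed: Callable[[List[str]], List[Vector]],
    reduce_to_2d: Callable[[List[Vector]], List[Point]],
    embeddings: Optional[List[Vector]] = None,
    reduced_embeddings: Optional[List[Point]] = None,
) -> List[Point]:
    """Return one (x, y) point per document, doing as little work as possible."""
    if reduced_embeddings is not None:
        # Cheapest path: the caller already has 2D points, nothing is recomputed.
        return list(reduced_embeddings)
    if embeddings is None:
        # Most expensive path: embed every document from scratch.
        embeddings = embed(docs)
    # High-dimensional vectors still have to be projected down to 2D.
    return reduce_to_2d(list(embeddings))


# Toy stand-ins (assumptions for illustration only) that record how often they run.
calls = {"embed": 0, "reduce": 0}


def toy_embed(texts: List[str]) -> List[Vector]:
    calls["embed"] += 1
    return [(float(len(t)), float(t.count(" ")), 1.0) for t in texts]


def toy_reduce(vectors: List[Vector]) -> List[Point]:
    calls["reduce"] += 1
    return [(v[0], v[1]) for v in vectors]  # keep only the first two dimensions


docs = ["topic models group text", "plots show clusters", "hello"]

# First call: nothing is precomputed, so both steps run once.
points = resolve_2d_coordinates(docs, toy_embed, toy_reduce)

# Second call: the 2D points are reused, so neither step runs again.
cached = resolve_2d_coordinates(docs, toy_embed, toy_reduce, reduced_embeddings=points)
```

Computing the reduced coordinates once and passing them back in avoids repeating the two slowest steps each time the plot is regenerated with different display options.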

(
    topic_model,
    docs: List[str],
    topics: List[int] | None = None,
    embeddings: np.ndarray = None,
    reduced_embeddings: np.ndarray = None,
    sample: float | None = None,
    hide_annotations: bool = False,
    hide_document_hover: bool = False,
    custom_labels: Union[bool, str] = False,
    title: str = "<b>Documents and Topics</b>",
    width: int = 1200,
    height: int = 750,
)
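The `sample` parameter is described in the docstring as the share of documents to keep in each topic, given as a value between 0 and 1 (so 0.1 means 10%). A minimal sketch of that kind of per-topic subsampling follows. It is an illustration under assumptions, not the library's code: `sample_indices_per_topic`, `topic_per_doc`, the `seed` argument, and the keep-at-least-one rule are all hypothetical choices made for this example.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional


def sample_indices_per_topic(
    topic_per_doc: List[int],
    sample: Optional[float] = None,
    seed: int = 0,
) -> List[int]:
    """Keep roughly a `sample` fraction of the document indices within each topic."""
    if sample is None:
        # No subsampling requested: keep every document.
        return list(range(len(topic_per_doc)))
    if not 0 < sample <= 1:
        raise ValueError("sample must be in the interval (0, 1]")

    # Group document indices by their assigned topic.
    by_topic: Dict[int, List[int]] = defaultdict(list)
    for index, topic in enumerate(topic_per_doc):
        by_topic[topic].append(index)

    rng = random.Random(seed)
    kept: List[int] = []
    for indices in by_topic.values():
        # Assumption: keep at least one document so small topics do not vanish.
        size = max(1, round(len(indices) * sample))
        kept.extend(rng.sample(indices, size))
    return sorted(kept)


# Three topics with 50, 20, and 3 documents respectively.
topic_per_doc = [0] * 50 + [1] * 20 + [2] * 3
kept = sample_indices_per_topic(topic_per_doc, sample=0.1)
```

Sampling within each topic, rather than across the whole corpus, keeps small topics represented in the plot while still cutting the number of points drawn.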

Source from the content-addressed store, hash-verified

def visualize_documents(
    topic_model,
    docs: List[str],
    topics: List[int] | None = None,
    embeddings: np.ndarray = None,
    reduced_embeddings: np.ndarray = None,
    sample: float | None = None,
    hide_annotations: bool = False,
    hide_document_hover: bool = False,
    custom_labels: Union[bool, str] = False,
    title: str = "<b>Documents and Topics</b>",
    width: int = 1200,
    height: int = 750,
):
    """Visualize documents and their topics in 2D.

    Arguments:
        topic_model: A fitted BERTopic instance.
        docs: The documents you used when calling either `fit` or `fit_transform`
        topics: A selection of topics to visualize.
                Not to be confused with the topics that you get from `.fit_transform`.
                For example, if you want to visualize only topics 1 through 5:
                `topics = [1, 2, 3, 4, 5]`.
        embeddings: The embeddings of all documents in `docs`.
        reduced_embeddings: The 2D reduced embeddings of all documents in `docs`.
        sample: The percentage of documents in each topic that you would like to keep.
                Value can be between 0 and 1. Setting this value to, for example,
                0.1 (10% of documents in each topic) makes it easier to visualize
                millions of documents as a subset is chosen.
        hide_annotations: Hide the names of the traces on top of each cluster.
        hide_document_hover: Hide the content of the documents when hovering over
                             specific points. Helps to speed up generation of visualization.
        custom_labels: If bool, whether to use custom topic labels that were defined using
                       `topic_model.set_topic_labels`.
                       If `str`, it uses labels from other aspects, e.g., "Aspect1".
        title: Title of the plot.
        width: The width of the figure.
        height: The height of the figure.

    Examples:
    To visualize the topics simply run:

    ```python
    topic_model.visualize_documents(docs)
    ```

    Do note that this re-calculates the embeddings and reduces them to 2D.
    The advised and preferred pipeline for using this function is as follows:

    ```python
    from sklearn.datasets import fetch_20newsgroups
    from sentence_transformers import SentenceTransformer
    from bertopic import BERTopic
    from umap import UMAP

    # Prepare embeddings
    docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
    sentence_model = SentenceTransformer("all-MiniLM-L6-v2")

Callers

nothing calls this directly

Calls 3

get_topic · Method · 0.80

_extract_embeddings · Method · 0.45

fit · Method · 0.45

Tested by

no test coverage detected