hub / github.com/MaartenGr/BERTopic / visualize_documents

Function visualize_documents

bertopic/plotting/_documents.py:8–263

Visualize documents and their topics in 2D.

(
    topic_model,
    docs: List[str],
    topics: List[int] | None = None,
    embeddings: np.ndarray = None,
    reduced_embeddings: np.ndarray = None,
    sample: float | None = None,
    hide_annotations: bool = False,
    hide_document_hover: bool = False,
    custom_labels: Union[bool, str] = False,
    title: str = "<b>Documents and Topics</b>",
    width: int = 1200,
    height: int = 750,
)
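
The function's docstring notes that calling it with only `docs` re-calculates the embeddings, and it advises preparing them once up front. A minimal sketch of that calling pattern follows. The helper name `plot_documents_with_cached_embeddings` is hypothetical and not part of BERTopic; it only forwards keyword arguments that appear in the signature above, and it assumes `topic_model` is an already fitted model exposing `visualize_documents` as a method.

```python
from typing import Any, List, Optional


def plot_documents_with_cached_embeddings(
    topic_model: Any,
    docs: List[str],
    embeddings: Any,
    sample: Optional[float] = None,
):
    """Hypothetical helper: pass precomputed embeddings to visualize_documents.

    Assumes `embeddings` has one row per entry in `docs`, in the same order.
    """
    return topic_model.visualize_documents(
        docs,
        embeddings=embeddings,      # reuse embeddings computed once elsewhere
        sample=sample,              # optional fraction of documents per topic
        hide_document_hover=True,   # skip hover text to speed up generation
    )
```

Keeping the embeddings in a variable and passing them in on every call means that repeated plots with different `sample` or `topics` settings should not need to encode the documents again.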

Source from the content-addressed store, hash-verified

def visualize_documents(
    topic_model,
    docs: List[str],
    topics: List[int] | None = None,
    embeddings: np.ndarray = None,
    reduced_embeddings: np.ndarray = None,
    sample: float | None = None,
    hide_annotations: bool = False,
    hide_document_hover: bool = False,
    custom_labels: Union[bool, str] = False,
    title: str = "<b>Documents and Topics</b>",
    width: int = 1200,
    height: int = 750,
):
    """Visualize documents and their topics in 2D.

    Arguments:
        topic_model: A fitted BERTopic instance.
        docs: The documents you used when calling either `fit` or `fit_transform`
        topics: A selection of topics to visualize.
                Not to be confused with the topics that you get from `.fit_transform`.
                For example, if you want to visualize only topics 1 through 5:
                `topics = [1, 2, 3, 4, 5]`.
        embeddings: The embeddings of all documents in `docs`.
        reduced_embeddings: The 2D reduced embeddings of all documents in `docs`.
        sample: The percentage of documents in each topic that you would like to keep.
                Value can be between 0 and 1. Setting this value to, for example,
                0.1 (10% of documents in each topic) makes it easier to visualize
                millions of documents as a subset is chosen.
        hide_annotations: Hide the names of the traces on top of each cluster.
        hide_document_hover: Hide the content of the documents when hovering over
                             specific points. Helps to speed up generation of visualization.
        custom_labels: If bool, whether to use custom topic labels that were defined using
                       `topic_model.set_topic_labels`.
                       If `str`, it uses labels from other aspects, e.g., "Aspect1".
        title: Title of the plot.
        width: The width of the figure.
        height: The height of the figure.

    Examples:
    To visualize the topics simply run:

    ```python
    topic_model.visualize_documents(docs)
    ```

    Do note that this re-calculates the embeddings and reduces them to 2D.
    The advised and preferred pipeline for using this function is as follows:

    ```python
    from sklearn.datasets import fetch_20newsgroups
    from sentence_transformers import SentenceTransformer
    from bertopic import BERTopic
    from umap import UMAP

    # Prepare embeddings
    docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
    sentence_model = SentenceTransformer("all-MiniLM-L6-v2")
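
The `sample` argument is described as the fraction of documents kept in each topic. The sketch below illustrates one way such per-topic subsampling could work, using only the standard library. It is not BERTopic's implementation: the function name `sample_indices_per_topic`, the `seed` parameter, the rule of keeping at least one document per topic, and the fallback to "keep everything" for missing or out-of-range values are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def sample_indices_per_topic(
    topic_per_doc: Sequence[int],
    sample: Optional[float] = None,
    seed: Optional[int] = None,
) -> List[int]:
    """Return document indices, keeping roughly `sample` of each topic."""
    # Assumption: a missing or out-of-range fraction means "keep everything".
    if sample is None or not 0 < sample <= 1:
        sample = 1.0

    rng = random.Random(seed)

    # Group document positions by their assigned topic.
    by_topic: Dict[int, List[int]] = defaultdict(list)
    for index, topic in enumerate(topic_per_doc):
        by_topic[topic].append(index)

    kept: List[int] = []
    for topic in sorted(by_topic):
        members = by_topic[topic]
        # Assumption: keep at least one document so no topic vanishes.
        size = max(1, int(len(members) * sample))
        kept.extend(rng.sample(members, size))
    return sorted(kept)


# Toy assignment: 10 docs in topic 0, 20 in topic 1, 5 outliers (-1).
topics = [0] * 10 + [1] * 20 + [-1] * 5
subset = sample_indices_per_topic(topics, sample=0.5, seed=42)
print(len(subset))  # 5 + 10 + 2 = 17
```

Sampling within each topic, rather than across the whole corpus, keeps small topics visible in the plot even when large topics dominate the document count.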

Callers

nothing calls this directly

Calls 3

get_topic · Method · 0.80
_extract_embeddings · Method · 0.45
fit · Method · 0.45

Tested by

no test coverage detected