Visualize documents and their topics in 2D.
(
topic_model,
docs: List[str],
topics: List[int] | None = None,
embeddings: np.ndarray = None,
reduced_embeddings: np.ndarray = None,
sample: float | None = None,
hide_annotations: bool = False,
hide_document_hover: bool = False,
custom_labels: Union[bool, str] = False,
title: str = "<b>Documents and Topics</b>",
width: int = 1200,
height: int = 750,
)
````python
def visualize_documents(
    topic_model,
    docs: List[str],
    topics: List[int] | None = None,
    embeddings: np.ndarray = None,
    reduced_embeddings: np.ndarray = None,
    sample: float | None = None,
    hide_annotations: bool = False,
    hide_document_hover: bool = False,
    custom_labels: Union[bool, str] = False,
    title: str = "<b>Documents and Topics</b>",
    width: int = 1200,
    height: int = 750,
):
    """Visualize documents and their topics in 2D.

    Arguments:
        topic_model: A fitted BERTopic instance.
        docs: The documents you used when calling either `fit` or `fit_transform`
        topics: A selection of topics to visualize.
                Not to be confused with the topics that you get from `.fit_transform`.
                For example, if you want to visualize only topics 1 through 5:
                `topics = [1, 2, 3, 4, 5]`.
        embeddings: The embeddings of all documents in `docs`.
        reduced_embeddings: The 2D reduced embeddings of all documents in `docs`.
        sample: The percentage of documents in each topic that you would like to keep.
                Value can be between 0 and 1. Setting this value to, for example,
                0.1 (10% of documents in each topic) makes it easier to visualize
                millions of documents as a subset is chosen.
        hide_annotations: Hide the names of the traces on top of each cluster.
        hide_document_hover: Hide the content of the documents when hovering over
                             specific points. Helps to speed up generation of visualization.
        custom_labels: If bool, whether to use custom topic labels that were defined using
                       `topic_model.set_topic_labels`.
                       If `str`, it uses labels from other aspects, e.g., "Aspect1".
        title: Title of the plot.
        width: The width of the figure.
        height: The height of the figure.

    Examples:
    To visualize the topics simply run:

    ```python
    topic_model.visualize_documents(docs)
    ```

    Do note that this re-calculates the embeddings and reduces them to 2D.
    The advised and preferred pipeline for using this function is as follows:

    ```python
    from sklearn.datasets import fetch_20newsgroups
    from sentence_transformers import SentenceTransformer
    from bertopic import BERTopic
    from umap import UMAP

    # Prepare embeddings
    docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
    sentence_model = SentenceTransformer("all-MiniLM-L6-v2")
````
nothing calls this directly
no test coverage detected
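The `sample` argument is described in the docstring as the fraction of documents to keep in each topic. The function body is not shown here, so the following is only a minimal sketch of what per-topic subsampling could look like, not the library's implementation. The helper name `sample_indices_per_topic`, its `seed` parameter, and the choice to keep at least one document per topic are all assumptions made for illustration.

```python
import numpy as np


def sample_indices_per_topic(topic_per_doc, sample=None, seed=0):
    """Hypothetical helper: keep roughly `sample` of the documents in each topic.

    topic_per_doc: one topic id per document (e.g. the output of `fit_transform`).
    sample: fraction in (0, 1]; None is treated as "keep everything".
    Returns the sorted indices of the documents that were kept.
    """
    if sample is None:
        sample = 1.0
    if not 0 < sample <= 1:
        raise ValueError("sample must be in the interval (0, 1]")

    rng = np.random.default_rng(seed)
    topic_per_doc = np.asarray(topic_per_doc)

    kept = []
    for topic in np.unique(topic_per_doc):
        # Positions of all documents assigned to this topic
        members = np.where(topic_per_doc == topic)[0]
        # Assumption: always keep at least one document so no topic vanishes
        size = max(1, int(len(members) * sample))
        kept.extend(rng.choice(members, size=size, replace=False))

    return np.sort(np.array(kept, dtype=int))


# Usage: 35 documents spread over three topics, keeping half of each topic
topics = [0] * 10 + [1] * 20 + [-1] * 5
idx = sample_indices_per_topic(topics, sample=0.5)
print(len(idx), "documents kept")
```

Sampling per topic rather than over the whole corpus keeps small topics visible in the plot, which a single global random sample could easily drop.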