hub / github.com/MaartenGr/BERTopic / visualize_hierarchy

Function visualize_hierarchy

bertopic/plotting/_hierarchy.py:16–227

Visualize a hierarchical structure of the topics. A Ward linkage function is used to perform the hierarchical clustering based on the cosine distance matrix between topic embeddings (either c-TF-IDF or the embeddings from the embedding model).
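The clustering step described above can be sketched outside the library as follows. This is a minimal illustration, not the function's actual body: the `embeddings` array holds made-up values standing in for topic embeddings, and cosine distance is computed with NumPy rather than with the `cosine_similarity` call named in the docstring.

```python
import numpy as np
from scipy.cluster import hierarchy as sch
from scipy.spatial.distance import squareform

# Made-up stand-in for topic embeddings: 4 topics, 3 dimensions.
embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
])

# Cosine distance = 1 - cosine similarity, computed on row-normalized vectors.
unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
distances = 1.0 - unit @ unit.T
np.fill_diagonal(distances, 0.0)            # clear floating-point noise on the diagonal
distances = np.clip(distances, 0.0, None)   # guard against tiny negative values

# Convert the square matrix to condensed form (upper triangle as a 1-D vector),
# then run Ward linkage with optimal leaf ordering, as in the documented default.
condensed = squareform(distances, checks=False)
Z = sch.linkage(condensed, "ward", optimal_ordering=True)

print(Z.shape)  # one row per merge: (n_topics - 1, 4)
```

Each row of `Z` records one merge: the two cluster ids, the merge distance, and the size of the resulting cluster.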

(
    topic_model,
    orientation: str = "left",
    topics: List[int] | None = None,
    top_n_topics: int | None = None,
    use_ctfidf: bool = True,
    custom_labels: Union[bool, str] = False,
    title: str = "<b>Hierarchical Clustering</b>",
    width: int = 1000,
    height: int = 600,
    hierarchical_topics: pd.DataFrame = None,
    linkage_function: Callable[[csr_matrix], np.ndarray] | None = None,
    distance_function: Callable[[csr_matrix], csr_matrix] | None = None,
    color_threshold: int = 1,
)
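A typical call might look like the sketch below. `plot_topic_hierarchy` is a hypothetical wrapper introduced only for illustration; it assumes `topic_model` is a fitted BERTopic instance exposing the `hierarchical_topics` and `visualize_hierarchy` methods, and that `docs` are the documents it was fitted on. It forwards the same `linkage_function` and `distance_function` to both calls, which the docstring notes is required for the plot to match the computed hierarchy.

```python
def plot_topic_hierarchy(topic_model, docs, linkage_function=None, distance_function=None):
    """Hypothetical helper: compute the topic hierarchy, then plot it.

    Passing None for either function leaves the library defaults in place.
    """
    # Build the parent/child topic table first so merged-topic names can be shown.
    hierarchical_topics = topic_model.hierarchical_topics(
        docs,
        linkage_function=linkage_function,
        distance_function=distance_function,
    )
    # `topics` and `top_n_topics` are left unset on purpose: the hierarchical
    # topic names are only drawn when neither is given.
    return topic_model.visualize_hierarchy(
        hierarchical_topics=hierarchical_topics,
        linkage_function=linkage_function,
        distance_function=distance_function,
    )
```

The returned object is the `go.Figure` produced by the plotting function, so it can be displayed with `.show()` or saved with `.write_html(...)`.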

Source from the content-addressed store, hash-verified

def visualize_hierarchy(
    topic_model,
    orientation: str = "left",
    topics: List[int] | None = None,
    top_n_topics: int | None = None,
    use_ctfidf: bool = True,
    custom_labels: Union[bool, str] = False,
    title: str = "<b>Hierarchical Clustering</b>",
    width: int = 1000,
    height: int = 600,
    hierarchical_topics: pd.DataFrame = None,
    linkage_function: Callable[[csr_matrix], np.ndarray] | None = None,
    distance_function: Callable[[csr_matrix], csr_matrix] | None = None,
    color_threshold: int = 1,
) -> go.Figure:
    """Visualize a hierarchical structure of the topics.

    A ward linkage function is used to perform the
    hierarchical clustering based on the cosine distance
    matrix between topic embeddings (either c-TF-IDF or the embeddings from the embedding model).

    Arguments:
        topic_model: A fitted BERTopic instance.
        orientation: The orientation of the figure.
                     Either 'left' or 'bottom'
        topics: A selection of topics to visualize
        top_n_topics: Only select the top n most frequent topics
        use_ctfidf: Whether to calculate distances between topics based on c-TF-IDF embeddings. If False, the embeddings
                    from the embedding model are used.
        custom_labels: If bool, whether to use custom topic labels that were defined using
                       `topic_model.set_topic_labels`.
                       If `str`, it uses labels from other aspects, e.g., "Aspect1".
                       NOTE: Custom labels are only generated for the original
                       un-merged topics.
        title: Title of the plot.
        width: The width of the figure. Only works if orientation is set to 'left'
        height: The height of the figure. Only works if orientation is set to 'bottom'
        hierarchical_topics: A dataframe that contains a hierarchy of topics
                             represented by their parents and their children.
                             NOTE: The hierarchical topic names are only visualized
                             if both `topics` and `top_n_topics` are not set.
        linkage_function: The linkage function to use. Default is:
                          `lambda x: sch.linkage(x, 'ward', optimal_ordering=True)`
                          NOTE: Make sure to use the same `linkage_function` as used
                          in `topic_model.hierarchical_topics`.
        distance_function: The distance function to use on the c-TF-IDF matrix. Default is:
                           `lambda x: 1 - cosine_similarity(x)`.
                           You can pass any function that returns either a square matrix of
                           shape (n_samples, n_samples) with zeros on the diagonal and
                           non-negative values or condensed distance matrix of shape
                           (n_samples * (n_samples - 1) / 2,) containing the upper
                           triangular of the distance matrix.
                           NOTE: Make sure to use the same `distance_function` as used
                           in `topic_model.hierarchical_topics`.
        color_threshold: Value at which the separation of clusters will be made which
                         will result in different colors for different clusters.
                         A higher value will typically lead in less colored clusters.
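The two output shapes the docstring accepts from `distance_function` can be illustrated with a small check. `describe_distance_output` is a hypothetical helper written for this sketch; it is not the library's `validate_distance_matrix`, whose exact behavior is not shown here.

```python
import numpy as np

def describe_distance_output(X, n_samples):
    """Hypothetical helper: classify a distance_function result by its shape."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 2 and X.shape == (n_samples, n_samples):
        # Square form: zeros on the diagonal and no negative entries.
        if not np.allclose(np.diag(X), 0.0):
            raise ValueError("square distance matrix needs zeros on the diagonal")
        if (X < 0).any():
            raise ValueError("distances must be non-negative")
        return "square"
    if X.ndim == 1 and X.shape[0] == n_samples * (n_samples - 1) // 2:
        # Condensed form: the upper triangle flattened row by row.
        return "condensed"
    raise ValueError("unexpected distance matrix shape: %r" % (X.shape,))

# A 3-topic example with made-up distances.
square = np.array([
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 3.0],
    [2.0, 3.0, 0.0],
])
# Upper triangle above the diagonal: entries (0,1), (0,2), (1,2).
condensed = square[np.triu_indices(3, k=1)]

print(describe_distance_output(square, 3))     # square
print(describe_distance_output(condensed, 3))  # condensed
```

For 3 topics the condensed vector has 3 * 2 / 2 = 3 entries, matching the `(n_samples * (n_samples - 1) / 2,)` shape in the docstring.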

Callers

nothing calls this directly

Calls 6

validate_distance_matrix · Function · 0.90
_get_annotations · Function · 0.85
get_topic_freq · Method · 0.80
get_topics · Method · 0.80
get_topic · Method · 0.80

Tested by

no test coverage detected