Visualize a hierarchical structure of the topics. By default, a Ward linkage function performs the hierarchical clustering on the cosine distance matrix between topic embeddings (either the c-TF-IDF representations or the embeddings from the embedding model).
(
topic_model,
orientation: str = "left",
topics: List[int] | None = None,
top_n_topics: int | None = None,
use_ctfidf: bool = True,
custom_labels: Union[bool, str] = False,
title: str = "<b>Hierarchical Clustering</b>",
width: int = 1000,
height: int = 600,
hierarchical_topics: pd.DataFrame = None,
linkage_function: Callable[[csr_matrix], np.ndarray] | None = None,
distance_function: Callable[[csr_matrix], csr_matrix] | None = None,
color_threshold: int = 1,
)
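The signature above can be exercised with a small wrapper. This is a minimal sketch, not the library's recommended workflow: it assumes `topic_model` is a BERTopic instance already fitted on `docs`, and the helper names `single_linkage` and `plot_topic_hierarchy` are hypothetical. It illustrates the docstring's note that the same `linkage_function` should be passed to both `topic_model.hierarchical_topics` and `visualize_hierarchy`.

```python
from typing import Any, Callable, List

from scipy.cluster import hierarchy as sch


def single_linkage(distances):
    """Illustrative alternative to the default Ward linkage (hypothetical helper)."""
    return sch.linkage(distances, "single", optimal_ordering=True)


def plot_topic_hierarchy(
    topic_model: Any,
    docs: List[str],
    linkage_function: Callable = single_linkage,
):
    """Build the topic hierarchy and its dendrogram with one shared linkage function.

    Assumes `topic_model` is a BERTopic instance that was already fitted on `docs`.
    """
    # Compute the parent/child hierarchy with the chosen linkage function.
    hierarchical_topics = topic_model.hierarchical_topics(
        docs, linkage_function=linkage_function
    )
    # Reuse the same linkage function so the plot matches the hierarchy.
    # `topics` and `top_n_topics` are left unset so hierarchical names are shown.
    return topic_model.visualize_hierarchy(
        orientation="left",
        hierarchical_topics=hierarchical_topics,
        linkage_function=linkage_function,
        width=1000,
    )
```

In a notebook, the returned figure could then be displayed with `plot_topic_hierarchy(topic_model, docs).show()`.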
def visualize_hierarchy(
    topic_model,
    orientation: str = "left",
    topics: List[int] | None = None,
    top_n_topics: int | None = None,
    use_ctfidf: bool = True,
    custom_labels: Union[bool, str] = False,
    title: str = "<b>Hierarchical Clustering</b>",
    width: int = 1000,
    height: int = 600,
    hierarchical_topics: pd.DataFrame = None,
    linkage_function: Callable[[csr_matrix], np.ndarray] | None = None,
    distance_function: Callable[[csr_matrix], csr_matrix] | None = None,
    color_threshold: int = 1,
) -> go.Figure:
    """Visualize a hierarchical structure of the topics.

    A Ward linkage function is used to perform the
    hierarchical clustering based on the cosine distance
    matrix between topic embeddings (either c-TF-IDF or the embeddings from the embedding model).

    Arguments:
        topic_model: A fitted BERTopic instance.
        orientation: The orientation of the figure.
                     Either 'left' or 'bottom'.
        topics: A selection of topics to visualize.
        top_n_topics: Only select the top n most frequent topics.
        use_ctfidf: Whether to calculate distances between topics based on c-TF-IDF embeddings. If False, the embeddings
                    from the embedding model are used.
        custom_labels: If bool, whether to use custom topic labels that were defined using
                       `topic_model.set_topic_labels`.
                       If `str`, it uses labels from other aspects, e.g., "Aspect1".
                       NOTE: Custom labels are only generated for the original
                       un-merged topics.
        title: Title of the plot.
        width: The width of the figure. Only works if orientation is set to 'left'.
        height: The height of the figure. Only works if orientation is set to 'bottom'.
        hierarchical_topics: A dataframe that contains a hierarchy of topics
                             represented by their parents and their children.
                             NOTE: The hierarchical topic names are only visualized
                             if both `topics` and `top_n_topics` are not set.
        linkage_function: The linkage function to use. Default is:
                          `lambda x: sch.linkage(x, 'ward', optimal_ordering=True)`
                          NOTE: Make sure to use the same `linkage_function` as used
                          in `topic_model.hierarchical_topics`.
        distance_function: The distance function to use on the c-TF-IDF matrix. Default is:
                           `lambda x: 1 - cosine_similarity(x)`.
                           You can pass any function that returns either a square matrix of
                           shape (n_samples, n_samples) with zeros on the diagonal and
                           non-negative values, or a condensed distance matrix of shape
                           (n_samples * (n_samples - 1) / 2,) containing the upper
                           triangle of the distance matrix.
                           NOTE: Make sure to use the same `distance_function` as used
                           in `topic_model.hierarchical_topics`.
        color_threshold: Value at which clusters are separated, which results in
                         different colors for different clusters.
                         A higher value will typically lead to fewer colored clusters.
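
The two return shapes accepted for `distance_function` can be illustrated with a small example. The sketch below is an assumption-laden illustration, not BERTopic's internal code: `cosine_distance` is a hypothetical helper written with NumPy, the toy `embeddings` array stands in for topic embeddings, and SciPy's `squareform` is used to convert the square form into the condensed form before applying the default Ward linkage quoted in the docstring.

```python
import numpy as np
from scipy.cluster import hierarchy as sch
from scipy.spatial.distance import squareform


def cosine_distance(x) -> np.ndarray:
    """Return a square cosine-distance matrix (hypothetical helper).

    The result has shape (n_samples, n_samples), non-negative values,
    and an exactly zero diagonal.
    """
    # Accept sparse input such as a c-TF-IDF csr_matrix as well as dense arrays.
    if hasattr(x, "toarray"):
        x = x.toarray()
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # guard against zero rows
    dist = 1.0 - unit @ unit.T
    dist = (dist + dist.T) / 2.0            # enforce symmetry
    dist = np.clip(dist, 0.0, None)         # remove tiny negative round-off
    np.fill_diagonal(dist, 0.0)
    return dist


# Toy stand-in for four topic embeddings (two pairs of similar vectors).
embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])

square = cosine_distance(embeddings)          # shape (4, 4)
condensed = squareform(square, checks=False)  # shape (4 * 3 / 2,) = (6,)

# The default linkage from the docstring, applied to the condensed distances.
Z = sch.linkage(condensed, "ward", optimal_ordering=True)
```

Either `square` or `condensed` matches one of the shapes the docstring describes. Clipping and zeroing the diagonal matter because floating-point round-off in `1 - similarity` can otherwise leave small negative or non-zero diagonal entries.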