```python
# Module-level imports this function relies on. nullity_filter() and nullity_sort()
# are helper functions defined elsewhere in the same module.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
from scipy.cluster import hierarchy


def dendrogram(df, method='average',
               filter=None, n=0, p=0, sort=None,
               orientation=None, figsize=None,
               fontsize=16, inline=True
               ):
    """
    Fits a `scipy` hierarchical clustering algorithm to the given DataFrame's variables and visualizes the results as
    a `scipy` dendrogram.

    The default vertical display fits up to 50 columns. If there are more than 50 columns and `orientation` is left
    unspecified, the dendrogram automatically switches to a horizontal display to fit the additional variables.

    :param df: The DataFrame whose completeness is being dendrogrammed.
    :param method: The linkage method used for clustering, such as "average" (default), "single", or "complete".
        It is passed to `scipy.cluster.hierarchy.linkage`.
    :param filter: The filter to apply to the dendrogram. Should be one of "top", "bottom", or None (default). See
        `nullity_filter()` for more information.
    :param n: The cap on the number of columns to include in the filtered DataFrame. See `nullity_filter()` for
        more information.
    :param p: The cap on the percentage fill of the columns in the filtered DataFrame. See `nullity_filter()` for
        more information.
    :param sort: The sort to apply to the dendrogram. Should be one of "ascending", "descending", or None (default).
        See `nullity_sort()` for more information.
    :param orientation: The orientation of the dendrogram. Defaults to a vertical layout ("bottom") when there are
        50 or fewer columns and a horizontal layout ("left") when there are more.
    :param figsize: The size of the figure to display. This is a `matplotlib` parameter which defaults to `(25, 10)`
        for vertical layouts; horizontal layouts grow taller with the number of columns.
    :param fontsize: The figure's font size.
    :param inline: Whether or not the figure is displayed inline. If False, the figure is returned instead of being
        plotted.
    :return: If `inline` is False, the underlying `matplotlib.figure` object. Otherwise, nothing.
    """
    # Apply filters and sorts first, so that the orientation and figure size below are based on
    # the columns that will actually be plotted.
    df = nullity_filter(df, filter=filter, n=n, p=p)
    df = nullity_sort(df, sort=sort)

    # Figure out orientation: vertical for up to 50 columns, horizontal beyond that.
    if not orientation:
        if len(df.columns) > 50:
            orientation = 'left'
        else:
            orientation = 'bottom'

    # Figure out the appropriate figsize.
    if not figsize:
        if len(df.columns) <= 50 or orientation in ('top', 'bottom'):
            figsize = (25, 10)
        else:
            # Horizontal layouts add half an inch of height per column beyond 50.
            figsize = (25, (25 + len(df.columns) - 50) * 0.5)

    # Set up the figure.
    fig = plt.figure(figsize=figsize)
    gs = gridspec.GridSpec(1, 1)
    ax0 = plt.subplot(gs[0])

    # Link the hierarchical output matrix: one row per DataFrame column, 1 where a value is missing.
    x = np.transpose(df.isnull().astype(int).values)
    z = hierarchy.linkage(x, method)

    # ... (function continues)
```
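The sizing and orientation defaults in the function above can be isolated into a small pure function, which makes the 50-column threshold and the height formula easy to check. This is a minimal sketch: `default_layout` is a hypothetical helper, not part of the library, and it assumes the column count has already been reduced by any filter.

```python
from typing import Optional, Tuple


def default_layout(n_columns: int,
                   orientation: Optional[str] = None) -> Tuple[str, Tuple[float, float]]:
    """Hypothetical helper mirroring the dendrogram() defaults.

    Returns (orientation, figsize) for a DataFrame with n_columns columns.
    """
    # Up to 50 columns fit the default vertical ('bottom') layout;
    # beyond that the dendrogram is laid out horizontally ('left').
    if orientation is None:
        orientation = 'left' if n_columns > 50 else 'bottom'

    # Small inputs and vertical layouts use the fixed default size.
    if n_columns <= 50 or orientation in ('top', 'bottom'):
        figsize = (25, 10)
    else:
        # Height starts at 12.5 inches and grows by 0.5 inch per column past 50.
        figsize = (25, (25 + n_columns - 50) * 0.5)
    return orientation, figsize
```

For example, 60 columns with no explicit orientation give `('left', (25, 17.5))`, since the height is (25 + 60 - 50) * 0.5 inches, while forcing `orientation='top'` keeps the `(25, 10)` default.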
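The clustering step does not depend on the plotting code. The sketch below, assuming pandas and SciPy are installed, reproduces the linkage from the listing and asks `scipy.cluster.hierarchy.dendrogram` for the leaf order with `no_plot=True` instead of drawing it. `nullity_leaf_order` is a hypothetical name; filtering, sorting, and figure setup are left out.

```python
import numpy as np
import pandas as pd
from scipy.cluster import hierarchy


def nullity_leaf_order(df: pd.DataFrame, method: str = 'average') -> list:
    """Hypothetical helper: cluster df's columns by missing-value pattern, return leaf order."""
    # One row per DataFrame column: 1 where the value is missing, 0 where it is present.
    x = np.transpose(df.isnull().astype(int).values)
    # Agglomerative clustering of those rows with the chosen linkage method
    # (no metric is passed, so SciPy's default Euclidean distance is used).
    z = hierarchy.linkage(x, method)
    # no_plot=True computes the dendrogram layout without drawing anything;
    # 'ivl' holds the leaf labels in left-to-right (or top-to-bottom) order.
    tree = hierarchy.dendrogram(z, labels=list(df.columns), no_plot=True)
    return tree['ivl']
```

Columns with identical missing-value patterns are at distance 0 and merge first, so they end up adjacent in the returned order; this adjacency is what the plotted dendrogram makes visible.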