hub / github.com/ResidentMario/missingno / dendrogram

Function dendrogram

missingno/missingno.py:467–561

Fits a `scipy` hierarchical clustering algorithm to the given DataFrame's variables and visualizes the results as a `scipy` dendrogram. The default vertical display will fit up to 50 columns. If more than 50 columns are specified and orientation is left unspecified, the dendrogram will automatically swap to a horizontal display to fit the additional variables.

(df, method='average',
               filter=None, n=0, p=0, sort=None,
               orientation=None, figsize=None,
               fontsize=16, inline=True
               )
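The clustering step described above can be sketched without any plotting. This is a minimal illustration, not the library's code path: the toy DataFrame and its column names are made up, and `no_plot=True` is used so that only the tree structure is computed. It mirrors the two calls visible in the source (transposing the nullity matrix, then `hierarchy.linkage`).

```python
import numpy as np
import pandas as pd
from scipy.cluster import hierarchy

# Hypothetical toy data: "a" and "b" are missing in exactly the same rows,
# while "c" is fully populated.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": ["x", None, "z", None],
    "c": [1, 2, 3, 4],
})

# One row per column of df, one 0/1 nullity flag per record.
x = np.transpose(df.isnull().astype(int).values)

# Average-linkage clustering of the columns by their nullity patterns.
z = hierarchy.linkage(x, "average")

# Compute the dendrogram layout only; no matplotlib figure is drawn.
tree = hierarchy.dendrogram(z, labels=df.columns.tolist(), no_plot=True)

print(z)            # first merge joins "a" and "b" at distance 0
print(tree["ivl"])  # leaf labels in display order
```

Columns whose missing values line up row for row merge at distance zero, which is why such pairs sit at the base of the plotted tree.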

Source from the content-addressed store, hash-verified

def dendrogram(df, method='average',
               filter=None, n=0, p=0, sort=None,
               orientation=None, figsize=None,
               fontsize=16, inline=True
               ):
    """
    Fits a `scipy` hierarchical clustering algorithm to the given DataFrame's variables and visualizes the results as
    a `scipy` dendrogram.

    The default vertical display will fit up to 50 columns. If more than 50 columns are specified and orientation is
    left unspecified the dendrogram will automatically swap to a horizontal display to fit the additional variables.

    :param df: The DataFrame whose completeness is being dendrogrammed.
    :param method: The linkage method used for clustering. This parameter is passed to
    `scipy.cluster.hierarchy.linkage`.
    :param filter: The filter to apply to the dendrogram. Should be one of "top", "bottom", or None (default). See
    `nullity_filter()` for more information.
    :param n: The cap on the number of columns to include in the filtered DataFrame. See `nullity_filter()` for
    more information.
    :param p: The cap on the percentage fill of the columns in the filtered DataFrame. See `nullity_filter()` for
    more information.
    :param sort: The sort to apply to the dendrogram. Should be one of "ascending", "descending", or None. See
    `nullity_sort()` for more information.
    :param figsize: The size of the figure to display. This is a `matplotlib` parameter which defaults to `(25, 10)`.
    :param fontsize: The figure's font size.
    :param orientation: The way the dendrogram is oriented. Defaults to top-down if there are 50 or fewer columns
    and left-right if there are more.
    :param inline: Whether or not the figure is inline. If it's not then instead of getting plotted, this method will
    return its figure.
    :return: If `inline` is False, the underlying `matplotlib.figure` object. Else, nothing.
    """
    # Figure out the appropriate figsize.
    if not figsize:
        if len(df.columns) <= 50 or orientation == 'top' or orientation == 'bottom':
            figsize = (25, 10)
        else:
            figsize = (25, (25 + len(df.columns) - 50)*0.5)

    # Set up the figure.
    fig = plt.figure(figsize=figsize)
    gs = gridspec.GridSpec(1, 1)
    ax0 = plt.subplot(gs[0])

    # Apply filters and sorts.
    df = nullity_filter(df, filter=filter, n=n, p=p)
    df = nullity_sort(df, sort=sort)

    # Link the hierarchical output matrix.
    x = np.transpose(df.isnull().astype(int).values)
    z = hierarchy.linkage(x, method)

    # Figure out orientation.
    if not orientation:
        if len(df.columns) > 50:
            orientation = 'left'
        else:
            orientation = 'bottom'
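The default figure size and orientation rules in the excerpt can be isolated as a small pure-Python sketch. `default_layout` is a hypothetical helper, not part of missingno, and it assumes a single column count (in the excerpt the figsize is computed before filtering and the orientation after, so the two counts can differ when a filter is applied).

```python
def default_layout(n_columns, orientation=None, figsize=None):
    """Hypothetical helper mirroring the excerpt's default layout rules."""
    # Vertical layouts and small frames use the fixed (25, 10) canvas;
    # otherwise the height grows by 0.5 per column beyond 50.
    if not figsize:
        if n_columns <= 50 or orientation in ('top', 'bottom'):
            figsize = (25, 10)
        else:
            figsize = (25, (25 + n_columns - 50) * 0.5)

    # With no explicit orientation, wide frames switch to a horizontal tree.
    if not orientation:
        orientation = 'left' if n_columns > 50 else 'bottom'

    return figsize, orientation


print(default_layout(10))                     # ((25, 10), 'bottom')
print(default_layout(80))                     # ((25, 27.5), 'left')
print(default_layout(80, orientation='top'))  # ((25, 10), 'top')
```

An explicit `figsize` or `orientation` always wins; the defaults only fill in whichever of the two is left unset.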

Callers

nothing calls this directly

Calls 2

nullity_filter · Function · 0.85
nullity_sort · Function · 0.85

Tested by

no test coverage detected