github.com/ResidentMario/missingno / geoplot

Function geoplot

missingno/missingno.py:583–807

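Generates a geographical data nullity heatmap, which shows the distribution of missing data across geographic regions. The precise output depends on the inputs provided: if no geographical context is provided, a quadtree is computed and nullities are rendered as abstract geographical squares; if a column of geographies is given via `by`, the heatmap is drawn within per-group convex hulls or within user-supplied geometries.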
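With no geographic context, the docstring says a quadtree is computed and `cutoff` sets the minimum number of observations per square. The sketch below illustrates that idea in plain Python. It is not missingno's `squarify` implementation: the function name `quadtree_squares` is hypothetical, and the splitting rule (split a square into four quadrants only if every quadrant keeps at least `cutoff` points) is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# One observation: (x, y, row_values), where None marks a missing value.
Record = Tuple[float, float, Tuple[Optional[object], ...]]


@dataclass
class Square:
    bounds: Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
    records: List[Record] = field(default_factory=list)

    def completeness(self) -> float:
        """Share of non-null cells across all records in this square."""
        cells = [v for _, _, values in self.records for v in values]
        if not cells:
            return 0.0
        return sum(v is not None for v in cells) / len(cells)


def split_into_quadrants(sq: Square) -> List[Square]:
    """Partition a square's records into its four equal quadrants."""
    xmin, ymin, xmax, ymax = sq.bounds
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    children = [
        Square((xmin, ymin, xmid, ymid)),  # bottom-left
        Square((xmid, ymin, xmax, ymid)),  # bottom-right
        Square((xmin, ymid, xmid, ymax)),  # top-left
        Square((xmid, ymid, xmax, ymax)),  # top-right
    ]
    for rec in sq.records:
        col = 1 if rec[0] >= xmid else 0
        row = 1 if rec[1] >= ymid else 0
        children[2 * row + col].records.append(rec)
    return children


def quadtree_squares(records: List[Record], cutoff: int) -> List[Square]:
    """Hypothetical quadtree partition (not missingno's `squarify`).

    Assumed rule: a square is split into quadrants only if every quadrant
    would still hold at least `cutoff` observations; otherwise it is a leaf.
    """
    if cutoff < 1:
        raise ValueError("cutoff must be at least 1")
    if not records:
        return []
    xs = [r[0] for r in records]
    ys = [r[1] for r in records]
    stack = [Square((min(xs), min(ys), max(xs), max(ys)), list(records))]
    leaves: List[Square] = []
    while stack:
        sq = stack.pop()
        children = split_into_quadrants(sq)
        if all(len(c.records) >= cutoff for c in children):
            # Every child is non-empty, so each is a strict subset: the loop terminates.
            stack.extend(children)
        else:
            leaves.append(sq)
    return leaves
```

Each leaf's `completeness()` is the value a heatmap would color; for example, four well-separated clusters of five points with `cutoff=5` yield four leaf squares.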

(df, x=None, y=None, coordinates=None, by=None, geometry=None, cutoff=None, histogram=False,
            figsize=(25, 10), fontsize=8, inline=True)
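The docstring fixes two input rules: either `x` and `y` or `coordinates` must be given (not both), and when `cutoff` is omitted it defaults to 50 or 5% of the dataset size, whichever is smaller. A minimal sketch of those rules follows. The helper names `check_location_args` and `default_cutoff` are hypothetical, and rounding 5% down to an integer with a floor of one observation is an assumption the docstring does not specify.

```python
from typing import Any, Optional


def check_location_args(x: Any = None, y: Any = None, coordinates: Any = None) -> str:
    """Return which location form was given: 'xy' or 'coordinates'.

    Uses `is None` checks instead of truthiness, so array-like x/y values
    (e.g. pandas Series) do not hit an ambiguous-truth-value error.
    """
    if (x is None) != (y is None):
        raise ValueError("x and y must be specified together")
    has_xy = x is not None
    if has_xy and coordinates is not None:
        raise ValueError("specify either x and y, or coordinates, but not both")
    if not has_xy and coordinates is None:
        raise ValueError("one of x and y, or coordinates, must be specified")
    return "xy" if has_xy else "coordinates"


def default_cutoff(n_rows: int, cutoff: Optional[int] = None) -> int:
    """Minimum observations per square: min(50, 5% of n_rows) unless given."""
    if cutoff is not None:
        return cutoff
    # Assumption: 5% is rounded down (integer arithmetic), with a floor of 1.
    return max(1, min(50, n_rows * 5 // 100))
```

For a 10,000-row frame the default is 50; for a 400-row frame it is 20.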


```python
def geoplot(df, x=None, y=None, coordinates=None, by=None, geometry=None, cutoff=None, histogram=False,
            figsize=(25, 10), fontsize=8, inline=True):
    """
    Generates a geographical data nullity heatmap, which shows the distribution of missing data across geographic
    regions. The precise output depends on the inputs provided. In increasing order of usefulness:

    * If no geographical context is provided, a quadtree is computed and nullities are rendered as abstract
    geographical squares.
    * If geographical context is provided in the form of a column of geographies (region, borough, ZIP code,
    etc.) in the `DataFrame`, convex hulls are computed for each of the point groups and the heatmap is generated
    within them.
    * If geographical context is provided *and* a separate geometry is provided, a heatmap is generated for each
    point group within this geography instead.

    :param df: The DataFrame whose completeness is being mapped.
    :param x: The x variable: probably a coordinate (longitude), possibly some other floating point value. May be a
    string (pointing to a column of df) or an iterable.
    :param y: The y variable: probably a coordinate (latitude), possibly some other floating point value. May be a
    string (pointing to a column of df) or an iterable.
    :param coordinates: A coordinate tuple iterable, or column thereof in the given DataFrame. Either x and y, or
    coordinates, must be specified, but not both.
    :param by: If you would like to aggregate your geometry by some geospatial attribute of the underlying DataFrame,
    name that column here.
    :param geometry: If you would like to provide your own geometries for your aggregation, instead of relying on
    (functional, but not pretty) convex hulls, provide them here. This parameter is expected to be a dict or Series
    of `shapely.Polygon` or `shapely.MultiPolygon` objects. It's ignored if `by` is not specified.
    :param cutoff: If no aggregation is specified, this parameter sets the minimum number of observations to include in
    each square. If not provided, set to 50 or 5% of the total size of the dataset, whichever is smaller. If `by` is
    specified this parameter is ignored.
    :param figsize: The size of the figure to display. This is a `matplotlib` parameter which defaults to (25, 10).
    :param histogram: Whether or not to plot a histogram of data distributions below the map. Defaults to False.
    :param fontsize: If `histogram` is specified, this parameter specifies the size of the tick labels. Ignored if
    `histogram` is not specified. Defaults to 8.
    :param inline: Whether or not the figure is inline. If it's not, then instead of being plotted, the figure is
    returned.
    :return: If `inline` is False, the underlying `matplotlib.figure` object. Else, nothing.
    """
    import shapely.geometry
    import descartes
    import matplotlib.cm
    # We produce a coordinate column in-place in a function-local copy of the `DataFrame`.
    # This seems specious, and sort of is, but is necessary because the internal `pandas` aggregation methods
    # (`pd.core.groupby.DataFrameGroupBy.count` specifically) are optimized to run two orders of magnitude faster than
    # user-defined external `groupby` operations. For example:
    # >>> %time df.head(100000).groupby(lambda ind: df.iloc[ind]['LOCATION']).count()
    # Wall time: 12.7 s
    # >>> %time df.head(100000).groupby('LOCATION').count()
    # Wall time: 96 ms
    x_col = '__x'
    y_col = '__y'
    if x and y:
        if isinstance(x, str) and isinstance(y, str):
            x_col = x
            y_col = y
        else:
            df['__x'] = x
            df['__y'] = y
    elif coordinates:
        ...  # listing truncated here; the function continues through line 807
```
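The comment in the listing says the coordinate columns are built in a function-local copy, but in the lines shown `df['__x'] = x` assigns to the `df` argument itself, and `if x and y:` relies on truthiness, which raises `ValueError` when `x` and `y` are pandas Series. A minimal sketch of the copy-then-assign pattern the comment describes follows. It uses a plain `dict` of lists as a hypothetical stand-in for a `DataFrame` so it runs without pandas, and `resolve_xy` is a hypothetical name; with pandas, the analogous step would be `df = df.copy()` before adding `'__x'` and `'__y'`.

```python
from typing import Any, Dict, List, Sequence, Tuple, Union

# Hypothetical stand-in for a DataFrame: column name -> list of row values.
Table = Dict[str, List[Any]]


def resolve_xy(
    table: Table,
    x: Union[str, Sequence[float]],
    y: Union[str, Sequence[float]],
) -> Tuple[Table, str, str]:
    """Return (local_table, x_col, y_col) without mutating `table`.

    If x and y both name columns, they are used directly. Otherwise the
    iterables are stored as '__x' and '__y' in a function-local copy.
    """
    if x is None or y is None:  # explicit None checks, no truthiness
        raise ValueError("x and y must both be given")
    if isinstance(x, str) != isinstance(y, str):
        raise ValueError("x and y must both be column names or both be iterables")
    if isinstance(x, str):
        return table, x, y
    n_rows = len(next(iter(table.values()), []))
    x_vals, y_vals = list(x), list(y)
    if len(x_vals) != n_rows or len(y_vals) != n_rows:
        raise ValueError("x and y must have one value per row")
    local = dict(table)  # shallow copy: the caller's table is left unchanged
    local["__x"] = x_vals
    local["__y"] = y_vals
    return local, "__x", "__y"
```

Returning the resolved column names alongside the table keeps the later `groupby` on a named column, which is the fast path the listing's timing comment motivates.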

Callers: nothing calls this directly.

Calls (2): squarify (Function, 0.85)

Tested by: no test coverage detected.