
Function geoplot

missingno/missingno.py:583–807

Generates a geographical data nullity heatmap, which shows the distribution of missing data across geographic regions. The precise output depends on the inputs provided.
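The docstring in the source listing describes the fallback used when no geographical context is given: a quadtree is computed and nullities are drawn as squares, with `cutoff` as the minimum number of observations per square. The following is a minimal sketch of that idea in plain Python, not the library's implementation. The function names, the `(x, y, is_null)` tuple layout, and the rule "stop splitting when any quadrant would fall below `cutoff`" are all assumptions made for illustration.

```python
def quadtree_cells(points, cutoff):
    """Partition points into rectangular cells and report each cell's null fraction.

    points: list of (x, y, is_null) tuples (hypothetical layout).
    Returns a list of ((xmin, ymin, xmax, ymax), null_fraction) pairs.
    """
    if cutoff < 1:
        raise ValueError("cutoff must be at least 1")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return _split(points, (min(xs), min(ys), max(xs), max(ys)), cutoff)


def _split(points, bounds, cutoff):
    xmin, ymin, xmax, ymax = bounds
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    boxes = [
        (xmin, ymin, xmid, ymid),  # lower left
        (xmid, ymin, xmax, ymid),  # lower right
        (xmin, ymid, xmid, ymax),  # upper left
        (xmid, ymid, xmax, ymax),  # upper right
    ]
    groups = [[], [], [], []]
    for p in points:
        # Bit 0 selects the right half, bit 1 selects the upper half.
        index = (1 if p[0] >= xmid else 0) + (2 if p[1] >= ymid else 0)
        groups[index].append(p)
    if min(len(g) for g in groups) < cutoff:
        # Splitting would leave some quadrant too sparse: emit this cell as a leaf.
        nulls = sum(1 for p in points if p[2])
        return [(bounds, nulls / len(points))]
    cells = []
    for box, group in zip(boxes, groups):
        cells.extend(_split(group, box, cutoff))
    return cells


# A 4x4 grid of points where every point with x >= 2 is "missing".
grid = [(i, j, i >= 2) for i in range(4) for j in range(4)]
cells = quadtree_cells(grid, cutoff=4)
```

With `cutoff=4` the grid splits once into four cells of four points each; with a larger cutoff it stays a single cell covering all sixteen points.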

(df, x=None, y=None, coordinates=None, by=None, geometry=None, cutoff=None, histogram=False,
            figsize=(25, 10), fontsize=8, inline=True)
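Two rules from the docstring are easy to miss when reading the signature alone: exactly one of `x` and `y` (together) or `coordinates` must be supplied, and `cutoff` defaults to 50 or 5% of the dataset size, whichever is smaller. A minimal sketch of both rules follows. The helper names `location_mode` and `default_cutoff` are hypothetical and not part of missingno, and rounding the 5% figure down to an integer is an assumption.

```python
def location_mode(x=None, y=None, coordinates=None):
    """Return 'xy' or 'coordinates' according to the documented input rule.

    Hypothetical helper for illustration only.
    """
    has_xy = x is not None and y is not None
    has_coordinates = coordinates is not None
    if has_xy == has_coordinates:
        # Either nothing usable was given, or both modes were given at once.
        raise ValueError("specify exactly one of (x and y) or coordinates")
    return "xy" if has_xy else "coordinates"


def default_cutoff(n_rows):
    """Documented default: 50 or 5% of the dataset size, whichever is smaller.

    Integer division (rounding down) is an assumption of this sketch.
    """
    return min(50, n_rows // 20)
```

For example, `location_mode(x="lon", y="lat")` selects the `x`/`y` mode, while passing all three arguments raises `ValueError`.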

Source

def geoplot(df, x=None, y=None, coordinates=None, by=None, geometry=None, cutoff=None, histogram=False,
            figsize=(25, 10), fontsize=8, inline=True):
    """
    Generates a geographical data nullity heatmap, which shows the distribution of missing data across geographic
    regions. The precise output depends on the inputs provided. In increasing order of usefulness:

    * If no geographical context is provided, a quadtree is computed and nullities are rendered as abstract
      geographical squares.
    * If geographical context is provided in the form of a column of geographies (region, borough, ZIP code,
      etc.) in the `DataFrame`, convex hulls are computed for each of the point groups and the heatmap is generated
      within them.
    * If geographical context is provided and a separate geometry is provided, a heatmap is generated for each
      point group within this geography instead.

    :param df: The DataFrame whose completeness is being mapped.
    :param x: The x variable: probably a coordinate (longitude), possibly some other floating point value. May be a
    string (pointing to a column of df) or an iterable.
    :param y: The y variable: probably a coordinate (latitude), possibly some other floating point value. May be a
    string (pointing to a column of df) or an iterable.
    :param coordinates: A coordinate tuple iterable, or column thereof in the given DataFrame. One of x AND y OR
    coordinates must be specified, but not both.
    :param by: If you would like to aggregate your geometry by some geospatial attribute of the underlying DataFrame,
    name that column here.
    :param geometry: If you would like to provide your own geometries for your aggregation, instead of relying on
    (functional, but not pretty) convex hulls, provide them here. This parameter is expected to be a dict or Series
    of `shapely.Polygon` or `shapely.MultiPolygon` objects. It's ignored if `by` is not specified.
    :param cutoff: If no aggregation is specified, this parameter sets the minimum number of observations to include in
    each square. If not provided, set to 50 or 5% of the total size of the dataset, whichever is smaller. If `by` is
    specified this parameter is ignored.
    :param figsize: The size of the figure to display. This is a `matplotlib` parameter which defaults to (25, 10).
    :param histogram: Whether or not to plot a histogram of data distributions below the map. Defaults to False.
    :param fontsize: If `histogram` is specified, this parameter specifies the size of the tick labels. Ignored if
    `histogram` is not specified. Defaults to 8.
    :param inline: Whether or not the figure is inline. If it's not then instead of getting plotted, this method will
    return its figure.
    :return: If `inline` is False, the underlying `matplotlib.figure` object. Else, nothing.
    """
    import shapely.geometry
    import descartes
    import matplotlib.cm
    # We produce a coordinate column in-place in a function-local copy of the `DataFrame`.
    # This seems specious, and sort of is, but is necessary because the internal `pandas` aggregation methods
    # (`pd.core.groupby.DataFrameGroupBy.count` specifically) are optimized to run two orders of magnitude faster than
    # user-defined external `groupby` operations. For example:
    # >>> %time df.head(100000).groupby(lambda ind: df.iloc[ind]['LOCATION']).count()
    # Wall time: 12.7 s
    # >>> %time df.head(100000).groupby('LOCATION').count()
    # Wall time: 96 ms
    x_col = '__x'
    y_col = '__y'
    if x and y:
        if isinstance(x, str) and isinstance(y, str):
            x_col = x
            y_col = y
        else:
            df['__x'] = x
            df['__y'] = y
    elif coordinates:
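The opening of the function writes scratch columns `__x` and `__y` onto `df` so that later aggregation can group by column name, which the source comment reports as roughly two orders of magnitude faster than grouping with a lambda. The sketch below shows one way to get the same effect while leaving the caller's frame untouched. `resolve_xy_columns` is a hypothetical helper, not missingno's code. It also tests inputs with `is not None` rather than truthiness, since the truth value of a NumPy array or pandas Series is ambiguous and raises `ValueError`.

```python
import pandas as pd


def resolve_xy_columns(df, x=None, y=None):
    """Return (frame, x_col, y_col) without mutating the caller's DataFrame.

    Hypothetical helper for illustration; not part of missingno.
    """
    if x is None or y is None:
        raise ValueError("both x and y are required in this sketch")
    if isinstance(x, str) and isinstance(y, str):
        # Column names: reuse the existing columns, no copy needed.
        return df, x, y
    # Iterables: attach them as scratch columns on a copy of the frame.
    out = df.copy()
    out["__x"] = list(x)
    out["__y"] = list(y)
    return out, "__x", "__y"


df = pd.DataFrame({
    "lon": [0.0, 1.0, 2.0],
    "lat": [5.0, 6.0, 7.0],
    "value": [1.0, None, 3.0],
})

# Passing column names returns the original frame unchanged.
same, x_col, y_col = resolve_xy_columns(df, "lon", "lat")

# Passing iterables returns a copy carrying the scratch columns.
copied, cx, cy = resolve_xy_columns(df, [10.0, 11.0, 12.0], [20.0, 21.0, 22.0])

# Grouping by column name uses the built-in aggregation path the comment describes.
counts = copied.groupby(cx)["value"].count()
```

The trade-off is one extra copy of the frame when iterables are passed, in exchange for never adding `__x` and `__y` to the caller's data.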

Callers

nothing calls this directly

Calls 2

_calculate_geographic_nullity · Function · 0.85

squarify · Function · 0.85

Tested by

no test coverage detected