hub / github.com/dask/dask / _normalize_spec

Function _normalize_spec

dask/dataframe/groupby.py:613–696

Return a list of ``(result_column, func, input_column)`` tuples.

(spec, non_group_columns)


def _normalize_spec(spec, non_group_columns):
    """
    Return a list of ``(result_column, func, input_column)`` tuples.

    Spec can be

    - a function
    - a list of functions
    - a dictionary that maps input-columns to functions
    - a dictionary that maps input-columns to lists of functions
    - a dictionary that maps input-columns to dictionaries that map
      output-columns to functions.

    The non-group columns are a list of all column names that are not used in
    the groupby operation.

    Usually, the result columns are multi-level names, returned as tuples.
    If only a single function is supplied, or a dictionary mapping columns
    to single functions, simple names are returned as strings (see the first
    two examples below).

    Examples
    --------
    >>> _normalize_spec('mean', ['a', 'b', 'c'])
    [('a', 'mean', 'a'), ('b', 'mean', 'b'), ('c', 'mean', 'c')]

    >>> spec = collections.OrderedDict([('a', 'mean'), ('b', 'count')])
    >>> _normalize_spec(spec, ['a', 'b', 'c'])
    [('a', 'mean', 'a'), ('b', 'count', 'b')]

    >>> _normalize_spec(['var', 'mean'], ['a', 'b', 'c'])
    ... # doctest: +NORMALIZE_WHITESPACE
    [(('a', 'var'), 'var', 'a'), (('a', 'mean'), 'mean', 'a'), \
     (('b', 'var'), 'var', 'b'), (('b', 'mean'), 'mean', 'b'), \
     (('c', 'var'), 'var', 'c'), (('c', 'mean'), 'mean', 'c')]

    >>> spec = collections.OrderedDict([('a', 'mean'), ('b', ['sum', 'count'])])
    >>> _normalize_spec(spec, ['a', 'b', 'c'])
    ... # doctest: +NORMALIZE_WHITESPACE
    [(('a', 'mean'), 'mean', 'a'), (('b', 'sum'), 'sum', 'b'), \
     (('b', 'count'), 'count', 'b')]

    >>> spec = collections.OrderedDict()
    >>> spec['a'] = ['mean', 'size']
    >>> spec['b'] = collections.OrderedDict([('e', 'count'), ('f', 'var')])
    >>> _normalize_spec(spec, ['a', 'b', 'c'])
    ... # doctest: +NORMALIZE_WHITESPACE
    [(('a', 'mean'), 'mean', 'a'), (('a', 'size'), 'size', 'a'), \
     (('b', 'e'), 'count', 'b'), (('b', 'f'), 'var', 'b')]
    """
    if not isinstance(spec, dict):
        spec = collections.OrderedDict(zip(non_group_columns, it.repeat(spec)))

    res = []

    if isinstance(spec, dict):
        for input_column, subspec in spec.items():
            if isinstance(subspec, dict):
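The listing above stops partway through the function, so the remaining logic is not shown here. As a rough illustration of the behaviour the docstring describes, the following is a minimal standalone sketch, not the dask implementation. The names `normalize_spec_sketch` and `_name_of` are hypothetical; `_name_of` is only a stand-in for the `funcname` helper listed under Calls, and the flat-versus-tuple naming rule is inferred from the docstring examples.

```python
import collections
import itertools as it


def _name_of(func):
    """Hypothetical stand-in for a function-name helper such as ``funcname``."""
    return func if isinstance(func, str) else getattr(func, "__name__", str(func))


def normalize_spec_sketch(spec, non_group_columns):
    """Sketch: expand an aggregation spec into (result, func, input) tuples."""
    # A bare function or a list of functions applies to every non-group column.
    if not isinstance(spec, dict):
        spec = collections.OrderedDict(zip(non_group_columns, it.repeat(spec)))

    res = []
    for input_column, subspec in spec.items():
        if isinstance(subspec, dict):
            # {output_column: func}: the inner key names the result column.
            res.extend(
                ((input_column, out), func, input_column)
                for out, func in subspec.items()
            )
        else:
            funcs = subspec if isinstance(subspec, list) else [subspec]
            res.extend(
                ((input_column, _name_of(func)), func, input_column)
                for func in funcs
            )

    # Assumption drawn from the docstring: names stay flat strings only when
    # no column maps to a compound (list, tuple or dict) sub-spec.
    compound = (list, tuple, dict)
    if not any(isinstance(sub, compound) for sub in spec.values()):
        res = [(col, func, col) for (_, func, col) in res]
    return res
```

Under these assumptions the sketch reproduces the five docstring examples: a single function or a column-to-function dictionary yields flat string names, while any list or nested dictionary switches every result name to an `(input_column, name)` tuple.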

Callers 1

spec · Method · 0.90

Calls 5

funcname · Function · 0.90

any · Function · 0.85

repeat · Method · 0.80

items · Method · 0.45

values · Method · 0.45

Tested by

no test coverage detected
