hub / github.com/dask/dask / map_partitions

Method map_partitions

dask/dataframe/dask_expr/_collection.py:993–1135

Apply a Python function to each partition.

(
        self,
        func,
        *args,
        meta=no_default,
        enforce_metadata=True,
        transform_divisions=True,
        clear_divisions=False,
        align_dataframes=False,
        parent_meta=None,
        required_columns=None,
        **kwargs,
    )

Source from the content-addressed store, hash-verified

    @insert_meta_param_description(pad=12)
    def map_partitions(
        self,
        func,
        *args,
        meta=no_default,
        enforce_metadata=True,
        transform_divisions=True,
        clear_divisions=False,
        align_dataframes=False,
        parent_meta=None,
        required_columns=None,
        **kwargs,
    ):
        """Apply a Python function to each partition

        Parameters
        ----------
        func : function
            Function applied to each partition.
        args, kwargs :
            Arguments and keywords to pass to the function. Arguments and
            keywords may contain ``FrameBase`` or regular python objects.
            DataFrame-like args (both dask and pandas) must have the same
            number of partitions as ``self`` or comprise a single partition.
            Key-word arguments, Single-partition arguments, and general
            python-object arguments will be broadcasted to all partitions.
        enforce_metadata : bool, default True
            Whether to enforce at runtime that the structure of the DataFrame
            produced by ``func`` actually matches the structure of ``meta``.
            This will rename and reorder columns for each partition, and will
            raise an error if this doesn't work, but it won't raise if dtypes
            don't match.
        transform_divisions : bool, default True
            Whether to apply the function onto the divisions and apply those
            transformed divisions to the output.
        clear_divisions : bool, default False
            Whether divisions should be cleared. If True, `transform_divisions`
            will be ignored.
        required_columns : list or None, default None
            List of columns that ``func`` requires for execution. These columns
            must belong to the first DataFrame argument (in ``args``). If None
            is specified (the default), the query optimizer will assume that
            all input columns are required.
        $META

        Examples
        --------
        Given a DataFrame, Series, or Index, such as:

        >>> import pandas as pd
        >>> import dask.dataframe as dd
        >>> df = pd.DataFrame({'x': [1, 2, 3, 4, 5],
        ...                    'y': [1., 2., 3., 4., 5.]})
        >>> ddf = dd.from_pandas(df, npartitions=2)

        One can use ``map_partitions`` to apply a function on each partition.
        Extra arguments and keywords can optionally be provided, and will be
        passed to the function after the partition.

Callers 15

to_dask_array · Method · 0.95

values · Method · 0.95

sum · Method · 0.95

prod · Method · 0.95

skew · Method · 0.95

sem · Method · 0.95

mean · Method · 0.95

max · Method · 0.95

any · Method · 0.95

all · Method · 0.95

idxmin · Method · 0.95

idxmax · Method · 0.95

Calls 1

map_partitions · Function · 0.70

Tested by 15

test_tune_optimization_disabled · Function · 0.36

test_map_partitions_assign_fusedio · Function · 0.36

test_tune_optimization_disabled_from_map · Function · 0.36

test_from_delayed_dask · Function · 0.36

test_from_delayed_fusion · Function · 0.36

test_disk_shuffle · Function · 0.36

test_task_shuffle · Function · 0.36

test_task_shuffle_index · Function · 0.36

test_merge_empty_left_df · Function · 0.36

test_map_partitions · Function · 0.36

test_map_partitions_broadcast · Function · 0.36

test_map_partitions_merge · Function · 0.36