hub / github.com/dask/dask / map_partitions

Method map_partitions

dask/dataframe/dask_expr/_collection.py:993–1135

Apply a Python function to each partition.

(
    self,
    func,
    *args,
    meta=no_default,
    enforce_metadata=True,
    transform_divisions=True,
    clear_divisions=False,
    align_dataframes=False,
    parent_meta=None,
    required_columns=None,
    **kwargs,
)

Source from the content-addressed store, hash-verified

@insert_meta_param_description(pad=12)
def map_partitions(
    self,
    func,
    *args,
    meta=no_default,
    enforce_metadata=True,
    transform_divisions=True,
    clear_divisions=False,
    align_dataframes=False,
    parent_meta=None,
    required_columns=None,
    **kwargs,
):
    """Apply a Python function to each partition

    Parameters
    ----------
    func : function
        Function applied to each partition.
    args, kwargs :
        Arguments and keywords to pass to the function. Arguments and
        keywords may contain ``FrameBase`` or regular python objects.
        DataFrame-like args (both dask and pandas) must have the same
        number of partitions as ``self`` or comprise a single partition.
        Key-word arguments, Single-partition arguments, and general
        python-object arguments will be broadcasted to all partitions.
    enforce_metadata : bool, default True
        Whether to enforce at runtime that the structure of the DataFrame
        produced by ``func`` actually matches the structure of ``meta``.
        This will rename and reorder columns for each partition, and will
        raise an error if this doesn't work, but it won't raise if dtypes
        don't match.
    transform_divisions : bool, default True
        Whether to apply the function onto the divisions and apply those
        transformed divisions to the output.
    clear_divisions : bool, default False
        Whether divisions should be cleared. If True, `transform_divisions`
        will be ignored.
    required_columns : list or None, default None
        List of columns that ``func`` requires for execution. These columns
        must belong to the first DataFrame argument (in ``args``). If None
        is specified (the default), the query optimizer will assume that
        all input columns are required.
    $META

    Examples
    --------
    Given a DataFrame, Series, or Index, such as:

    >>> import pandas as pd
    >>> import dask.dataframe as dd
    >>> df = pd.DataFrame({'x': [1, 2, 3, 4, 5],
    ...                    'y': [1., 2., 3., 4., 5.]})
    >>> ddf = dd.from_pandas(df, npartitions=2)

    One can use ``map_partitions`` to apply a function on each partition.
    Extra arguments and keywords can optionally be provided, and will be
    passed to the function after the partition.

Callers 15

to_dask_array · Method · 0.95
values · Method · 0.95
sum · Method · 0.95
prod · Method · 0.95
skew · Method · 0.95
sem · Method · 0.95
mean · Method · 0.95
max · Method · 0.95
any · Method · 0.95
all · Method · 0.95
idxmin · Method · 0.95
idxmax · Method · 0.95

Calls 1

map_partitions · Function · 0.70

Tested by 15

test_from_delayed_dask · Function · 0.36
test_from_delayed_fusion · Function · 0.36
test_disk_shuffle · Function · 0.36
test_task_shuffle · Function · 0.36
test_task_shuffle_index · Function · 0.36
test_merge_empty_left_df · Function · 0.36
test_map_partitions · Function · 0.36