Apply a Python function to each partition.
(
self,
func,
*args,
meta=no_default,
enforce_metadata=True,
transform_divisions=True,
clear_divisions=False,
align_dataframes=False,
parent_meta=None,
required_columns=None,
**kwargs,
)

    @insert_meta_param_description(pad=12)
    def map_partitions(
        self,
        func,
        *args,
        meta=no_default,
        enforce_metadata=True,
        transform_divisions=True,
        clear_divisions=False,
        align_dataframes=False,
        parent_meta=None,
        required_columns=None,
        **kwargs,
    ):
        """Apply a Python function to each partition

        Parameters
        ----------
        func : function
            Function applied to each partition.
        args, kwargs :
            Arguments and keywords to pass to the function. Arguments and
            keywords may contain ``FrameBase`` or regular python objects.
            DataFrame-like args (both dask and pandas) must have the same
            number of partitions as ``self`` or comprise a single partition.
            Key-word arguments, Single-partition arguments, and general
            python-object arguments will be broadcasted to all partitions.
        enforce_metadata : bool, default True
            Whether to enforce at runtime that the structure of the DataFrame
            produced by ``func`` actually matches the structure of ``meta``.
            This will rename and reorder columns for each partition, and will
            raise an error if this doesn't work, but it won't raise if dtypes
            don't match.
        transform_divisions : bool, default True
            Whether to apply the function onto the divisions and apply those
            transformed divisions to the output.
        clear_divisions : bool, default False
            Whether divisions should be cleared. If True, `transform_divisions`
            will be ignored.
        required_columns : list or None, default None
            List of columns that ``func`` requires for execution. These columns
            must belong to the first DataFrame argument (in ``args``). If None
            is specified (the default), the query optimizer will assume that
            all input columns are required.
        $META

        Examples
        --------
        Given a DataFrame, Series, or Index, such as:

        >>> import pandas as pd
        >>> import dask.dataframe as dd
        >>> df = pd.DataFrame({'x': [1, 2, 3, 4, 5],
        ...                    'y': [1., 2., 3., 4., 5.]})
        >>> ddf = dd.from_pandas(df, npartitions=2)

        One can use ``map_partitions`` to apply a function on each partition.
        Extra arguments and keywords can optionally be provided, and will be
        passed to the function after the partition.