
    @insert_meta_param_description(pad=12)
    def apply(self, function, *args, meta=no_default, axis=0, **kwargs):
        """Parallel version of pandas.DataFrame.apply

        This mimics the pandas version except for the following:

        1. Only ``axis=1`` is supported (and must be specified explicitly).
        2. The user should provide output metadata via the ``meta`` keyword.

        Parameters
        ----------
        function : callable
            Function to apply to each column/row.
        axis : {0 or 'index', 1 or 'columns'}, default 0
            - 0 or 'index': apply function to each column (NOT SUPPORTED)
            - 1 or 'columns': apply function to each row
        $META
        args : tuple
            Positional arguments to pass to function in addition to the
            array/series.

        Additional keyword arguments will be passed as keywords to the function.

        Returns
        -------
        applied : Series or DataFrame

        Examples
        --------
        >>> import pandas as pd
        >>> import dask.dataframe as dd
        >>> df = pd.DataFrame({'x': [1, 2, 3, 4, 5],
        ...                    'y': [1., 2., 3., 4., 5.]})
        >>> ddf = dd.from_pandas(df, npartitions=2)

        Apply a function row-wise, passing in extra arguments in ``args`` and
        ``kwargs``:

        >>> def myadd(row, a, b=1):
        ...     return row.sum() + a + b
        >>> res = ddf.apply(myadd, axis=1, args=(2,), b=1.5)  # doctest: +SKIP

        By default, dask tries to infer the output metadata by running your
        provided function on some fake data. This works well in many cases, but
        can sometimes be expensive, or even fail. To avoid this, you can
        manually specify the output metadata with the ``meta`` keyword. It
        can be specified in many forms; for more information, see
        ``dask.dataframe.utils.make_meta``.
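
        For example, a ``(name, dtype)`` tuple describes a Series. Calling
        ``make_meta`` on it directly shows the empty object that would serve
        as the metadata (this assumes ``make_meta`` is importable from
        ``dask.dataframe.utils`` in your dask version):

        >>> from dask.dataframe.utils import make_meta
        >>> m = make_meta(('x', 'f8'))
        >>> m.name, m.dtype, len(m)
        ('x', dtype('float64'), 0)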

        Here we specify that the output is a Series named ``'x'`` with dtype
        ``float64``:

        >>> res = ddf.apply(myadd, axis=1, args=(2,), b=1.5, meta=('x', 'f8'))
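
        The result is lazy until computed. With the data above, each row sums
        to ``2 * x`` (because ``y`` equals ``x``), and ``myadd`` then adds
        ``a + b = 2 + 1.5``:

        >>> res.compute().tolist()
        [5.5, 7.5, 9.5, 11.5, 13.5]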

        If the metadata doesn't change, you can also pass in the object itself
        directly:

        >>> res = ddf.apply(lambda row: row + 1, axis=1, meta=ddf)
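
        If the function returns a Series for each row, the result is a
        DataFrame, and ``meta`` can be given as a dict that maps column names
        to dtypes (another form accepted by ``make_meta``). Here ``sumdiff``
        is a hypothetical helper written only for this example:

        >>> def sumdiff(row):
        ...     return pd.Series({'s': row['x'] + row['y'],
        ...                       'd': row['x'] - row['y']})
        >>> res = ddf.apply(sumdiff, axis=1, meta={'s': 'f8', 'd': 'f8'})
        >>> res.compute()['s'].tolist()
        [2.0, 4.0, 6.0, 8.0, 10.0]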

        See Also