Create a DataFrame collection from a custom function map. ``from_map`` is the preferred option when reading from data sources that are not natively supported by Dask or if the data source requires custom handling before handing things off to Dask DataFrames. Examples are things like binary files or other unstructured data that doesn't have an IO connector.
(
func,
*iterables,
args=None,
meta=no_default,
divisions=None,
label=None,
enforce_metadata=False,
**kwargs,
)
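The calling convention in this signature can be modelled in plain Python. The sketch below is not Dask's implementation and does not import Dask: `model_from_map`, `accepts_columns`, and `load_chunk` are hypothetical names introduced for illustration, and the partitions are ordinary dicts in an eager list rather than DataFrames in a lazy collection. It only shows that each partition receives one element from every iterable, followed by the broadcast `args`, and then the broadcast keyword arguments. The signature check is an assumed stand-in for "the function has a ``columns`` keyword argument".

```python
import inspect


def accepts_columns(func):
    """Assumed check: does ``func`` declare a parameter named ``columns``?"""
    return "columns" in inspect.signature(func).parameters


def model_from_map(func, *iterables, args=None, **kwargs):
    """Eager, pure-Python model of the per-partition calls (hypothetical helper)."""
    iterables = [list(it) for it in iterables]
    if len({len(it) for it in iterables}) > 1:
        raise ValueError("all iterables must be the same length")
    broadcast = tuple(args or ())
    # One call per partition: the i-th element of every iterable comes first,
    # then the broadcast positional args, then the broadcast keyword args.
    return [func(*elements, *broadcast, **kwargs) for elements in zip(*iterables)]


def load_chunk(path, offset, scale, columns=None):
    """Hypothetical partition function; returns a dict instead of a DataFrame."""
    row = {"path": path, "offset": offset, "scale": scale}
    if columns is not None:
        # Keep only the requested fields, mimicking a column selection.
        row = {name: row[name] for name in columns}
    return row


partitions = model_from_map(
    load_chunk,
    ["a.bin", "b.bin", "c.bin"],  # first iterable: one path per partition
    [0, 100, 200],                # second iterable: one offset per partition
    args=(10,),                   # broadcast positional argument (scale)
    columns=["path", "scale"],    # broadcast keyword argument
)
```

Three elements per iterable produce three partitions here, and every call to `load_chunk` sees the same `scale` and `columns` values.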
def from_map(
    func,
    *iterables,
    args=None,
    meta=no_default,
    divisions=None,
    label=None,
    enforce_metadata=False,
    **kwargs,
):
    """Create a DataFrame collection from a custom function map.

    ``from_map`` is the preferred option when reading from data sources
    that are not natively supported by Dask or if the data source
    requires custom handling before handing things off to Dask DataFrames.
    Examples are things like binary files or other unstructured data that
    doesn't have an IO connector.

    ``from_map`` supports column projection by the optimizer. The optimizer
    tries to push column selections into the ``from_map`` call if the function
    supports a ``columns`` argument.

    Parameters
    ----------
    func : callable
        Function used to create each partition. Column projection will be
        enabled if the function has a ``columns`` keyword argument.
    *iterables : Iterable objects
        Iterable objects to map to each output partition. All iterables must
        be the same length. This length determines the number of partitions
        in the output collection (only one element of each iterable will
        be passed to ``func`` for each partition).
    args : list or tuple, optional
        Positional arguments to broadcast to each output partition. Note
        that these arguments will always be passed to ``func`` after the
        ``iterables`` positional arguments.
    $META
    divisions : tuple, str, optional
        Partition boundaries along the index.
        For a tuple, see https://docs.dask.org/en/latest/dataframe-design.html#partitions
        For the string 'sorted', the delayed values are computed to find the
        index values. This assumes that the indexes are mutually sorted.
        If None, index information is not used.
    label : str, optional
        String to use as the function-name label in the output
        collection-key names.
    token : str, optional
        String to use as the "token" in the output collection-key names.
    enforce_metadata : bool, default False
        Whether to enforce at runtime that the structure of the DataFrame
        produced by ``func`` actually matches the structure of ``meta``.
        This will rename and reorder columns for each partition,
        and will raise an error if this doesn't work,
        but it won't raise if dtypes don't match.
    **kwargs:
        Keyword arguments to broadcast to each output partition. These
        same arguments will be passed to ``func`` for every output partition.
