hub / github.com/dask/dask / from_map

Function from_map

dask/dataframe/dask_expr/_collection.py:5772–5991

Create a DataFrame collection from a custom function map. ``from_map`` is the preferred option when reading from data sources that are not natively supported by Dask, or when the data source requires custom handling before handing things off to Dask DataFrames. Examples include binary files or other unstructured data that doesn't have an IO connector.

(
    func,
    *iterables,
    args=None,
    meta=no_default,
    divisions=None,
    label=None,
    enforce_metadata=False,
    **kwargs,
)

Source from the content-addressed store, hash-verified

def from_map(
    func,
    *iterables,
    args=None,
    meta=no_default,
    divisions=None,
    label=None,
    enforce_metadata=False,
    **kwargs,
):
    """Create a DataFrame collection from a custom function map.

    ``from_map`` is the preferred option when reading from data sources
    that are not natively supported by Dask or if the data source
    requires custom handling before handing things off to Dask DataFrames.
    Examples are things like binary files or other unstructured data that
    doesn't have an IO connector.

    ``from_map`` supports column projection by the optimizer. The optimizer
    tries to push column selections into the from_map call if the function
    supports a ``columns`` argument.

    Parameters
    ----------
    func : callable
        Function used to create each partition. Column projection will be
        enabled if the function has a ``columns`` keyword argument.
    *iterables : Iterable objects
        Iterable objects to map to each output partition. All iterables must
        be the same length. This length determines the number of partitions
        in the output collection (only one element of each iterable will
        be passed to ``func`` for each partition).
    args : list or tuple, optional
        Positional arguments to broadcast to each output partition. Note
        that these arguments will always be passed to ``func`` after the
        ``iterables`` positional arguments.
    $META
    divisions : tuple, str, optional
        Partition boundaries along the index.
        For tuple, see https://docs.dask.org/en/latest/dataframe-design.html#partitions
        For string 'sorted' will compute the delayed values to find index
        values. Assumes that the indexes are mutually sorted.
        If None, then won't use index information
    label : str, optional
        String to use as the function-name label in the output
        collection-key names.
    token : str, optional
        String to use as the "token" in the output collection-key names.
    enforce_metadata : bool, default False
        Whether to enforce at runtime that the structure of the DataFrame
        produced by ``func`` actually matches the structure of ``meta``.
        This will rename and reorder columns for each partition,
        and will raise an error if this doesn't work,
        but it won't raise if dtypes don't match.
    **kwargs:
        Keyword arguments to broadcast to each output partition. These
        same arguments will be passed to ``func`` for every output partition.

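
The calling convention described in the Parameters section can be modeled in plain Python. This is a sketch of the documented behavior only, not Dask's implementation: `call_per_partition` and `describe` are hypothetical names, the result is an eager list and not a lazy collection, and raising `ValueError` on bad input is an assumption of the sketch.

```python
def call_per_partition(func, *iterables, args=None, **kwargs):
    """Model how ``func`` is invoked once per output partition."""
    args = tuple(args or ())
    materialized = [list(it) for it in iterables]
    if not materialized:
        raise ValueError("at least one iterable is required")
    # All iterables must have the same length; that length is the
    # number of partitions.
    if len({len(it) for it in materialized}) != 1:
        raise ValueError("all iterables must have the same length")
    # One element of each iterable per call, then the broadcast
    # positional args, then the broadcast keyword arguments.
    return [func(*elements, *args, **kwargs) for elements in zip(*materialized)]


def describe(path, offset, suffix, upper=False):
    # Hypothetical stand-in for a function that would build one partition.
    text = f"{path}@{offset}{suffix}"
    return text.upper() if upper else text


parts = call_per_partition(
    describe,
    ["a.bin", "b.bin"],  # first iterable: one path per partition
    [0, 128],            # second iterable: one offset per partition
    args=["!"],          # broadcast positionally after the iterables
    upper=True,          # broadcast as a keyword argument
)
```

Each call receives `(path, offset)` from the iterables first, then `"!"` from `args`, then `upper=True`, which mirrors the ordering the docstring specifies.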

Callers 5

test_from_map · Function · 0.90
with_spec · Function · 0.90

Calls 8

new_collection · Function · 0.90
FromMapProjectable · Class · 0.90
FromMap · Class · 0.90
pyarrow_strings_enabled · Function · 0.90
set · Class · 0.85
pop · Method · 0.80
add · Method · 0.45
get · Method · 0.45
