hub / github.com/dask/dask / from_delayed

Function from_delayed

dask/dataframe/dask_expr/io/_delayed.py:77–164

Create a Dask DataFrame from many Dask Delayed objects.

Warning: ``from_delayed`` should only be used if the objects that create the data are complex and cannot be easily represented as a single function in an embarrassingly parallel fashion. ``from_map`` is recommended if the query can be expressed as a single function.

(
    dfs: Delayed | distributed.Future | Collection[Delayed | distributed.Future],
    meta=None,
    divisions: tuple | None = None,
    prefix: str | None = None,
    verify_meta: bool = True,
)


def from_delayed(
    dfs: Delayed | distributed.Future | Collection[Delayed | distributed.Future],
    meta=None,
    divisions: tuple | None = None,
    prefix: str | None = None,
    verify_meta: bool = True,
):
    """Create Dask DataFrame from many Dask Delayed objects

    .. warning::
        ``from_delayed`` should only be used if the objects that create
        the data are complex and cannot be easily represented as a single
        function in an embarrassingly parallel fashion.

        ``from_map`` is recommended if the query can be expressed as a single
        function like:

        def read_xml(path):
            return pd.read_xml(path)

        ddf = dd.from_map(read_xml, paths)

        ``from_delayed`` might be deprecated in the future.

    Parameters
    ----------
    dfs :
        A ``dask.delayed.Delayed``, a ``distributed.Future``, or an iterable of either
        of these objects, e.g. returned by ``client.submit``. These comprise the
        individual partitions of the resulting dataframe.
        If a single object is provided (not an iterable), then the resulting dataframe
        will have only one partition.
    $META
    divisions :
        Partition boundaries along the index.
        For tuple, see https://docs.dask.org/en/latest/dataframe-design.html#partitions
        If None, then won't use index information
    prefix :
        Prefix to prepend to the keys.
    verify_meta :
        If True check that the partitions have consistent metadata, defaults to True.
    """
    if isinstance(dfs, Delayed) or hasattr(dfs, "key"):
        dfs = [dfs]

    if len(dfs) == 0:
        raise TypeError("Must supply at least one delayed object")

    if meta is None:
        meta = delayed(make_meta)(dfs[0]).compute()  # type: ignore[index]

    if divisions == "sorted":
        raise NotImplementedError(
            "divisions='sorted' not supported, please calculate the divisions "
            "yourself."
        )
    elif divisions is not None:
        divs = list(divisions)

Callers 5

test_from_delayed · Function · 0.90
test_from_delayed_dask · Function · 0.90
test_from_delayed_fusion · Function · 0.90
test_mismatching_meta · Function · 0.90

Calls 9

delayed · Function · 0.90
DelayedsExpr · Class · 0.90
make_meta · Function · 0.90
pyarrow_strings_enabled · Function · 0.90
new_collection · Function · 0.90
FromDelayed · Class · 0.85
any · Function · 0.85
compute · Method · 0.45

Tested by 5

test_from_delayed · Function · 0.72
test_from_delayed_dask · Function · 0.72
test_from_delayed_fusion · Function · 0.72
test_mismatching_meta · Function · 0.72
