Create Dask DataFrame from many Dask Delayed objects .. warning:: ``from_delayed`` should only be used if the objects that create the data are complex and cannot be easily represented as a single function in an embarrassingly parallel fashion. ``from_map`` is re
(
dfs: Delayed | distributed.Future | Collection[Delayed | distributed.Future],
meta=None,
divisions: tuple | None = None,
prefix: str | None = None,
verify_meta: bool = True,
)
| 75 | |
| 76 | |
| 77 | def from_delayed( |
| 78 | dfs: Delayed | distributed.Future | Collection[Delayed | distributed.Future], |
| 79 | meta=None, |
| 80 | divisions: tuple | None = None, |
| 81 | prefix: str | None = None, |
| 82 | verify_meta: bool = True, |
| 83 | ): |
| 84 | """Create Dask DataFrame from many Dask Delayed objects |
| 85 | |
| 86 | .. warning:: |
| 87 | ``from_delayed`` should only be used if the objects that create |
| 88 | the data are complex and cannot be easily represented as a single |
| 89 | function in an embarrassingly parallel fashion. |
| 90 | |
| 91 | ``from_map`` is recommended if the query can be expressed as a single |
| 92 | function like: |
| 93 | |
| 94 | def read_xml(path): |
| 95 | return pd.read_xml(path) |
| 96 | |
| 97 | ddf = dd.from_map(read_xml, paths) |
| 98 | |
| 99 | ``from_delayed`` might be deprecated in the future. |
| 100 | |
| 101 | Parameters |
| 102 | ---------- |
| 103 | dfs : |
| 104 | A ``dask.delayed.Delayed``, a ``distributed.Future``, or an iterable of either |
| 105 | of these objects, e.g. returned by ``client.submit``. These comprise the |
| 106 | individual partitions of the resulting dataframe. |
| 107 | If a single object is provided (not an iterable), then the resulting dataframe |
| 108 | will have only one partition. |
| 109 | $META |
| 110 | divisions : |
| 111 | Partition boundaries along the index. |
| 112 | For tuple, see https://docs.dask.org/en/latest/dataframe-design.html#partitions |
| 113 | If None, then won't use index information |
| 114 | prefix : |
| 115 | Prefix to prepend to the keys. |
| 116 | verify_meta : |
| 117 | If True check that the partitions have consistent metadata, defaults to True. |
| 118 | """ |
| 119 | if isinstance(dfs, Delayed) or hasattr(dfs, "key"): |
| 120 | dfs = [dfs] |
| 121 | |
| 122 | if len(dfs) == 0: |
| 123 | raise TypeError("Must supply at least one delayed object") |
| 124 | |
| 125 | if meta is None: |
| 126 | meta = delayed(make_meta)(dfs[0]).compute() # type: ignore[index] |
| 127 | |
| 128 | if divisions == "sorted": |
| 129 | raise NotImplementedError( |
| 130 | "divisions='sorted' not supported, please calculate the divisions " |
| 131 | "yourself." |
| 132 | ) |
| 133 | elif divisions is not None: |
| 134 | divs = list(divisions) |
searching dependent graphs…