MCPcopy
hub / github.com/dask/dask / to_textfiles

Function to_textfiles

dask/bag/core.py:191–281  ·  view source on GitHub ↗

Write dask Bag to disk, one filename per partition, one line per element. **Paths**: This will create one file for each partition in your bag. You can specify the filenames in a variety of ways. Use a globstring >>> b.to_textfiles('/path/to/data/*.json.gz') # doctest: +SKIP

(
    b,
    path,
    name_function=None,
    compression="infer",
    encoding=system_encoding,
    compute=True,
    storage_options=None,
    last_endline=False,
    **kwargs,
)

Source from the content-addressed store, hash-verified

189
190
191def to_textfiles(
192 b,
193 path,
194 name_function=None,
195 compression="infer",
196 encoding=system_encoding,
197 compute=True,
198 storage_options=None,
199 last_endline=False,
200 **kwargs,
201):
202 """Write dask Bag to disk, one filename per partition, one line per element.
203
204 **Paths**: This will create one file for each partition in your bag. You
205 can specify the filenames in a variety of ways.
206
207 Use a globstring
208
209 >>> b.to_textfiles('/path/to/data/*.json.gz') # doctest: +SKIP
210
211 The * will be replaced by the increasing sequence 1, 2, ...
212
213 ::
214
215 /path/to/data/0.json.gz
216 /path/to/data/1.json.gz
217
218 Use a globstring and a ``name_function=`` keyword argument. The
219 name_function function should expect an integer and produce a string.
220 Strings produced by name_function must preserve the order of their
221 respective partition indices.
222
223 >>> from datetime import date, timedelta
224 >>> def name(i):
225 ... return str(date(2015, 1, 1) + i * timedelta(days=1))
226
227 >>> name(0)
228 '2015-01-01'
229 >>> name(15)
230 '2015-01-16'
231
232 >>> b.to_textfiles('/path/to/data/*.json.gz', name_function=name) # doctest: +SKIP
233
234 ::
235
236 /path/to/data/2015-01-01.json.gz
237 /path/to/data/2015-01-02.json.gz
238 ...
239
240 You can also provide an explicit list of paths.
241
242 >>> paths = ['/path/to/data/alice.json.gz', '/path/to/data/bob.json.gz', ...] # doctest: +SKIP
243 >>> b.to_textfiles(paths) # doctest: +SKIP
244
245 **Compression**: Filenames with extensions corresponding to known
246 compression algorithms (gz, bz2) will be compressed accordingly.
247
248 **Bag Contents**: The bag calling ``to_textfiles`` must be a bag of

Callers 1

to_textfilesMethod · 0.85

Calls 3

from_collectionsMethod · 0.80
computeMethod · 0.45
to_delayedMethod · 0.45

Tested by

no test coverage detected

Used in the wild real call sites across dependent graphs

searching dependent graphs…