
Function to_textfiles

dask/bag/core.py:191–281

Write a dask Bag to disk, one file per partition, one line per element. **Paths**: This creates one file for each partition in the bag. Filenames can be specified in several ways, for example with a globstring: b.to_textfiles('/path/to/data/*.json.gz')

(
    b,
    path,
    name_function=None,
    compression="infer",
    encoding=system_encoding,
    compute=True,
    storage_options=None,
    last_endline=False,
    **kwargs,
)
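The `compression="infer"` default in the signature ties in with the docstring's note that filenames with known compression extensions (gz, bz2) are compressed accordingly. A minimal sketch of that idea in plain Python follows. It is not dask's implementation: `infer_compression` and `EXTENSION_TO_COMPRESSION` are hypothetical names, and the mapping covers only the two extensions the docstring mentions.

```python
import os

# Hypothetical mapping: only the two extensions named in the docstring.
EXTENSION_TO_COMPRESSION = {".gz": "gzip", ".bz2": "bz2"}


def infer_compression(filename):
    """Guess a compression name from the final extension, or return None."""
    _, ext = os.path.splitext(filename)
    return EXTENSION_TO_COMPRESSION.get(ext.lower())


for candidate in ["/path/to/data/0.json.gz", "archive.BZ2", "notes.txt"]:
    print(candidate, "->", infer_compression(candidate))
```

Only the last extension is inspected, so `0.json.gz` is treated as gzip-compressed JSON lines, and a filename without a known extension maps to no compression.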

Source from the content-addressed store, hash-verified

def to_textfiles(
    b,
    path,
    name_function=None,
    compression="infer",
    encoding=system_encoding,
    compute=True,
    storage_options=None,
    last_endline=False,
    **kwargs,
):
    """Write dask Bag to disk, one filename per partition, one line per element.

    Paths: This will create one file for each partition in your bag. You
    can specify the filenames in a variety of ways.

    Use a globstring

    >>> b.to_textfiles('/path/to/data/*.json.gz')  # doctest: +SKIP

    The * will be replaced by the increasing sequence 0, 1, ...

    ::

        /path/to/data/0.json.gz
        /path/to/data/1.json.gz

    Use a globstring and a ``name_function=`` keyword argument. The
    name_function function should expect an integer and produce a string.
    Strings produced by name_function must preserve the order of their
    respective partition indices.

    >>> from datetime import date, timedelta
    >>> def name(i):
    ...     return str(date(2015, 1, 1) + i * timedelta(days=1))

    >>> name(0)
    '2015-01-01'
    >>> name(15)
    '2015-01-16'

    >>> b.to_textfiles('/path/to/data/*.json.gz', name_function=name)  # doctest: +SKIP

    ::

        /path/to/data/2015-01-01.json.gz
        /path/to/data/2015-01-02.json.gz
        ...

    You can also provide an explicit list of paths.

    >>> paths = ['/path/to/data/alice.json.gz', '/path/to/data/bob.json.gz', ...]  # doctest: +SKIP
    >>> b.to_textfiles(paths)  # doctest: +SKIP

    Compression: Filenames with extensions corresponding to known
    compression algorithms (gz, bz2) will be compressed accordingly.

    Bag Contents: The bag calling ``to_textfiles`` must be a bag of
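The naming rules in the docstring above (the `*` replaced per partition, an optional `name_function`, and names that must preserve partition order) can be sketched in plain Python. This is an illustration under stated assumptions, not dask's code: `expand_paths` is a hypothetical helper, and it reads "preserve the order" as "the generated names are already in sorted string order".

```python
from datetime import date, timedelta


def expand_paths(path, npartitions, name_function=str):
    """Hypothetical helper: build one filename per partition from a globstring."""
    if path.count("*") != 1:
        raise ValueError("expected exactly one '*' in the globstring")
    names = [name_function(i) for i in range(npartitions)]
    # Assumption: order is preserved when sorting the names changes nothing.
    if names != sorted(names):
        raise ValueError("name_function does not preserve partition order")
    return [path.replace("*", name) for name in names]


def name(i):
    # Same name function as in the docstring: one date per partition index.
    return str(date(2015, 1, 1) + i * timedelta(days=1))


default_paths = expand_paths("/path/to/data/*.json.gz", 2)
dated_paths = expand_paths("/path/to/data/*.json.gz", 3, name_function=name)
print(default_paths)
print(dated_paths)
```

Under this reading, ISO-formatted dates are a safe choice because their string order matches their chronological order, whereas plain `str` on integers stops being ordered once there are more than ten partitions ('10' sorts before '2').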

Callers 1

to_textfiles Method · 0.85

Calls 3

from_collections Method · 0.80

compute Method · 0.45

to_delayed Method · 0.45

Tested by

no test coverage detected
