Write dask Bag to disk, one filename per partition, one line per element. **Paths**: This will create one file for each partition in your bag. You can specify the filenames in a variety of ways. Use a globstring >>> b.to_textfiles('/path/to/data/*.json.gz') # doctest: +SKIP
(
b,
path,
name_function=None,
compression="infer",
encoding=system_encoding,
compute=True,
storage_options=None,
last_endline=False,
**kwargs,
)
| 189 | |
| 190 | |
| 191 | def to_textfiles( |
| 192 | b, |
| 193 | path, |
| 194 | name_function=None, |
| 195 | compression="infer", |
| 196 | encoding=system_encoding, |
| 197 | compute=True, |
| 198 | storage_options=None, |
| 199 | last_endline=False, |
| 200 | **kwargs, |
| 201 | ): |
| 202 | """Write dask Bag to disk, one filename per partition, one line per element. |
| 203 | |
| 204 | **Paths**: This will create one file for each partition in your bag. You |
| 205 | can specify the filenames in a variety of ways. |
| 206 | |
| 207 | Use a globstring |
| 208 | |
| 209 | >>> b.to_textfiles('/path/to/data/*.json.gz') # doctest: +SKIP |
| 210 | |
| 211 | The * will be replaced by the increasing sequence 1, 2, ... |
| 212 | |
| 213 | :: |
| 214 | |
| 215 | /path/to/data/0.json.gz |
| 216 | /path/to/data/1.json.gz |
| 217 | |
| 218 | Use a globstring and a ``name_function=`` keyword argument. The |
| 219 | name_function function should expect an integer and produce a string. |
| 220 | Strings produced by name_function must preserve the order of their |
| 221 | respective partition indices. |
| 222 | |
| 223 | >>> from datetime import date, timedelta |
| 224 | >>> def name(i): |
| 225 | ... return str(date(2015, 1, 1) + i * timedelta(days=1)) |
| 226 | |
| 227 | >>> name(0) |
| 228 | '2015-01-01' |
| 229 | >>> name(15) |
| 230 | '2015-01-16' |
| 231 | |
| 232 | >>> b.to_textfiles('/path/to/data/*.json.gz', name_function=name) # doctest: +SKIP |
| 233 | |
| 234 | :: |
| 235 | |
| 236 | /path/to/data/2015-01-01.json.gz |
| 237 | /path/to/data/2015-01-02.json.gz |
| 238 | ... |
| 239 | |
| 240 | You can also provide an explicit list of paths. |
| 241 | |
| 242 | >>> paths = ['/path/to/data/alice.json.gz', '/path/to/data/bob.json.gz', ...] # doctest: +SKIP |
| 243 | >>> b.to_textfiles(paths) # doctest: +SKIP |
| 244 | |
| 245 | **Compression**: Filenames with extensions corresponding to known |
| 246 | compression algorithms (gz, bz2) will be compressed accordingly. |
| 247 | |
| 248 | **Bag Contents**: The bag calling ``to_textfiles`` must be a bag of |
no test coverage detected
searching dependent graphs…