def read_bytes(
    urlpath,
    delimiter=None,
    not_zero=False,
    blocksize="128 MiB",
    sample="10 kiB",
    compression=None,
    include_path=False,
    **kwargs,
):
    """Given a path or paths, return delayed objects that read from those paths.

    The path may be a filename like ``'2015-01-01.csv'`` or a globstring
    like ``'2015-*-*.csv'``.

    The path may be preceded by a protocol, like ``s3://`` or ``hdfs://`` if
    those libraries are installed.

    This cleanly breaks data by a delimiter if given, so that block boundaries
    start directly after a delimiter and end on the delimiter.

    Parameters
    ----------
    urlpath : string or list
        Absolute or relative filepath(s). Prefix with a protocol like ``s3://``
        to read from alternative filesystems. To read from multiple files you
        can pass a globstring or a list of paths, with the caveat that they
        must all have the same protocol.
    delimiter : bytes
        An optional delimiter, like ``b'\\n'`` on which to split blocks of
        bytes.
    not_zero : bool
        Force seek of start-of-file delimiter, discarding header.
    blocksize : int, str
        Chunk size in bytes, defaults to "128 MiB"
    compression : string or None
        String like 'gzip' or 'xz'. Must support efficient random access.
    sample : int, string, or boolean
        Whether or not to return a header sample.
        Values can be ``False`` for "no sample requested"
        Or an integer or string value like ``2**20`` or ``"1 MiB"``
    include_path : bool
        Whether or not to include the path with the bytes representing a particular file.
        Default is False.
    **kwargs : dict
        Extra options that make sense to a particular storage connection, e.g.
        host, port, username, password, etc.

    Examples
    --------
    >>> sample, blocks = read_bytes('2015-*-*.csv', delimiter=b'\\n')  # doctest: +SKIP
    >>> sample, blocks = read_bytes('s3://bucket/2015-*-*.csv', delimiter=b'\\n')  # doctest: +SKIP
    >>> sample, paths, blocks = read_bytes('2015-*-*.csv', include_path=True)  # doctest: +SKIP

    Returns
    -------
    sample : bytes
        The sample header