Function read_bytes

dask/bytes/core.py:14–187

Given a path or paths, return delayed objects that read from those paths. The path may be a filename like ``'2015-01-01.csv'`` or a globstring like ``'2015-*-*.csv'``. The path may be preceded by a protocol, like ``s3://`` or ``hdfs://``, if those libraries are installed. If a delimiter is given, the data is split cleanly on it, so that each block starts directly after a delimiter and ends on one.
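The delimiter rule can be illustrated on an in-memory byte string. This is a minimal sketch, not dask's implementation (which builds lazy `delayed` reads against the filesystem); `split_on_delimiter` and `_after_next_delimiter` are hypothetical helper names. A nominal block starts every `blocksize` bytes; each start except the first is moved forward to just past the next delimiter, and each end is extended the same way, so consecutive blocks meet exactly and every block ends on the delimiter as long as the input does.

```python
def _after_next_delimiter(data: bytes, pos: int, delimiter: bytes) -> int:
    """Index just past the first delimiter found at or after pos (or len(data))."""
    idx = data.find(delimiter, pos)
    return len(data) if idx == -1 else idx + len(delimiter)


def split_on_delimiter(data: bytes, blocksize: int, delimiter: bytes) -> list:
    """Cut data into roughly blocksize-sized blocks that end on delimiter."""
    blocks = []
    for offset in range(0, len(data), blocksize):
        # The first block starts at byte 0; later ones skip the partial record
        # that the previous block has already consumed.
        start = 0 if offset == 0 else _after_next_delimiter(data, offset, delimiter)
        # Every block runs past its nominal end up to and including a delimiter.
        end = _after_next_delimiter(data, offset + blocksize, delimiter)
        if start < end:  # a long record can swallow an entire nominal block
            blocks.append(data[start:end])
    return blocks


data = b"alice,1\nbob,2\ncarol,3\n"
blocks = split_on_delimiter(data, blocksize=10, delimiter=b"\n")
print(blocks)  # [b'alice,1\nbob,2\n', b'carol,3\n']
```

Because each block ends exactly where the next one begins, joining the blocks reproduces the input and no record is split between two blocks.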

(
    urlpath,
    delimiter=None,
    not_zero=False,
    blocksize="128 MiB",
    sample="10 kiB",
    compression=None,
    include_path=False,
    **kwargs,
)
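`blocksize` and `sample` take either an integer byte count or a human-readable string such as `"128 MiB"` or `"10 kiB"`. Going by the call list, read_bytes converts these with `parse_bytes` (available as `dask.utils.parse_bytes`). The sketch below is a simplified, hypothetical stand-in named `to_nbytes` that only knows the units in its table; it shows the arithmetic (kB/MB/GB are powers of 1000, KiB/MiB/GiB powers of 1024) rather than reproducing parse_bytes. It rejects float and boolean inputs, an assumption loosely suggested by the caller name `test_read_bytes_blocksize_float_errs`.

```python
import re

# Multipliers for the units this sketch understands (keys are lower-cased).
_UNITS = {
    "": 1, "b": 1,
    "kb": 10**3, "mb": 10**6, "gb": 10**9,
    "kib": 2**10, "mib": 2**20, "gib": 2**30,
}


def to_nbytes(value) -> int:
    """Convert an int or a string like '128 MiB' to a number of bytes."""
    # bool is a subclass of int; reject it so True/False are not read as 1/0.
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError(f"expected int or str, got {type(value).__name__}")
    if isinstance(value, int):
        return value
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", value)
    if match is None:
        raise ValueError(f"cannot parse size {value!r}")
    number, unit = match.groups()
    factor = _UNITS.get(unit.lower())
    if factor is None:
        raise ValueError(f"unknown unit {unit!r}")
    return int(float(number) * factor)


print(to_nbytes("128 MiB"))  # 134217728
print(to_nbytes("10 kiB"))   # 10240
```

Since `sample` may also be `False` ("no sample requested"), a caller would check for that before handing the value to a converter like this one.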

def read_bytes(
    urlpath,
    delimiter=None,
    not_zero=False,
    blocksize="128 MiB",
    sample="10 kiB",
    compression=None,
    include_path=False,
    **kwargs,
):
    """Given a path or paths, return delayed objects that read from those paths.

    The path may be a filename like ``'2015-01-01.csv'`` or a globstring
    like ``'2015-*-*.csv'``.

    The path may be preceded by a protocol, like ``s3://`` or ``hdfs://`` if
    those libraries are installed.

    This cleanly breaks data by a delimiter if given, so that block boundaries
    start directly after a delimiter and end on the delimiter.

    Parameters
    ----------
    urlpath : string or list
        Absolute or relative filepath(s). Prefix with a protocol like ``s3://``
        to read from alternative filesystems. To read from multiple files you
        can pass a globstring or a list of paths, with the caveat that they
        must all have the same protocol.
    delimiter : bytes
        An optional delimiter, like ``b'\\n'`` on which to split blocks of
        bytes.
    not_zero : bool
        Force seek of start-of-file delimiter, discarding header.
    blocksize : int, str
        Chunk size in bytes, defaults to "128 MiB"
    compression : string or None
        String like 'gzip' or 'xz'. Must support efficient random access.
    sample : int, string, or boolean
        Whether or not to return a header sample.
        Values can be ``False`` for "no sample requested"
        Or an integer or string value like ``2**20`` or ``"1 MiB"``
    include_path : bool
        Whether or not to include the path with the bytes representing a particular file.
        Default is False.
    **kwargs : dict
        Extra options that make sense to a particular storage connection, e.g.
        host, port, username, password, etc.

    Examples
    --------
    >>> sample, blocks = read_bytes('2015-*-*.csv', delimiter=b'\\n')  # doctest: +SKIP
    >>> sample, blocks = read_bytes('s3://bucket/2015-*-*.csv', delimiter=b'\\n')  # doctest: +SKIP
    >>> sample, paths, blocks = read_bytes('2015-*-*.csv', include_path=True)  # doctest: +SKIP

    Returns
    -------
    sample : bytes
        The sample header
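The returned `sample` is a prefix of the first file's bytes. When a delimiter is also given, it helps for the sample to end on a complete record, so that whatever inspects the header never sees a cut-off row; the caller name `test_read_bytes_sample_delimiter` suggests the two options interact. Below is a minimal sketch over a file-like object, using the hypothetical name `read_sample`; it covers only the integer-size case and assumes the delimiter never straddles two reads, which always holds for a one-byte delimiter such as `b"\n"`.

```python
import io
from typing import BinaryIO, Optional


def read_sample(f: BinaryIO, nbytes: int, delimiter: Optional[bytes] = None) -> bytes:
    """Read about nbytes from the start of f, ending on a delimiter if one is given."""
    head = f.read(nbytes)
    if delimiter is None:
        return head
    # Keep reading nbytes-sized chunks until a delimiter turns up, then cut
    # just after it. Assumes the delimiter never straddles two chunks.
    while True:
        chunk = f.read(nbytes)
        if not chunk:  # end of file: return whatever was read
            return head
        if delimiter in chunk:
            return head + chunk.split(delimiter, 1)[0] + delimiter
        head += chunk


data = b"id,name\n1,alice\n2,bob\n"
print(read_sample(io.BytesIO(data), 10))                   # b'id,name\n1,'
print(read_sample(io.BytesIO(data), 10, delimiter=b"\n"))  # b'id,name\n1,alice\n'
```

For simplicity the sketch always reads at least one further chunk when a delimiter is given, even if the first `nbytes` already end on a delimiter.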

Callers 15

test_unordered_urlpath_errors · Function · 0.90
test_read_bytes · Function · 0.90
test_read_bytes_sample_delimiter · Function · 0.90
test_parse_sample_bytes · Function · 0.90
test_read_bytes_no_sample · Function · 0.90
test_read_bytes_blocksize_none · Function · 0.90
test_read_bytes_blocksize_types · Function · 0.90
test_read_bytes_blocksize_float_errs · Function · 0.90
test_read_bytes_include_path · Function · 0.90
test_with_urls · Function · 0.90
test_with_paths · Function · 0.90
test_read_bytes_block · Function · 0.90

Calls 6

parse_bytes · Function · 0.90
is_integer · Function · 0.90
delayed · Function · 0.90
info · Method · 0.80
split · Method · 0.80
tokenize · Function · 0.50

Tested by 15

test_unordered_urlpath_errors · Function · 0.72
test_read_bytes · Function · 0.72
test_read_bytes_sample_delimiter · Function · 0.72
test_parse_sample_bytes · Function · 0.72
test_read_bytes_no_sample · Function · 0.72
test_read_bytes_blocksize_none · Function · 0.72
test_read_bytes_blocksize_types · Function · 0.72
test_read_bytes_blocksize_float_errs · Function · 0.72
test_read_bytes_include_path · Function · 0.72
test_with_urls · Function · 0.72
test_with_paths · Function · 0.72
test_read_bytes_block · Function · 0.72
