hub / github.com/ray-project/ray / test_lance_read_basic

Function test_lance_read_basic

python/ray/data/tests/datasource/test_lance.py:47–97
(fs, data_path, batch_size)

Source from the content-addressed store, hash-verified

    [None, 100],
)
def test_lance_read_basic(fs, data_path, batch_size):
    df1 = pa.table({"one": [2, 1, 3, 4, 6, 5], "two": ["b", "a", "c", "e", "g", "f"]})
    setup_data_path = _unwrap_protocol(data_path)
    path = os.path.join(setup_data_path, "test.lance")
    lance.write_dataset(df1, path)

    ds_lance = lance.dataset(path)
    assert ds_lance is not None
    df2 = pa.table(
        {
            "one": [1, 2, 3, 4, 5, 6],
            "three": [4, 5, 8, 9, 12, 13],
            "four": ["u", "v", "w", "x", "y", "z"],
        }
    )
    ds_lance.merge(df2, "one")

    if batch_size is None:
        ds = ray.data.read_lance(path)
    else:
        ds = ray.data.read_lance(path, scanner_options={"batch_size": batch_size})

    # Test metadata-only ops.
    assert ds.count() == 6
    assert ds.schema() == Schema(
        pa.schema(
            {
                "one": pa.int64(),
                "two": pa.string(),
                "three": pa.int64(),
                "four": pa.string(),
            }
        )
    )

    # Test read.
    values = [[s["one"], s["two"]] for s in ds.take_all()]
    assert sorted(values) == [
        [1, "a"],
        [2, "b"],
        [3, "c"],
        [4, "e"],
        [5, "f"],
        [6, "g"],
    ]

    # Test column projection.
    ds = ray.data.read_lance(path, columns=["one"])
    values = [s["one"] for s in ds.take_all()]
    assert sorted(values) == [1, 2, 3, 4, 5, 6]
    assert ds.schema().names == ["one"]


@pytest.mark.parametrize("data_path", [lazy_fixture("local_path")])
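
The test above relies on `ds_lance.merge(df2, "one")` adding the `three` and `four` columns to the rows already written, matched on the `one` key, which is why the later schema check expects four columns and the read check still sees the original `one`/`two` pairs. The sketch below models that behaviour with plain Python dictionaries of columns. It is an illustration only, not Lance's implementation: the helper name `left_join_columns` is hypothetical, and the assumption that unmatched keys yield `None` is made for the sketch and is not exercised by the test data.

```python
# Plain-Python model of the column-adding merge exercised by the test.
# Columns are stored as lists, mirroring the two pa.table(...) inputs.
left = {"one": [2, 1, 3, 4, 6, 5], "two": ["b", "a", "c", "e", "g", "f"]}
right = {
    "one": [1, 2, 3, 4, 5, 6],
    "three": [4, 5, 8, 9, 12, 13],
    "four": ["u", "v", "w", "x", "y", "z"],
}


def left_join_columns(left, right, key):
    """Hypothetical helper: keep left's rows, append right's non-key columns.

    Rows are matched on `key`; a left key missing from `right` gets None
    (an assumption of this sketch).
    """
    # Map each key value in the right table to its row position.
    index = {k: i for i, k in enumerate(right[key])}
    # Start from a copy of the left columns so row order is preserved.
    merged = {name: list(values) for name, values in left.items()}
    for name, values in right.items():
        if name == key:
            continue
        merged[name] = [
            values[index[k]] if k in index else None for k in left[key]
        ]
    return merged


merged = left_join_columns(left, right, "one")

# The same shape of checks the test performs after reading the dataset back.
row_count = len(merged["one"])
column_names = list(merged)
pairs = sorted([one, two] for one, two in zip(merged["one"], merged["two"]))
```

Here `row_count` corresponds to `ds.count()`, `column_names` to the field names in the expected schema, and `pairs` to the sorted `[one, two]` values the test compares against.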

Callers

nothing calls this directly

Calls 8

_unwrap_protocol · Function · 0.90
Schema · Class · 0.90
table · Method · 0.80
take_all · Method · 0.80
join · Method · 0.45
merge · Method · 0.45
count · Method · 0.45
schema · Method · 0.45

Tested by

no test coverage detected
