hub / github.com/huggingface/datasets / test_dataset_estimate_nbytes

Function test_dataset_estimate_nbytes

tests/test_arrow_dataset.py:4583–4597
test_dataset_estimate_nbytes()

Source from the content-addressed store, hash-verified

from datasets import Dataset, concatenate_datasets


def test_dataset_estimate_nbytes():
    # 100 rows of 100 ASCII characters each: about 100 * 100 bytes of string data.
    ds = Dataset.from_dict({"a": ["0" * 100] * 100})
    assert 0.9 * ds._estimate_nbytes() < 100 * 100, "must be smaller than full dataset size"

    # Selecting a single row must not push the estimate above one chunk.
    ds = Dataset.from_dict({"a": ["0" * 100] * 100}).select([0])
    assert 0.9 * ds._estimate_nbytes() < 100 * 100, "must be smaller than one chunk"

    # 100 concatenated copies: the estimate must land within roughly 10% of 100 * 100 * 100 bytes.
    ds = Dataset.from_dict({"a": ["0" * 100] * 100})
    ds = concatenate_datasets([ds] * 100)
    assert 0.9 * ds._estimate_nbytes() < 100 * 100 * 100, "must be smaller than full dataset size"
    assert 1.1 * ds._estimate_nbytes() > 100 * 100 * 100, "must be bigger than full dataset size"

    # One row selected from the concatenation: the estimate must still stay below one chunk.
    ds = Dataset.from_dict({"a": ["0" * 100] * 100})
    ds = concatenate_datasets([ds] * 100).select([0])
    assert 0.9 * ds._estimate_nbytes() < 100 * 100, "must be smaller than one chunk"
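
The assertions in this test compare an estimate against an expected byte count using a 10% margin (`0.9 * estimate < expected` and, for the concatenated case only, `1.1 * estimate > expected`). The sketch below restates that check on its own. `within_relative_band` and the stand-in estimate value are hypothetical and are not part of `datasets`; the block only illustrates the arithmetic.

```python
def within_relative_band(estimate: float, expected: float, tol: float = 0.1) -> bool:
    """Hypothetical helper, not part of `datasets`.

    True when `expected` lies strictly between (1 - tol) * estimate
    and (1 + tol) * estimate, mirroring the two-sided check in the test.
    """
    return (1 - tol) * estimate < expected < (1 + tol) * estimate


ROW_BYTES = 100   # each value is "0" * 100, i.e. 100 ASCII bytes
ROWS = 100        # rows in the base dataset
COPIES = 100      # number of concatenated copies

expected_single = ROW_BYTES * ROWS           # 100 * 100
expected_concat = expected_single * COPIES   # 100 * 100 * 100

# Stand-in number for illustration only; a real value would come from ds._estimate_nbytes().
stand_in_estimate = 1_000_400

print(within_relative_band(stand_in_estimate, expected_concat))
```

Only the concatenated dataset is checked from both sides in the test; the other three cases assert the upper bound alone.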

Callers

nothing calls this directly

Calls 4

concatenate_datasets · Function · 0.90
_estimate_nbytes · Method · 0.80
from_dict · Method · 0.45
select · Method · 0.45
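
The calls listed above combine `select` with `_estimate_nbytes` to check that picking one row keeps the estimate below one chunk. One way to reason about why that could hold is proportional scaling: multiply the size of the underlying table by the fraction of rows selected. This is an assumed model for illustration; this page does not show how `_estimate_nbytes` is implemented, and `scaled_estimate` is a hypothetical name.

```python
def scaled_estimate(table_nbytes: int, table_rows: int, selected_rows: int) -> float:
    """Hypothetical model, not the library's code.

    Scales the byte size of the underlying table by the share of rows
    that a selection keeps.
    """
    if table_rows == 0:
        return 0.0
    return table_nbytes * selected_rows / table_rows


# Assumed figures: 100 copies of 100 rows, 100 bytes per row.
TABLE_NBYTES = 100 * 100 * 100
TABLE_ROWS = 100 * 100

full = scaled_estimate(TABLE_NBYTES, TABLE_ROWS, TABLE_ROWS)
one_row = scaled_estimate(TABLE_NBYTES, TABLE_ROWS, 1)

# Under this model the single-row estimate satisfies the test's upper bound.
assert 0.9 * one_row < 100 * 100
```

Under this assumption a one-row selection is charged only its share of the table, which is the behaviour the "must be smaller than one chunk" assertions guard.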

Tested by

no test coverage detected