hub / github.com/huggingface/datasets / test_concatenate_datasets

Function test_concatenate_datasets

tests/test_arrow_dataset.py:3644–3658
(dataset_type, axis, expected_shape, dataset_dict, arrow_path)

Source from the content-addressed store, hash-verified

@pytest.mark.parametrize("dataset_type", ["in_memory", "memory_mapped", "mixed"])
@pytest.mark.parametrize("axis, expected_shape", [(0, (4, 3)), (1, (2, 6))])
def test_concatenate_datasets(dataset_type, axis, expected_shape, dataset_dict, arrow_path):
    table = {
        "in_memory": InMemoryTable.from_pydict(dataset_dict),
        "memory_mapped": MemoryMappedTable.from_file(arrow_path),
    }
    tables = [
        table[dataset_type if dataset_type != "mixed" else "memory_mapped"].slice(0, 2),  # shape = (2, 3)
        table[dataset_type if dataset_type != "mixed" else "in_memory"].slice(2, 4),  # shape = (2, 3)
    ]
    if axis == 1:  # don't duplicate columns
        tables[1] = tables[1].rename_columns([col + "_bis" for col in tables[1].column_names])
    datasets = [Dataset(table) for table in tables]
    dataset = concatenate_datasets(datasets, axis=axis)
    assert dataset.shape == expected_shape
    assert_arrow_metadata_are_synced_with_dataset_features(dataset)
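
The shape arithmetic this test checks can be illustrated without the library. The sketch below is a plain-Python stand-in, not the `datasets` implementation: `shape`, `concat`, and the sample columns are hypothetical names and data introduced here. It assumes only what the test shows: two (2, 3) slices give (4, 3) when stacked along axis 0, and (2, 6) along axis 1 once the second slice's columns are renamed with a `_bis` suffix.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a table: column name -> list of values.
Columns = Dict[str, List[object]]


def shape(table: Columns) -> Tuple[int, int]:
    """Return (num_rows, num_columns) for the toy table."""
    num_rows = len(next(iter(table.values()))) if table else 0
    return (num_rows, len(table))


def concat(tables: List[Columns], axis: int = 0) -> Columns:
    """Toy concatenation: axis=0 stacks rows, axis=1 puts columns side by side."""
    if axis == 0:
        names = list(tables[0])
        if any(list(t) != names for t in tables):
            raise ValueError("axis=0 requires identical column names")
        return {name: [v for t in tables for v in t[name]] for name in names}
    if axis == 1:
        if len({shape(t)[0] for t in tables}) > 1:
            raise ValueError("axis=1 requires the same number of rows")
        merged: Columns = {}
        for t in tables:
            for name, values in t.items():
                if name in merged:
                    raise ValueError(f"duplicate column name: {name}")
                merged[name] = list(values)
        return merged
    raise ValueError("axis must be 0 or 1")


# Illustrative data only; not the contents of the test's dataset_dict fixture.
full = {
    "col_1": ["0", "1", "2", "3"],
    "col_2": [0, 1, 2, 3],
    "col_3": [0.0, 1.0, 2.0, 3.0],
}
first = {name: values[:2] for name, values in full.items()}   # shape (2, 3)
second = {name: values[2:] for name, values in full.items()}  # shape (2, 3)

# Rename the second table's columns so axis=1 does not see duplicates.
second_bis = {name + "_bis": values for name, values in second.items()}

rows = concat([first, second], axis=0)
cols = concat([first, second_bis], axis=1)

print(shape(rows))  # (4, 3)
print(shape(cols))  # (2, 6)
```

In this toy model, skipping the rename makes `concat([first, second], axis=1)` raise on the repeated column names, which matches the intent of the "don't duplicate columns" comment in the test.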

Callers

nothing calls this directly

Calls 7

Dataset · Class · 0.90
concatenate_datasets · Function · 0.90
from_pydict · Method · 0.80
from_file · Method · 0.45
slice · Method · 0.45
rename_columns · Method · 0.45

Tested by

no test coverage detected