MCPcopy
hub / github.com/pydata/xarray / test_open_mfdataset_manyfiles

Function test_open_mfdataset_manyfiles

xarray/tests/test_backends.py:5525–5555  ·  view source on GitHub ↗
(
    readengine, nfiles, parallel, chunks, file_cache_maxsize
)

Source from the content-addressed store, hash-verified

5523 reason="Flaky test which can cause the worker to crash (so don't xfail). Very open to contributions fixing this"
5524)
5525def test_open_mfdataset_manyfiles(
5526 readengine, nfiles, parallel, chunks, file_cache_maxsize
5527):
5528 # skip certain combinations
5529 skip_if_not_engine(readengine)
5530
5531 randdata = np.random.randn(nfiles)
5532 original = Dataset({"foo": ("x", randdata)})
5533 # test standard open_mfdataset approach with too many files
5534 with create_tmp_files(nfiles) as tmpfiles:
5535 # split into multiple sets of temp files
5536 for ii in original.x.values:
5537 subds = original.isel(x=slice(ii, ii + 1))
5538 if readengine != "zarr":
5539 subds.to_netcdf(tmpfiles[ii], engine=readengine)
5540 else: # if writeengine == "zarr":
5541 subds.to_zarr(store=tmpfiles[ii])
5542
5543 # check that calculation on opened datasets works properly
5544 with open_mfdataset(
5545 tmpfiles,
5546 combine="nested",
5547 concat_dim="x",
5548 engine=readengine,
5549 parallel=parallel,
5550 chunks=chunks if (not chunks and readengine != "zarr") else "auto",
5551 ) as actual:
5552 # check that using open_mfdataset returns dask arrays for variables
5553 assert isinstance(actual["foo"].data, dask_array_type)
5554
5555 assert_identical(original, actual)
5556
5557
5558@requires_netCDF4

Callers

nothing calls this directly

Calls 8

iselMethod · 0.95
DatasetClass · 0.90
open_mfdatasetFunction · 0.90
assert_identicalFunction · 0.90
skip_if_not_engineFunction · 0.85
create_tmp_filesFunction · 0.85
to_netcdfMethod · 0.45
to_zarrMethod · 0.45

Tested by

no test coverage detected

Used in the wild real call sites across dependent graphs

searching dependent graphs…