Convert a pandas.DataFrame into an xarray.Dataset. Each column will be converted into an independent variable in the Dataset. If the dataframe's index is a MultiIndex, it will be expanded into a tensor product of one-dimensional indices (filling in missing values with NaN).
(cls, dataframe: pd.DataFrame, sparse: bool = False)
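
The behaviour described above can be illustrated with a minimal usage sketch. The sample data, the level names `letter` and `number`, and the column name `value` are made up for illustration; only `xr.Dataset.from_dataframe` and the duplicate-column error come from the source shown here.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical sample data: a two-level MultiIndex with one combination
# ("b", 2) deliberately missing.
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["letter", "number"]
)
df = pd.DataFrame({"value": [10.0, 20.0, 30.0]}, index=index)

# Each index level becomes its own dimension; the column becomes a variable.
ds = xr.Dataset.from_dataframe(df)
print(dict(ds.sizes))

# The missing ("b", 2) combination is filled with NaN.
missing = ds["value"].sel(letter="b", number=2).item()
print(np.isnan(missing))

# Duplicate column names are rejected before any conversion happens.
dup = pd.DataFrame([[1, 2]], columns=["x", "x"])
try:
    xr.Dataset.from_dataframe(dup)
    duplicate_error = None
except ValueError as err:
    duplicate_error = str(err)
print(duplicate_error)
```

Because the index is expanded to the full product of its levels, a DataFrame with a sparsely populated MultiIndex can grow considerably in memory, which is the case the `sparse` parameter is meant to address.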
| 7400 | |
| 7401 | @classmethod |
| 7402 | def from_dataframe(cls, dataframe: pd.DataFrame, sparse: bool = False) -> Self: |
| 7403 | """Convert a pandas.DataFrame into an xarray.Dataset. |
| 7404 | |
| 7405 | Each column will be converted into an independent variable in the |
| 7406 | Dataset. If the dataframe's index is a MultiIndex, it will be expanded |
| 7407 | into a tensor product of one-dimensional indices (filling in missing |
| 7408 | values with NaN). If you would rather preserve the MultiIndex, use |
| 7409 | `xr.Dataset(df)`. This method will produce a Dataset very similar to |
| 7410 | that on which the 'to_dataframe' method was called, except with |
| 7411 | possibly redundant dimensions (since all dataset variables will have |
| 7412 | the same dimensionality). |
| 7413 | |
| 7414 | Parameters |
| 7415 | ---------- |
| 7416 | dataframe : DataFrame |
| 7417 | DataFrame from which to copy data and indices. |
| 7418 | sparse : bool, default: False |
| 7419 | If true, create sparse arrays instead of dense numpy arrays. This |
| 7420 | can potentially save a large amount of memory if the DataFrame has |
| 7421 | a MultiIndex. Requires the sparse package (sparse.pydata.org). |
| 7422 | |
| 7423 | Returns |
| 7424 | ------- |
| 7425 | New Dataset. |
| 7426 | |
| 7427 | See Also |
| 7428 | -------- |
| 7429 | xarray.DataArray.from_series |
| 7430 | pandas.DataFrame.to_xarray |
| 7431 | """ |
| 7432 | # TODO: Add an option to remove dimensions along which the variables |
| 7433 | # are constant, to enable consistent serialization to/from a dataframe, |
| 7434 | # even if some variables have different dimensionality. |
| 7435 | |
| 7436 | if not dataframe.columns.is_unique: |
| 7437 | raise ValueError("cannot convert DataFrame with non-unique columns") |
| 7438 | |
| 7439 | idx = remove_unused_levels_categories(dataframe.index) |
| 7440 | |
| 7441 | if isinstance(idx, pd.MultiIndex) and not idx.is_unique: |
| 7442 | raise ValueError( |
| 7443 | "cannot convert a DataFrame with a non-unique MultiIndex into xarray" |
| 7444 | ) |
| 7445 | |
| 7446 | arrays: list[tuple[Hashable, np.ndarray]] = [] |
| 7447 | extension_arrays: list[tuple[Hashable, pd.Series]] = [] |
| 7448 | for k, v in dataframe.items(): |
| 7449 | if not is_allowed_extension_array(v) or isinstance( |
| 7450 | v.array, UNSUPPORTED_EXTENSION_ARRAY_TYPES |
| 7451 | ): |
| 7452 | arrays.append((k, np.asarray(v))) |
| 7453 | else: |
| 7454 | extension_arrays.append((k, v)) |
| 7455 | |
| 7456 | indexes: dict[Hashable, Index] = {} |
| 7457 | index_vars: dict[Hashable, Variable] = {} |
| 7458 | |
| 7459 | if isinstance(idx, pd.MultiIndex): |