github.com/unionai-oss/pandera @v0.32.1 sqlite

5,592 symbols · 22,436 edges · 401 files · 3,079 documented (55%)

README

The Open-source Framework for Dataset Validation

📊 🔎 ✅

Data validation for scientists, engineers, and analysts seeking correctness.

Pandera is a Union.ai open source project that provides a flexible and expressive API for performing data validation on dataframe-like objects. The goal of Pandera is to make data processing pipelines more readable and robust with statistically typed dataframes.

Install

Pandera supports multiple dataframe libraries, including pandas, polars, pyspark, and more. To validate pandas DataFrames, install Pandera with the pandas extra:

With pip:

pip install 'pandera[pandas]'

With uv:

uv pip install 'pandera[pandas]'

With conda:

conda install -c conda-forge pandera-pandas

Get started

First, create a dataframe:

import pandas as pd
import pandera.pandas as pa

# data to validate
df = pd.DataFrame({
    "column1": [1, 2, 3],
    "column2": [1.1, 1.2, 1.3],
    "column3": ["a", "b", "c"],
})

Validate the data using the object-based API:

# define a schema
schema = pa.DataFrameSchema({
    "column1": pa.Column(int, pa.Check.ge(0)),
    "column2": pa.Column(float, pa.Check.lt(10)),
    "column3": pa.Column(
        str,
        [
            pa.Check.isin([*"abc"]),
            pa.Check(lambda series: series.str.len() == 1),
        ]
    ),
})

print(schema.validate(df))
#    column1  column2 column3
# 0        1      1.1       a
# 1        2      1.2       b
# 2        3      1.3       c

Or validate the data using the class-based API:

# define a schema
class Schema(pa.DataFrameModel):
    column1: int = pa.Field(ge=0)
    column2: float = pa.Field(lt=10)
    column3: str = pa.Field(isin=[*"abc"])

    @pa.check("column3")
    def custom_check(cls, series: pd.Series) -> pd.Series:
        return series.str.len() == 1

print(Schema.validate(df))
#    column1  column2 column3
# 0        1      1.1       a
# 1        2      1.2       b
# 2        3      1.3       c

[!WARNING] Pandera v0.24.0 introduced the pandera.pandas module, which is now the (highly) recommended way of defining DataFrameSchemas and DataFrameModels for pandas data structures like DataFrames. Defining a dataframe schema from the top-level pandera module will produce a FutureWarning:

import pandera as pa

schema = pa.DataFrameSchema({"col": pa.Column(str)})

Update your import to:

import pandera.pandas as pa

The rest of your pandera code should then work unchanged. Using the top-level pandera module to access DataFrameSchema and the other pandera classes or functions will be deprecated in version 0.29.0.
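For code that must also run against pandera releases older than v0.24.0, where pandera.pandas does not exist, one possible approach is an import fallback. This is a sketch under that assumption, not a pattern recommended by the README. The helper `first_importable` is a hypothetical name introduced here.

```python
import importlib
from types import ModuleType


def first_importable(names: list[str]) -> ModuleType:
    """Return the first module in `names` that imports successfully."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate
    raise ImportError(f"none of these modules could be imported: {names}")


# Prefer the new pandas-specific module, fall back to the top-level one.
try:
    pa = first_importable(["pandera.pandas", "pandera"])
except ImportError:
    pa = None  # pandera is not installed at all
```

On recent releases the fallback branch is never taken, so no FutureWarning is triggered by this import alone. On older releases the top-level module is used as before.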

Next steps

See the official documentation to learn more.

Core symbols most depended-on inside this repo

pandera/api/ibis/container.py

validate

called by 308

pandera/api/base/model.py

to_schema

called by 131

pandera/api/base/model.py

validate

called by 99

pandera/api/xarray/container.py

pandera/api/checks.py

filter

called by 76

docs/source/conf.py

Shape

Function 2,289

Method 2,098

Class 1,125

Route 80

Languages

Python 100%

TypeScript 1%

Modules by API surface

tests/pandas/test_model.py · 208 symbols

tests/pandas/test_decorators.py · 180 symbols

pandera/engines/pandas_engine.py · 167 symbols

tests/pandas/test_schemas.py · 128 symbols

tests/pyspark/test_pyspark_check.py · 84 symbols

tests/pandas/test_typing.py · 83 symbols

tests/pandas/test_checks_builtin.py · 79 symbols

pandera/dtypes.py · 78 symbols

pandera/engines/polars_engine.py · 76 symbols

tests/xarray/test_data_array_schema.py · 74 symbols

tests/narwhals/test_e2e.py · 65 symbols

pandera/strategies/pandas_strategies.py · 65 symbols

Dependencies from manifests, versioned

asv 0.5.1 · 1×

black 24.0 · 1×

frictionless 4.40.8 · 1×

hypothesis 6.92.7 · 1×

ibis-framework 9.0.0 · 1×

isort 5.7.0 · 1×

mypy 1.10.0 · 1×

narwhals 1.26.0 · 1×

numpy 1.24.4 · 1×

packaging 20.0 · 1×

pandas 2.1.1 · 1×

polars 0.20.0 · 1×

For agents

$ claude mcp add pandera \
  -- python -m otcore.mcp_server <graph>