
github.com/unionai-oss/pandera @v0.32.1 sqlite

5,592 symbols · 22,436 edges · 401 files · 3,079 documented (55%)
README

The Open-source Framework for Dataset Validation

📊 🔎 ✅

Data validation for scientists, engineers, and analysts seeking correctness.

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Pandera is a Union.ai open source project that provides a flexible and expressive API for performing data validation on dataframe-like objects. The goal of Pandera is to make data processing pipelines more readable and robust with statistically typed dataframes.

Install

Pandera supports multiple dataframe libraries, including pandas, polars, pyspark, and more. To validate pandas DataFrames, install Pandera with the pandas extra:

With pip:

pip install 'pandera[pandas]'

With uv:

uv pip install 'pandera[pandas]'

With conda:

conda install -c conda-forge pandera-pandas
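
After running one of the install commands above, it can be handy to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of Pandera.

```python
from importlib import metadata
from importlib.util import find_spec
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# find_spec only locates the top-level package; it does not import it.
pandera_found = find_spec("pandera") is not None
print("pandera importable:", pandera_found)
print("pandera version:", installed_version("pandera"))
```

If the first line prints `False`, the install most likely went into a different environment than the one running the script.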

Get started

First, create a dataframe:

import pandas as pd
import pandera.pandas as pa

# data to validate
df = pd.DataFrame({
    "column1": [1, 2, 3],
    "column2": [1.1, 1.2, 1.3],
    "column3": ["a", "b", "c"],
})

Validate the data using the object-based API:

# define a schema
schema = pa.DataFrameSchema({
    "column1": pa.Column(int, pa.Check.ge(0)),
    "column2": pa.Column(float, pa.Check.lt(10)),
    "column3": pa.Column(
        str,
        [
            pa.Check.isin([*"abc"]),
            pa.Check(lambda series: series.str.len() == 1),
        ]
    ),
})

print(schema.validate(df))
#    column1  column2 column3
# 0        1      1.1       a
# 1        2      1.2       b
# 2        3      1.3       c

Or validate the data using the class-based API:

# define a schema
class Schema(pa.DataFrameModel):
    column1: int = pa.Field(ge=0)
    column2: float = pa.Field(lt=10)
    column3: str = pa.Field(isin=[*"abc"])

    @pa.check("column3")
    def custom_check(cls, series: pd.Series) -> pd.Series:
        return series.str.len() == 1

print(Schema.validate(df))
#    column1  column2 column3
# 0        1      1.1       a
# 1        2      1.2       b
# 2        3      1.3       c

[!WARNING] Pandera v0.24.0 introduces the pandera.pandas module, which is now the (highly) recommended way of defining DataFrameSchemas and DataFrameModels for pandas data structures like DataFrames. Defining a dataframe schema from the top-level pandera module will produce a FutureWarning:

```python
import pandera as pa

schema = pa.DataFrameSchema({"col": pa.Column(str)})
```

Update your import to:

```python
import pandera.pandas as pa
```

And all of the rest of your pandera code should work. Using the top-level pandera module to access DataFrameSchema and the other pandera classes or functions will be deprecated in version 0.29.0.

Next steps

See the official documentation to learn more.

Core symbols (most depended-on inside this repo)

get · called by 328 · pandera/config.py
validate · called by 317 · pandera/api/ibis/container.py
validate · called by 308 · pandera/api/base/model.py
to_schema · called by 131 · pandera/api/base/model.py
validate · called by 99 · pandera/api/xarray/container.py
config_context · called by 88 · pandera/config.py
isin · called by 88 · pandera/api/checks.py
filter · called by 76 · docs/source/conf.py

Shape

Function 2,289
Method 2,098
Class 1,125
Route 80

Languages

Python 100%
TypeScript 1%

Modules by API surface

tests/pandas/test_model.py · 208 symbols
tests/pandas/test_decorators.py · 180 symbols
pandera/engines/pandas_engine.py · 167 symbols
tests/pandas/test_schemas.py · 128 symbols
tests/pyspark/test_pyspark_check.py · 84 symbols
tests/pandas/test_typing.py · 83 symbols
tests/pandas/test_checks_builtin.py · 79 symbols
pandera/dtypes.py · 78 symbols
pandera/engines/polars_engine.py · 76 symbols
tests/xarray/test_data_array_schema.py · 74 symbols
tests/narwhals/test_e2e.py · 65 symbols
pandera/strategies/pandas_strategies.py · 65 symbols

Dependencies (from manifests, versioned)

asv 0.5.1 · 1×
black 24.0 · 1×
frictionless 4.40.8 · 1×
hypothesis 6.92.7 · 1×
ibis-framework 9.0.0 · 1×
isort 5.7.0 · 1×
mypy 1.10.0 · 1×
narwhals 1.26.0 · 1×
numpy 1.24.4 · 1×
packaging 20.0 · 1×
pandas 2.1.1 · 1×
polars 0.20.0 · 1×

For agents

$ claude mcp add pandera \
  -- python -m otcore.mcp_server <graph>
