ydata-profiling is now fg-data-profiling. This package has been renamed to fg-data-profiling. Please follow the Migration Guide as soon as possible: the old package will no longer receive updates or bug fixes.

Documentation | Discord | Stack Overflow | Latest changelog
Do you like this project? Show us your love and give feedback!
The primary goal of fg-data-profiling is to provide a one-line Exploratory Data Analysis (EDA) experience that is consistent and fast. Like the handy pandas df.describe() function, fg-data-profiling delivers an extended analysis of a DataFrame, and the analysis can be exported in different formats such as HTML and JSON.
The package outputs a simple, digested analysis of a dataset, including time-series and text data.
Looking for a scalable solution that can fully integrate with your database systems?
Use YData Fabric Data Catalog to connect to different databases and storages (Oracle, Snowflake, PostgreSQL, GCS, S3, etc.) and get an interactive, guided profiling experience in Fabric. Check out the Community Version.
pip uninstall ydata-profiling
pip install fg-data-profiling
Find and replace all occurrences of the old import in your codebase:
# Before
import ydata_profiling
from ydata_profiling import ProfileReport
# After
import data_profiling
from data_profiling import ProfileReport
You can use this one-liner to find all affected files:
grep -r "ydata_profiling" . --include="*.py"
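The find-and-replace step can also be scripted. The following is a minimal sketch rather than an official migration tool: `migrate_imports` and `migrate_tree` are hypothetical helper names, and the sketch assumes that every whole-word occurrence of `ydata_profiling` in a `.py` file should become `data_profiling`. Unlike the `grep` command, it ignores longer identifiers that merely contain the old name.

```python
import re
from pathlib import Path

# Match the old module name only as a whole identifier, so longer names
# such as my_ydata_profiling_utils are left untouched.
OLD_MODULE = re.compile(r"\bydata_profiling\b")


def migrate_imports(source: str) -> str:
    """Return the source text with the old module name replaced by the new one."""
    return OLD_MODULE.sub("data_profiling", source)


def migrate_tree(root: str) -> list[Path]:
    """Rewrite every .py file under root in place and return the files that changed."""
    changed = []
    for path in Path(root).rglob("*.py"):
        original = path.read_text(encoding="utf-8")
        updated = migrate_imports(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Because `migrate_tree` edits files in place, it is safest to run it on a clean working tree and review the resulting diff before committing.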
pip install fg-data-profiling
or
conda install -c conda-forge fg-data-profiling
Start by loading your pandas DataFrame as you normally would, e.g. by using:
import numpy as np
import pandas as pd
from data_profiling import ProfileReport
df = pd.DataFrame(np.random.rand(100, 5), columns=["a", "b", "c", "d", "e"])
To generate the standard profiling report, merely run:
profile = ProfileReport(df, title="Profiling Report")
The report contains three additional sections:
Spark support has been released, but we are always looking for an extra pair of hands 👐. Check the current work in progress!
fg-data-profiling can be used for a variety of use cases. The documentation includes guides, tips and tricks for tackling them:
| Use case | Description |
|---|---|
| Comparing datasets | Comparing multiple versions of the same dataset |
| Profiling a Time-Series dataset | Generating a report for a time-series dataset with a single line of code |
| Profiling large datasets | Tips on how to prepare data and configure fg-data-profiling for working with large datasets |
| Handling sensitive data | Generating reports which are mindful about sensitive data in the input dataset |
| Dataset metadata and data dictionaries | Complementing the report with dataset details and column-specific data dictionaries |
| Customizing the report's appearance | Changing the appearance of the report's page and of the contained visualizations |
| Profiling Databases | For a seamless profiling experience in your organization's databases, check Fabric Data Catalog, which lets you consume data from different types of storage such as RDBMSs (Azure SQL, PostgreSQL, Oracle, etc.) and object storages (Google Cloud Storage, AWS S3, Snowflake, etc.), among others. |

### Using inside Jupyter Notebooks
There are two interfaces to consume the report inside a Jupyter notebook: through widgets and through an embedded HTML report.

The widget interface displays the report as a set of widgets. In a Jupyter Notebook, run:
profile.to_widgets()
The HTML report can be directly embedded in a cell in a similar fashion:
profile.to_notebook_iframe()

To generate an HTML report file, assign the ProfileReport to a variable and call its to_file() method:
profile.to_file("your_report.html")
Alternatively, the report's data can be obtained as a JSON file:
# As a JSON string
json_data = profile.to_json()
# As a file
profile.to_file("your_report.json")
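Since `to_json()` returns a string, it can be parsed with the standard `json` module. The sketch below lists the top-level keys of such a string. `top_level_keys` is a hypothetical helper, and it assumes the report serializes to a JSON object; the actual key names depend on the installed version and are not specified here.

```python
import json


def top_level_keys(json_data: str) -> list[str]:
    """Parse a JSON string and return its top-level key names, sorted."""
    parsed = json.loads(json_data)
    if not isinstance(parsed, dict):
        raise TypeError("expected a JSON object at the top level")
    return sorted(parsed)


# Stand-in string for illustration only; with a real report, pass
# profile.to_json() instead.
example = '{"b": 1, "a": {"nested": true}}'
print(top_level_keys(example))  # prints ['a', 'b']
```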
For standard formatted CSV files (which can be read directly by pandas without additional settings), the data_profiling executable can be used in the command line. The example below generates a report named Example Profiling Report, using a configuration file called default.yaml, in the file report.html by processing a data.csv dataset.
data_profiling --title "Example Profiling Report" --config_file default.yaml data.csv report.html
Additional details on the CLI are available in the documentation.
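When the command line is driven from a Python script, the same invocation can be assembled as an argument list. This is a minimal sketch that uses only the flags shown in the example above (`--title` and `--config_file`); `build_cli_command` and its parameter names are hypothetical and not part of the package.

```python
import shlex


def build_cli_command(title: str, config_file: str, input_csv: str, output_html: str) -> list[str]:
    """Assemble the data_profiling command line as an argument list."""
    return [
        "data_profiling",
        "--title", title,
        "--config_file", config_file,
        input_csv,
        output_html,
    ]


command = build_cli_command("Example Profiling Report", "default.yaml", "data.csv", "report.html")

# shlex.join quotes arguments that contain spaces, so the printed string
# can be pasted into a shell as-is.
print(shlex.join(command))

# To actually run it (requires the package to be installed):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when the title contains spaces.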
The following example reports showcase the capabilities of the package across a wide range of datasets and data types:
Additional details, including information about widget support, are available in the documentation.
$ claude mcp add fg-data-profiling \
-- python -m otcore.mcp_server <graph>