
github.com/visual-layer/fastdup @v2.2_3.10 sqlite

344 symbols · 1,427 edges · 37 files · 124 documented (36%)
README


<img alt="fastdup logo." src="https://github.com/visual-layer/fastdup/raw/v2.2_3.10/gallery/logo.png">

Manage, Clean & Curate Visual Data - Fast and at Scale.

An unsupervised and free tool for image and video dataset analysis.

fastdup was founded by the authors of XGBoost, Apache TVM & Turi Create: Danny Bickson, Carlos Guestrin and Amir Alush.

<a href="https://visual-layer.readme.io/" target="_blank" rel="noopener noreferrer"><strong>Explore the docs »</strong></a>



<a href="#whats-included-in-fastdup" target="_blank" rel="noopener noreferrer">Features</a>
·
<a href="https://github.com/visual-layer/fastdup/issues/new/choose" target="_blank" rel="noopener noreferrer">Report Bug</a>
·
<a href="https://medium.com/visual-layer" target="_blank" rel="noopener noreferrer">Blog</a>
·
<a href="https://visual-layer.readme.io/docs/getting-started" target="_blank" rel="noopener noreferrer">Quickstart</a>
·
<a href="https://visual-layer.com/" target="_blank" rel="noopener noreferrer">Enterprise Edition</a>
·
<a href="https://visual-layer.com/about" target="_blank" rel="noopener noreferrer">About us</a>






<a href="https://visualdatabase.slack.com/join/shared_invite/zt-19jaydbjn-lNDEDkgvSI1QwbTXSY6dlA#/shared-invite/email" target="_blank" rel="noopener noreferrer">
<img src="https://img.shields.io/badge/JOIN US ON SLACK-4A154B?style=for-the-badge&logo=slack&logoColor=white" alt="Logo">
</a>
<a href="https://visual-layer.readme.io/discuss" target="_blank" rel="noopener noreferrer">
<img src="https://img.shields.io/badge/DISCUSSION%20FORUM-slateblue?style=for-the-badge&logo=discourse&logoWidth=20" alt="Logo">
</a>
<a href="https://www.linkedin.com/company/visual-layer/" target="_blank" rel="noopener noreferrer">
<img src="https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white" alt="Logo">
</a>
<a href="https://twitter.com/visual_layer" target="_blank" rel="noopener noreferrer">
<img src="https://img.shields.io/badge/Twitter-1DA1F2?style=for-the-badge&logo=twitter&logoColor=white" alt="Logo">
</a>
<a href="https://www.youtube.com/@visual-layer" target="_blank" rel="noopener noreferrer">
<img src="https://img.shields.io/badge/-YouTube-black.svg?style=for-the-badge&logo=youtube&colorB=red" alt="Logo">
</a>

<img alt="VL Profiler." src="https://github.com/visual-layer/fastdup/raw/v2.2_3.10/gallery/github_banner_profiler.gif">

🚀 Introducing VL Profiler! 🚀 We're excited to announce our new cloud product, VL Profiler. It's designed to help you gain deeper insights and enhance your productivity while using fastdup. With VL Profiler, you can visualize your data, track changes over time, and much more.

👉 Check out VL Profiler here 👈

📝 Note: VL Profiler is a separate commercial product developed by the same team behind fastdup. Our goal with VL Profiler is to provide additional value to our users while continuing to support and maintain fastdup as a free, open-source project. We'd love for you to give VL Profiler a try and share your feedback with us! Sign up now; it's free.

What's included in fastdup

fastdup handles both labeled and unlabeled image and video datasets, helping you discover potential quality issues and providing additional functionality.


Why fastdup?

With a plethora of data visualization/profiling tools available, what sets fastdup apart? Here are the top benefits of fastdup:

  • Quality: High-quality analysis to remove duplicates/near-duplicates, anomalies, mislabels, broken images, and poor-quality images.
  • Scale: Handles 400M images on a single CPU machine. Enterprise version scales to billions of images.
  • Speed: Highly optimized C++ engine runs efficiently even on low-resource CPU machines.
  • Privacy: Runs locally or on your cloud infrastructure. Your data stays where it is.
  • Ease of use: Works on labeled or unlabeled datasets, images, or videos. Get started with just 3 lines of code.

Setting up

Prerequisites

Supported Python versions:

(PyPI badge showing the supported Python versions)

Supported operating systems:

  • Windows 10
  • Windows 11
  • Windows Server 2019
  • Windows WSL
  • Ubuntu 22.04 LTS
  • Ubuntu 20.04 LTS
  • Ubuntu 18.04 LTS
  • macOS 10+ (Intel)
  • macOS 10+ (M1)
  • Amazon Linux 2
  • CentOS 7
  • RedHat 4.8

Installation

Option 1 - Install fastdup via PyPI:

# upgrade pip to its latest version
pip install -U pip

# install fastdup
pip install fastdup

# Alternatively, use an explicit Python version (replace XX with the minor version)
python3.XX -m pip install fastdup

Option 2 - Install fastdup via an Ubuntu 20.04 Docker image on DockerHub:

docker pull karpadoni/fastdup-ubuntu-20.04

Detailed installation instructions and common errors here.
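After installing, it can be useful to confirm that the package is visible to the interpreter you plan to use. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of fastdup.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "fastdup") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("fastdup is not installed in this environment")
    else:
        print(f"fastdup {version} is installed")
```

Running this with the same interpreter used for `pip install` (for example `python3.XX`) helps catch the case where the package was installed for a different Python version.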

Getting Started

Run fastdup with only 3 lines of code.


Visualize the result.


In short, you'll need 3 lines of code to run fastdup:

import fastdup
fd = fastdup.create(input_dir="IMAGE_FOLDER/")
fd.run()
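Before pointing fastdup at a folder, it may help to check that the folder actually contains images. The sketch below wraps the three lines above in a function with such a check. The helper names `count_images` and `run_fastdup` and the extension list are illustrative assumptions, not part of fastdup or its list of supported formats.

```python
from pathlib import Path

# Assumption: an illustrative extension list, not fastdup's own list of supported formats.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tif", ".tiff", ".webp"}


def count_images(input_dir: str) -> int:
    """Count files under input_dir (recursively) whose extension looks like an image."""
    root = Path(input_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{input_dir!r} is not a directory")
    return sum(
        1
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def run_fastdup(input_dir: str):
    """Run the three-line quickstart after checking that the folder is not empty."""
    if count_images(input_dir) == 0:
        raise ValueError(f"no image files found under {input_dir!r}")
    # Imported lazily so the check above works even without fastdup installed.
    import fastdup

    fd = fastdup.create(input_dir=input_dir)
    fd.run()
    return fd
```

Calling `run_fastdup("IMAGE_FOLDER/")` returns the same `fd` object as the quickstart, so the visualization calls work on it unchanged.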

And 5 lines of code to visualize issues:

fd.vis.duplicates_gallery()    # create a visual gallery of duplicates
fd.vis.outliers_gallery()      # create a visual gallery of anomalies
fd.vis.component_gallery()     # create a visualization of connected components
fd.vis.stats_gallery()         # create a visualization of images statistics (e.g. blur)
fd.vis.similarity_gallery()    # create a gallery of similar images
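If one gallery fails (for example, because a dataset has no duplicates), you may still want the others to be generated. The sketch below calls the five gallery methods listed above by name and records which ones succeeded. The helper `build_galleries` is hypothetical, and it assumes each method can be called with no arguments, as in the snippet above.

```python
from typing import Dict, Iterable

# The five gallery method names shown in the README snippet above.
GALLERY_METHODS = (
    "duplicates_gallery",
    "outliers_gallery",
    "component_gallery",
    "stats_gallery",
    "similarity_gallery",
)


def build_galleries(fd, names: Iterable[str] = GALLERY_METHODS) -> Dict[str, str]:
    """Call each named gallery method on fd.vis and report 'ok' or the error message."""
    status = {}
    for name in names:
        try:
            getattr(fd.vis, name)()
            status[name] = "ok"
        except Exception as exc:  # keep going so one failure does not block the rest
            status[name] = f"failed: {exc}"
    return status
```

Usage would be `status = build_galleries(fd)` after `fd.run()`; the returned dictionary maps each method name to `"ok"` or a failure message.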

View the API docs here.

Learn from Examples

Learn the basics of fastdup through interactive examples. View the notebooks on GitHub or nbviewer. Even better, run them on Google Colab or Kaggle, for free.

⚡ Quickstart: Learn how to install fastdup, load a dataset and analyze it for potential issues such as duplicates/near-duplicates, broken images, outliers, dark/bright/blurry images, and view visually similar image clusters. If you're new, start here! 📌 Dataset: Oxford-IIIT Pet.
🧹 Clean Image Folder: Learn how to analyze and clean a folder of images from potential issues and export a list of problematic files for further action. If you have an unorganized folder of images, this is a good place to start. 📌 Dataset: Food-101.

Core symbols most depended-on inside this repo

fastdup_capture_exception · called by 89 · fastdup/sentry.py
run · called by 39 · fastdup/engine.py
fastdup_capture_log_debug_state · called by 35 · fastdup/sentry.py
fastdup_performance_capture · called by 25 · fastdup/sentry.py
fastdup_imread · called by 18 · fastdup/image.py
_fetch_df · called by 15 · fastdup/fastdup_controller.py
component_gallery · called by 15 · fastdup/fastdup_galleries.py
shorten_path · called by 14 · fastdup/utils.py

Shape

Function 218
Method 116
Class 10

Languages

Python 100%

Modules by API surface

fastdup/fastdup_controller.py · 51 symbols
fastdup/__init__.py · 42 symbols
fastdup/utils.py · 35 symbols
fastdup/galleries.py · 25 symbols
fastdup/image.py · 22 symbols
fastdup/fastdup_visualizer.py · 20 symbols
fastdup/fastdup_galleries.py · 20 symbols
fastdup/html_writer.py · 14 symbols
fastdup/datasets.py · 10 symbols
fastdup/synthetic_bbox_data.py · 9 symbols
fastdup/synthetic_image_data.py · 8 symbols
fastdup/sentry.py · 8 symbols

For agents

$ claude mcp add fastdup \
  -- python -m otcore.mcp_server <graph>
