
github.com/Zipstack/unstract @v0.177.6 sqlite

repository ↗ · DeepWiki ↗ · release v0.177.6 ↗
10,960 symbols 40,881 edges 1,697 files 5,576 documented · 51%
README


Unstract

Turn Unstructured Documents into Structured Data

Documentation (https://docs.unstract.com) | Enterprise (https://unstract.com/pricing/)







What is Unstract?

Unstract uses LLMs to extract structured JSON from documents — PDFs, images, scans, you name it. Define what you want to extract using natural language prompts, and deploy as an API or ETL pipeline.

Built for teams in finance, insurance, healthcare, KYC/compliance, and much more.

Current State vs. Unstract

| Task | Without Unstract | With Unstract |
|------|------------------|---------------|
| Schema definition | Write regex, build templates per vendor | Write a prompt once, handles variations |
| New document type | Days of development | Minutes in Prompt Studio |
| LLM integration | Build your own pipeline | Plug in any provider (OpenAI, Anthropic, Bedrock, Ollama) |
| Deployment | Custom infrastructure | ./run-platform.sh or managed cloud |
| Output | Unstructured text blobs | Clean JSON, ready for your database |

⭐ If Unstract helps you, star this repo!


✨ Key Features

Prompt Studio — Define document extraction schemas with natural language. Docs →


API Deployment — Send a document over REST API, get JSON back. Docs →


ETL Pipeline — Pull documents from a folder, process them, load to your warehouse. Docs →

MCP Server — Connect to AI agents (Claude, etc.) via Model Context Protocol. Docs →

n8n Node — Drop into existing automation workflows. Docs →
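
As a rough illustration of the API Deployment flow described above (send a document, get JSON back), the sketch below builds a multipart POST request using only Python's standard library. The URL, the `files` form field name, and the bearer-token header are assumptions made for illustration and are not confirmed by this README; take the real values from your own API deployment.

```python
import urllib.request
import uuid


def build_extraction_request(api_url, api_key, filename, content):
    """Build (but do not send) a multipart/form-data POST carrying one document."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # "files" is an assumed form field name, not taken from this README.
        f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(api_url, data=body, headers=headers, method="POST")


# Hypothetical endpoint, key, and file contents, for illustration only.
request = build_extraction_request(
    "http://localhost/deployment/api/example/",
    "YOUR_API_KEY",
    "invoice.pdf",
    b"%PDF-1.4 placeholder bytes",
)
```

To actually send it, pass `request` to `urllib.request.urlopen` and parse the response body with `json.load`. Building the request separately from sending it keeps the sketch easy to inspect without a running deployment.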

🚀 Quickstart (~5 mins)

System Requirements & Prerequisites

  • Linux or macOS (Intel or M-series)
  • Docker & Docker Compose
  • 8 GB RAM minimum
  • Git

Run Locally

# Clone and start
git clone https://github.com/Zipstack/unstract.git
cd unstract
./run-platform.sh

That's it!

📦 Other Deployment Options

Docker Compose

# Pull and run entire Unstract platform with default env config.
./run-platform.sh

# Pull and run docker containers with a specific version tag.
./run-platform.sh -v v0.1.0

# Upgrade existing Unstract platform setup by pulling the latest available version.
./run-platform.sh -u

# Upgrade existing Unstract platform setup by pulling a specific version.
./run-platform.sh -u -v v0.2.0

# Build docker images locally as a specific version tag.
./run-platform.sh -b -v v0.1.0

# Build docker images locally from working branch as `current` version tag.
./run-platform.sh -b -v current

# Display the help information.
./run-platform.sh -h

# Only do setup of environment files.
./run-platform.sh -e

# Only do docker images pull with a specific version tag.
./run-platform.sh -p -v v0.1.0

# Only build docker images locally with a specific version tag, instead of pulling them.
./run-platform.sh -p -b -v v0.1.0

# Upgrade existing Unstract platform setup with docker images built locally from working branch as `current` version tag.
./run-platform.sh -u -b -v current

# Pull and run docker containers in detached mode.
./run-platform.sh -d -v v0.1.0
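
If you drive these invocations from a script, a small helper can assemble the argument list. This is only a sketch: the function name and its keyword arguments are hypothetical, and it uses only the flags listed above (`-e`, `-p`, `-u`, `-b`, `-d`, `-v`).

```python
def run_platform_args(version=None, upgrade=False, build_local=False,
                      only_pull=False, only_env=False, detached=False):
    """Return an argv list for ./run-platform.sh using the flags documented above."""
    args = ["./run-platform.sh"]
    if only_env:
        args.append("-e")   # only set up environment files
    if only_pull:
        args.append("-p")   # only pull (or, with -b, build) images
    if upgrade:
        args.append("-u")   # upgrade an existing setup
    if build_local:
        args.append("-b")   # build images locally
    if detached:
        args.append("-d")   # run containers in detached mode
    if version:
        args += ["-v", version]  # version tag, e.g. "v0.1.0" or "current"
    return args


upgrade_cmd = run_platform_args(version="v0.2.0", upgrade=True)
```

The resulting list can be passed to `subprocess.run(upgrade_cmd, check=True)` from the repository root.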

🔐 Backup Encryption Key

WARNING: This key encrypts adapter credentials. Losing it makes existing adapters inaccessible!

Copy the value of ENCRYPTION_KEY from backend/.env or platform-service/.env to a secure location.
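
One way to pull the key out for backup is a small dotenv-style reader like the sketch below. It assumes the file uses plain `KEY=value` lines; the helper name is hypothetical, and where you store the result is up to you.

```python
from pathlib import Path


def read_env_value(env_path, key):
    """Return the value of `key` from a KEY=value style file, or None if absent."""
    for line in Path(env_path).read_text().splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop optional surrounding quotes.
            return value.strip().strip('"').strip("'")
    return None


# Example (run from the repository root):
# key = read_env_value("backend/.env", "ENCRYPTION_KEY")
```

Keep the returned value in a password manager or secrets vault rather than in a plain file next to the repository.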

🏗️ Unstract Architecture

┌────────────────────────────────────────────────────────────┐
│                          Unstract                          │
├─────────────┬─────────────┬─────────────┬──────────────────┤
│  Frontend   │   Backend   │   Worker    │ Platform Service │
│  (React)    │  (Django)   │  (Celery)   │   (FastAPI)      │
├─────────────┴─────────────┴─────────────┴──────────────────┤
│                      Cache (Redis)                         │
├────────────────────────────────────────────────────────────┤
│                  Message Queue (RabbitMQ)                  │
├────────────────────────────────────────────────────────────┤
│                   Database (PostgreSQL)                    │
├────────────────────────────────────────────────────────────┤
│  LLM Adapters    │  Vector DBs    │  Text Extractors       │
│  (OpenAI, etc.)  │ (Qdrant, etc.) │  (LLMWhisperer)        │
└────────────────────────────────────────────────────────────┘

Also see architecture.

📄 Document File Formats

| Category | Formats |
|----------|---------|
| Documents | PDF, DOCX, DOC, ODT, TXT, CSV, JSON |
| Spreadsheets | XLSX, XLS, ODS |
| Presentations | PPTX, PPT, ODP |
| Images | PNG, JPG, JPEG, TIFF, BMP, GIF, WEBP |
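
Before submitting files in bulk, it can help to pre-filter them against the format list above. The sketch below does this by file extension only; the function and constant names are hypothetical, and it does not reflect how Unstract itself detects file types.

```python
from pathlib import Path

# Extensions copied from the format table above.
SUPPORTED_FORMATS = {
    "Documents": {"pdf", "docx", "doc", "odt", "txt", "csv", "json"},
    "Spreadsheets": {"xlsx", "xls", "ods"},
    "Presentations": {"pptx", "ppt", "odp"},
    "Images": {"png", "jpg", "jpeg", "tiff", "bmp", "gif", "webp"},
}


def format_category(filename):
    """Return the category for a filename's extension, or None if unsupported."""
    ext = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED_FORMATS.items():
        if ext in extensions:
            return category
    return None
```

For example, `format_category("scan.TIFF")` matches the Images row, while a file such as `notes.md` yields `None`.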

🔌 Connectors & Adapters

LLM Providers

  • OpenAI
  • Azure OpenAI
  • OpenAI Compatible
  • Anthropic Claude
  • AWS Bedrock
  • Google Gemini
  • Ollama (local)
  • Mistral AI
  • Anyscale

Vector Databases

  • Qdrant
  • Pinecone
  • Weaviate
  • PostgreSQL
  • Milvus

Text Extractors

  • LLMWhisperer
  • Unstructured.io
  • LlamaIndex Parse

ETL Sources & Destinations

Sources: AWS S3, MinIO, Google Cloud Storage, Azure Blob, Google Drive, Dropbox, SFTP

Destinations: Snowflake, Amazon Redshift, Google BigQuery, PostgreSQL, MySQL, MariaDB, SQL Server, Oracle

Full Connector List

🛠️ Development

Change Default Credentials

Follow these steps to change the default username and password.

Local Development

# Install pre-commit hooks
./dev-env-cli.sh -p

# Run pre-commit checks
./dev-env-cli.sh -r

Local Development Guide

🏢 Use Cases by Industry

Finance & Banking → | Insurance → | Healthcare → | Income Tax →

☁️ Cloud & Enterprise

For teams that need managed infrastructure, advanced accuracy features, or compliance certifications.

  • LLMChallenge — dual-LLM verification
  • SinglePass & Summarized Extraction — reduce LLM token costs
  • Human-in-the-Loop — review interface with document highlighting
  • SSO & Enterprise RBAC — SAML/OIDC integration with granular role-based access control
  • SOC 2, HIPAA, ISO 27001, GDPR Compliant — third-party audited security certifications
  • Priority Support with SLA — dedicated support team with response time guarantees

Book a Demo

📚 Cookbooks

🤝 Contributing

We welcome contributions! The easiest way to start:

  1. Pick an issue tagged good first issue
  2. Submit a PR

Report Bug → | Request Feature →

👋 Community

Join the LLM-powered document automation community:

Blog LinkedIn Slack X

📊 A Note on Analytics

Unstract integrates Posthog to track minimal usage analytics. Disable by setting REACT_APP_ENABLE_POSTHOG=false in the frontend's .env file.
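
If you prefer to script this setting, a helper along these lines can update or append the variable. It is a sketch under assumptions: the helper name is hypothetical, the file is assumed to use plain `KEY=value` lines, and the `frontend/.env` path in the usage comment is a guess at where the frontend's .env file lives.

```python
from pathlib import Path


def set_env_value(env_path, key, value):
    """Set `key` to `value` in a KEY=value style file, appending it if missing."""
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []
    updated = False
    for i, line in enumerate(lines):
        # Match only real assignments of this key; commented lines are left alone.
        if line.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            updated = True
    if not updated:
        lines.append(f"{key}={value}")
    path.write_text("\n".join(lines) + "\n")


# Example (path is an assumption):
# set_env_value("frontend/.env", "REACT_APP_ENABLE_POSTHOG", "false")
```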

📜 License

Unstract is released under the AGPL-3.0 License.


Built with ❤️ by Zipstack

Website (https://unstract.com) · Documentation (https://docs.unstract.com) · Pricing (https://unstract.com/pricing/)

Core symbols most depended-on inside this repo

get · called by 1126 · workers/shared/utils/local_context.py
error · called by 788 · workers/shared/models/scheduler_models.py
info · called by 739 · workers/shared/workflow/execution/active_file_manager.py
warning · called by 616 · workers/shared/workflow/execution/active_file_manager.py
info · called by 536 · unstract/core/src/unstract/core/cache/redis_client.py
get · called by 514 · unstract/core/src/unstract/core/cache/redis_client.py
filter · called by 490 · backend/backend/settings/base.py
debug · called by 460 · workers/shared/workflow/execution/active_file_manager.py

Shape

Method 6,076
Class 2,336
Function 2,207
Route 341

Languages

Python 91%
TypeScript 9%

Modules by API surface

unstract/core/src/unstract/core/data_models.py · 159 symbols
unstract/sdk1/tests/test_execution.py · 109 symbols
workers/shared/models/execution_models.py · 94 symbols
unstract/sdk1/src/unstract/sdk1/adapters/base1.py · 92 symbols
workers/shared/api/internal_client.py · 89 symbols
unstract/core/src/unstract/core/worker_models.py · 88 symbols
workers/tests/test_ide_callback.py · 83 symbols
workers/tests/test_answer_prompt.py · 78 symbols
workers/tests/test_sanity_phase6j.py · 64 symbols
workers/tests/test_sanity_phase4.py · 64 symbols
workers/tests/test_sanity_phase2.py · 62 symbols
workers/tests/test_sanity_phase3.py · 61 symbols

Dependencies from manifests, versioned

@ant-design/icons 5.1.4 · 1×
@biomejs/biome 2.3.13 · 1×
@monaco-editor/react 4.7.0 · 1×
@react-awesome-query-builder/antd 6.6.10 · 1×
@react-pdf-viewer/core 3.12.0 · 1×
@react-pdf-viewer/default-layout 3.12.0 · 1×
@react-pdf-viewer/highlight 3.12.0 · 1×
@react-pdf-viewer/page-navigation 3.12.0 · 1×
@rjsf/antd 5.16.1 · 1×
@rjsf/core 5.8.1 · 1×
@rjsf/utils 5.8.1 · 1×
@rjsf/validator-ajv8 5.8.1 · 1×

Datastores touched

(mysql) · Database · 1 repos
mydatabase · Database · 1 repos
unstract_db · Database · 1 repos

For agents

$ claude mcp add unstract \
  -- python -m otcore.mcp_server <graph>
