<a href="https://docs.unstract.com">Documentation</a> |
<a href="https://unstract.com/pricing/">Enterprise</a>
<a href="https://github.com/Zipstack/unstract/raw/v0.177.6/LICENSE"><img src="https://img.shields.io/github/license/Zipstack/unstract" alt="License"></a>
<a href="https://docs.unstract.com/unstract/unstract_platform/quick_start"><img src="https://img.shields.io/badge/tutorials-docs-brightgreen" alt="Tutorials"></a>
<a href="https://status.unstract.com"><img src="https://img.shields.io/badge/uptime-status-brightgreen" alt="Uptime Status"></a>
<a href="https://hub.docker.com/u/unstract"><img src="https://img.shields.io/docker/pulls/unstract/backend" alt="Docker Pulls"></a>
<a href="https://deepwiki.com/Zipstack/unstract"><img src="https://deepwiki.com/badge.svg" alt="Ask DeepWiki"></a>
<a href="https://cla-assistant.io/Zipstack/unstract"><img src="https://cla-assistant.io/readme/badge/Zipstack/unstract" alt="CLA assistant"></a>
<img src="https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2FZipstack%2Funstract%2Frefs%2Fheads%2Fmain%2Fpyproject.toml" alt="Python Version from PEP 621 TOML">
<a href="https://github.com/astral-sh/uv"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json" alt="uv"></a>
<a href="https://vite.dev/"><img src="https://img.shields.io/badge/Vite-6.x-646CFF?logo=vite&logoColor=white" alt="Vite"></a>
<a href="https://bun.sh/"><img src="https://img.shields.io/badge/Bun-1.x-000000?logo=bun&logoColor=white" alt="Bun"></a>
<a href="https://biomejs.dev/"><img src="https://img.shields.io/badge/Biome-2.x-60A5FA?logo=biome&logoColor=white" alt="Biome"></a>
<a href="https://results.pre-commit.ci/latest/github/Zipstack/unstract/main"><img src="https://results.pre-commit.ci/badge/github/Zipstack/unstract/main.svg" alt="pre-commit.ci status"></a>
<a href="https://sonarcloud.io/summary/new_code?id=Zipstack_unstract"><img src="https://sonarcloud.io/api/project_badges/measure?project=Zipstack_unstract&metric=alert_status" alt="Quality Gate Status"></a>
<a href="https://sonarcloud.io/summary/new_code?id=Zipstack_unstract"><img src="https://sonarcloud.io/api/project_badges/measure?project=Zipstack_unstract&metric=code_smells" alt="Code Smells"></a>
<a href="https://sonarcloud.io/summary/new_code?id=Zipstack_unstract"><img src="https://sonarcloud.io/api/project_badges/measure?project=Zipstack_unstract&metric=duplicated_lines_density" alt="Duplicated Lines (%)"></a>
Unstract uses LLMs to extract structured JSON from documents: PDFs, images, scans, and more. Define what you want to extract using natural-language prompts, then deploy the result as an API or ETL pipeline.
Built for teams in finance, insurance, healthcare, KYC/compliance, and more.
| Task | Without Unstract | With Unstract |
|---|---|---|
| Schema definition | Write regex, build templates per vendor | Write one prompt that handles variations |
| New document type | Days of development | Minutes in Prompt Studio |
| LLM integration | Build your own pipeline | Plug in any provider (OpenAI, Anthropic, Bedrock, Ollama) |
| Deployment | Custom infrastructure | `./run-platform.sh` or managed cloud |
| Output | Unstructured text blobs | Clean JSON, ready for your database |
⭐ If Unstract helps you, star this repo!
Prompt Studio — Define document extraction schemas with natural language. Docs →

API Deployment — Send a document over REST API, get JSON back. Docs →

ETL Pipeline — Pull documents from a folder, process them, load to your warehouse. Docs →
MCP Server — Connect to AI agents (Claude, etc.) via Model Context Protocol. Docs →
n8n Node — Drop into existing automation workflows. Docs →
# Clone and start
git clone https://github.com/Zipstack/unstract.git
cd unstract
./run-platform.sh
That's it!
Default login: username `unstract`, password `unstract`
# Pull and run the entire Unstract platform with the default env config.
./run-platform.sh
# Pull and run docker containers with a specific version tag.
./run-platform.sh -v v0.1.0
# Upgrade existing Unstract platform setup by pulling the latest available version.
./run-platform.sh -u
# Upgrade existing Unstract platform setup by pulling a specific version.
./run-platform.sh -u -v v0.2.0
# Build docker images locally as a specific version tag.
./run-platform.sh -b -v v0.1.0
# Build docker images locally from working branch as `current` version tag.
./run-platform.sh -b -v current
# Display the help information.
./run-platform.sh -h
# Only set up the environment files.
./run-platform.sh -e
# Only pull docker images with a specific version tag.
./run-platform.sh -p -v v0.1.0
# Only build docker images locally (instead of pulling) with a specific version tag.
./run-platform.sh -p -b -v v0.1.0
# Upgrade existing Unstract platform setup with docker images built locally from working branch as `current` version tag.
./run-platform.sh -u -b -v current
# Pull and run docker containers in detached mode.
./run-platform.sh -d -v v0.1.0
[!WARNING] ENCRYPTION_KEY encrypts adapter credentials; losing it makes existing adapters inaccessible!
Copy the value of ENCRYPTION_KEY from backend/.env or platform-service/.env to a secure location.
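As a rough sketch of that backup step, the bash helpers below read the key from an env file and write it to a file only the current user can read. The function names and the backup path are hypothetical. The sketch also assumes the key is stored as a plain `ENCRYPTION_KEY=value` line without quotes or an `export` prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Unstract.

# Print the ENCRYPTION_KEY value found in an env file.
# Assumes a plain KEY=value line (no quotes, no "export" prefix).
read_encryption_key() {
  local env_file="$1"
  # Use the last matching line if there are several, and keep everything
  # after the first "=" so a value that itself contains "=" stays intact.
  grep -E '^ENCRYPTION_KEY=' "$env_file" | tail -n 1 | cut -d '=' -f 2-
}

# Save the key to a backup file readable only by the current user.
backup_encryption_key() {
  local env_file="$1" backup_file="$2" key
  key="$(read_encryption_key "$env_file")"
  if [ -z "$key" ]; then
    echo "ENCRYPTION_KEY not found in $env_file" >&2
    return 1
  fi
  # The subshell keeps the restrictive umask from leaking into the caller.
  (umask 077 && printf '%s\n' "$key" > "$backup_file")
}

# Example usage (the backup path is a placeholder):
#   backup_encryption_key backend/.env "$HOME/unstract-encryption-key.txt"
```

A local file is only a starting point; a password manager or secrets vault is usually a better final destination for the key.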
┌────────────────────────────────────────────────────────────┐
│                          Unstract                          │
├─────────────┬─────────────┬─────────────┬──────────────────┤
│ Frontend    │ Backend     │ Worker      │ Platform Service │
│ (React)     │ (Django)    │ (Celery)    │ (FastAPI)        │
├─────────────┴─────────────┴─────────────┴──────────────────┤
│                       Cache (Redis)                        │
├────────────────────────────────────────────────────────────┤
│                  Message Queue (RabbitMQ)                  │
├────────────────────────────────────────────────────────────┤
│                   Database (PostgreSQL)                    │
├────────────────────────────────────────────────────────────┤
│ LLM Adapters      │ Vector DBs        │ Text Extractors    │
│ (OpenAI, etc.)    │ (Qdrant, etc.)    │ (LLMWhisperer)     │
└────────────────────────────────────────────────────────────┘
Also see architecture.
| Category | Formats |
|---|---|
| Documents | PDF, DOCX, DOC, ODT, TXT, CSV, JSON |
| Spreadsheets | XLSX, XLS, ODS |
| Presentations | PPTX, PPT, ODP |
| Images | PNG, JPG, JPEG, TIFF, BMP, GIF, WEBP |
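When batching files for upload, a client-side filter can skip anything outside the table above before it reaches the platform. The bash function below is a minimal sketch of such a pre-check. Its name is hypothetical, and matching on a case-insensitive file extension is an assumption of this example, not a description of how Unstract itself validates input.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check, not part of Unstract.

# Return success if the file's extension appears in the formats table.
is_supported_format() {
  local name="$1" ext
  ext="${name##*.}"
  # No dot in the name means there is no extension to check.
  [ "$ext" = "$name" ] && return 1
  # Lowercase the extension so "PDF" and "pdf" are treated alike.
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    pdf|docx|doc|odt|txt|csv|json) return 0 ;;   # Documents
    xlsx|xls|ods) return 0 ;;                    # Spreadsheets
    pptx|ppt|odp) return 0 ;;                    # Presentations
    png|jpg|jpeg|tiff|bmp|gif|webp) return 0 ;;  # Images
    *) return 1 ;;
  esac
}

# Example usage:
#   for f in inbox/*; do
#     is_supported_format "$f" || echo "skipping $f" >&2
#   done
```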
| LLM Provider | Status | LLM Provider | Status |
|---|---|---|---|
| OpenAI | ✅ | Azure OpenAI | ✅ |
| OpenAI Compatible | ✅ | Anthropic Claude | ✅ |
| AWS Bedrock | ✅ | Google Gemini | ✅ |
| Ollama (local) | ✅ | Mistral AI | ✅ |
| Anyscale | ✅ | | |
| Vector DB | Status | Vector DB | Status |
|---|---|---|---|
| Qdrant | ✅ | Pinecone | ✅ |
| Weaviate | ✅ | PostgreSQL | ✅ |
| Milvus | ✅ | | |
| Text Extractor | Status |
|---|---|
| LLMWhisperer | ✅ |
| Unstructured.io | ✅ |
| LlamaIndex Parse | ✅ |
Sources: AWS S3, MinIO, Google Cloud Storage, Azure Blob, Google Drive, Dropbox, SFTP
Destinations: Snowflake, Amazon Redshift, Google BigQuery, PostgreSQL, MySQL, MariaDB, SQL Server, Oracle
Follow these steps to change the default username and password.
# Install pre-commit hooks
./dev-env-cli.sh -p
# Run pre-commit checks
./dev-env-cli.sh -r
Finance & Banking → | Insurance → | Healthcare → | Income Tax →
For teams that need managed infrastructure, advanced accuracy features, or compliance certifications.
We welcome contributions! The easiest way to start:
good first issue
Report Bug → | Request Feature →
Join the LLM-powered document automation community:
Unstract integrates PostHog to track minimal usage analytics. Disable by setting REACT_APP_ENABLE_POSTHOG=false in the frontend's .env file.
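One way to apply that setting without opening an editor is sketched below. The function name is hypothetical, and `frontend/.env` in the usage comment is an assumed location for the frontend's .env file; adjust it to match your checkout.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Unstract.

# Set REACT_APP_ENABLE_POSTHOG=false in the given env file,
# replacing an existing assignment or appending a new one.
disable_posthog() {
  local env_file="$1"
  if grep -q '^REACT_APP_ENABLE_POSTHOG=' "$env_file" 2>/dev/null; then
    # Rewrite through a temp file to stay portable across GNU and BSD sed.
    sed 's/^REACT_APP_ENABLE_POSTHOG=.*/REACT_APP_ENABLE_POSTHOG=false/' \
      "$env_file" > "$env_file.tmp" && mv "$env_file.tmp" "$env_file"
  else
    # Add a newline first if the file exists and does not end with one.
    if [ -s "$env_file" ] && [ -n "$(tail -c 1 "$env_file")" ]; then
      printf '\n' >> "$env_file"
    fi
    printf 'REACT_APP_ENABLE_POSTHOG=false\n' >> "$env_file"
  fi
}

# Example usage (assumed path):
#   disable_posthog frontend/.env
```

Variables with the `REACT_APP_` prefix are typically read when the frontend is built or started, so the frontend likely needs a rebuild or restart before the change takes effect.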
Unstract is released under the AGPL-3.0 License.
Built with ❤️ by Zipstack
<a href="https://unstract.com">Website</a> ·
<a href="https://docs.unstract.com">Documentation</a> ·
<a href="https://unstract.com/pricing/">Pricing</a>
$ claude mcp add unstract \
-- python -m otcore.mcp_server <graph>