hub / github.com/PaddlePaddle/PaddleOCR

github.com/PaddlePaddle/PaddleOCR @v3.7.0 sqlite

repository ↗ · DeepWiki ↗ · release v3.7.0 ↗

6,647 symbols 18,753 edges 713 files 1,122 documented · 17%

README

  <img width="800" src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/paddleocr/README/Banner.png" alt="Star-history">

Global Leading OCR Toolkit & Document AI Engine

PaddleOCR converts PDF documents and images into structured, LLM-ready data (JSON/Markdown) with industry-leading accuracy. With 70k+ Stars and trusted by top-tier projects like Dify, RAGFlow, and Cherry Studio, PaddleOCR is the bedrock for building intelligent RAG and Agentic applications.

🚀 Key Features

📄 Intelligent Document Parsing (LLM-Ready)

Transforming messy visuals into structured data for the LLM era.

SOTA Document VLM: Featuring PaddleOCR-VL-1.6 (0.9B), the industry's leading lightweight vision-language model for document parsing. It achieves 96.3% accuracy on OmniDocBench v1.6, leads in text, formula, and table recognition, and shows significantly enhanced capabilities in ancient documents, rare characters, seals, and charts, with structured outputs in Markdown and JSON formats.
Structure-Aware Conversion: Powered by PP-StructureV3, seamlessly convert complex PDFs and images into Markdown or JSON. Unlike the PaddleOCR-VL series models, it provides more fine-grained coordinate information, including table cell coordinates, text coordinates, and more.
Production-Ready Efficiency: Achieve commercial-grade accuracy with an ultra-small footprint. Outperforms numerous closed-source solutions in public benchmarks while remaining resource-efficient for edge/cloud deployment.

🔍 Universal Text Recognition (Scene OCR)

The global gold standard for high-speed, multilingual text spotting.

100+ Languages Supported: Native recognition for a vast global library. PP-OCRv6 supports 50 languages with a single unified model (Chinese, English, Japanese, and 46 Latin-script languages) — no model switching needed for multilingual documents.
Complex Element Mastery: Beyond standard text recognition, we support natural scene text spotting across a wide range of environments, including IDs, street views, books, and industrial components
Performance Leap: PP-OCRv6 achieves +4.6% detection and +5.1% recognition accuracy over PP-OCRv5, surpassing mainstream Vision-Language Models. 5.2× CPU inference speedup end-to-end.

🛠️ Developer-Centric Ecosystem

Seamless Integration: The premier choice for the AI Agent ecosystem—deeply integrated with Dify, RAGFlow, Pathway, and Cherry Studio.
LLM Data Flywheel: A complete pipeline to build high-quality datasets, providing a sustainable "Data Engine" for fine-tuning Large Language Models.
One-Click Deployment: Supports various hardware backends (NVIDIA GPU, Intel CPU, Kunlunxin XPU, and diverse AI Accelerators).

📣 Recent updates

🔥 2026.06.11: Release of PaddleOCR 3.7.0

PP-OCRv6 highlights:
- Accuracy boost: Medium tier achieves +4.6% detection and +5.1% recognition over PP-OCRv5_server, surpassing mainstream VLMs (Qwen3-VL-235B, GPT-5.5) with only 34.5M parameters.
- 50 languages unified: Single model covers Chinese, English, Japanese, and 46 Latin-script languages — no model switching needed.
- Specialized scenarios: Major improvements in digital displays, dot-matrix characters, tire prints, and industrial text recognition.
- Faster inference: 5.2× CPU speedup (OpenVINO), 6.1× on Apple M4 (tiny), 0.13s on A100 GPU.
- Three tiers for all scenarios: tiny (1.5M) / small (7.7M) / medium (34.5M) for edge, mobile, and server deployment.
- Documentation: PP-OCRv6 Technical Doc

2026.05.28: Release of PaddleOCR 3.6.0

PaddleOCR-VL-1.6 highlights:
- New SOTA Accuracy: Achieves over 96.3% on OmniDocBench v1.6, also sets new SOTA on OmniDocBench v1.5 and Real5-OmniDocBench, leading both open-source and proprietary solutions in text, formula, and table recognition.
- Comprehensive Capability Upgrade: Significant improvements in table, ancient document, and rare character recognition, with notably enhanced seal recognition, spotting, and chart understanding across multiple scenarios.
- Seamless Migration: Model architecture is fully consistent with PaddleOCR-VL-1.5, enabling zero-cost adaptation—swap and go.
- Try it now: Available on HuggingFace or our Official Website.

2026.04.21: Release of PaddleOCR 3.5.0

Flexible inference backends: Seamlessly switch between Paddle static graph, Paddle dynamic graph, or Transformers. PaddleOCR is now deeply integrated with the Hugging Face ecosystem, and 20 major models support Transformers as the inference backend.
Office documents to Markdown: Convert common document formats such as Word, Excel, and PowerPoint into Markdown.
DOCX export for parsed results: The PaddleOCR-VL series, PP-StructureV3, and PP-DocTranslation now support exporting parsed results to DOCX for convenient viewing and editing in Microsoft Word.
Official browser inference SDK: Released PaddleOCR.js, the official browser inference SDK that supports running PP-OCRv5 directly in the browser.

2026.01.29: Release of PaddleOCR 3.4.0

PaddleOCR-VL-1.5 (SOTA 0.9B VLM): Our latest flagship model for document parsing is now live!
- 94.5% Accuracy on OmniDocBench: Surpassing top-tier general large models and specialized document parsers.
- Real-World Robustness: First to introduce the PP-DocLayoutV3 algorithm for irregular shape positioning, mastering 5 tough scenarios: Skew, Warping, Scanning, Illumination, and Screen Photography.
- Capability Expansion: Now supports Seal Recognition, Text Spotting, and expands to 111 languages (including China’s Tibetan script and Bengali).
- Long Document Mastery: Supports automatic cross-page table merging and hierarchical heading identification.
- Try it now: Available on HuggingFace or our Official Website.

2025.10.16: Release of PaddleOCR 3.3.0

Released PaddleOCR-VL:
- Model Introduction:
  - PaddleOCR-VL is a SOTA and resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that integrates a NaViT-style dynamic resolution visual encoder with the ERNIE-4.5-0.3B language model to enable accurate element recognition. This innovative model efficiently supports 109 languages and excels in recognizing complex elements (e.g., text, tables, formulas, and charts), while maintaining minimal resource consumption. Through comprehensive evaluations on widely used public benchmarks and in-house benchmarks, PaddleOCR-VL achieves SOTA performance in both page-level document parsing and element-level recognition. It significantly outperforms existing solutions, exhibits strong competitiveness against top-tier VLMs, and delivers fast inference speeds. These strengths make it highly suitable for practical deployment in real-world scenarios. The model has been released on HuggingFace. Everyone is welcome to download and use it! More introduction information can be found in PaddleOCR-VL.
- Core Features:
  - Compact yet Powerful VLM Architecture: We present a novel vision-language model that is specifically designed for resource-efficient inference, achieving outstanding performance in element recognition. By integrating a NaViT-style dynamic high-resolution visual encoder with the lightweight ERNIE-4.5-0.3B language model, we significantly enhance the model’s recognition capabilities and decoding efficiency. This integration maintains high accuracy while reducing computational demands, making it well-suited for efficient and practical document processing applications.
  - SOTA Performance on Document Parsing: PaddleOCR-VL achieves state-of-the-art performance in both page-level document parsing and element-level recognition. It significantly outperforms existing pipeline-based solutions and exhibiting strong competitiveness against leading vision-language models (VLMs) in document parsing. Moreover, it excels in recognizing complex document elements, such as text, tables, formulas, and charts, making it suitable for a wide range of challenging content types, including handwritten text and historical documents. This makes it highly versatile and suitable for a wide range of document types and scenarios.
  - Multilingual Support: PaddleOCR-VL Supports 109 languages, covering major global languages, including but not limited to Chinese, English, Japanese, Latin, and Korean, as well as languages with different scripts and structures, such as Russian (Cyrillic script), Arabic, Hindi (Devanagari script), and Thai. This broad language coverage substantially enhances the applicability of our system to multilingual and globalized document processing scenarios.
Released PP-OCRv5 Multilingual Recognition Model:
- Improved the accuracy and coverage of Latin script recognition; added support for Cyrillic, Arabic, Devanagari, Telugu, Tamil, and other language systems, covering recognition of 109 languages. The model has only 2M parameters, and the accuracy of some models has increased by over 40% compared to the previous generation.

2025.08.21: Release of PaddleOCR 3.2.0

Significant Model Additions:
- Introduced training, infere

Extension points exported contracts — how you extend this code

SourceMatResult (Interface)

(no doc) [4 implementers]

paddleocr-js/packages/core/src/platform/browser.ts

DocParsingOptionsProvider (Interface)

DocParsingOptionsProvider marks document parsing option structs. [2 implementers]

api_sdk/go/models.go

AppState (Interface)

(no doc)

paddleocr-js/apps/demo/src/main.ts

ResourceSavePlan (Interface)

(no doc)

api_sdk/typescript/src/client.ts

DetModel (Interface)

(no doc) [2 implementers]

paddleocr-js/packages/core/src/models/det.ts

ClientOption (FuncType)

(no doc)

api_sdk/go/options.go

OCROptions (Interface)

(no doc)

api_sdk/typescript/src/models.ts

RecModel (Interface)

(no doc) [2 implementers]

paddleocr-js/packages/core/src/models/rec.ts

Core symbols most depended-on inside this repo

format

called by 537

ppstructure/table/tablepyxl/style.py

get

called by 526

paddleocr-js/packages/core/src/types/opencv.d.ts

pop

called by 134

ppocr/modeling/heads/rec_unimernet_head.py

mean

called by 133

paddleocr-js/packages/core/src/types/opencv.d.ts

resize

called by 96

paddleocr-js/packages/core/src/types/opencv.d.ts

predict

called by 62

paddleocr-js/packages/core/src/models/det.ts

Shape

Method 3,710

Function 1,576

Class 1,230

Interface 90

Struct 33

Route 5

FuncType 2

Enum 1

Languages

Python84%

TypeScript13%

Java2%

Go2%

Modules by API surface

docs/version2.x/javascripts/katex.min.js301 symbols

ppocr/data/imaug/label_ops.py156 symbols

ppocr/postprocess/rec_postprocess.py118 symbols

ppocr/modeling/heads/rec_unimernet_head.py110 symbols

ppocr/modeling/backbones/rec_pphgnetv2.py88 symbols

ppocr/modeling/backbones/rec_resnetv2.py87 symbols

ppocr/modeling/backbones/rec_donut_swin.py73 symbols

ppocr/modeling/heads/rec_latexocr_head.py72 symbols

ppocr/losses/distillation_loss.py70 symbols

ppocr/data/imaug/rec_img_aug.py70 symbols

ppocr/data/imaug/unimernet_aug.py62 symbols

ppocr/modeling/backbones/rec_svtrv2.py54 symbols

Used by 1 indexed graphs manifest dependencies, hub-wide

github.com/run-llama/llama_index

Dependencies from manifests, versioned

@eslint/js10.0.1 · 1×

@paddleocr/paddleocr-js* · 1×

@techstark/opencv-js4.10.0-release.1 · 1×

@types/js-yaml4.0.9 · 1×

@types/node25.9.1 · 1×

@vitest/coverage-v83.2.4 · 1×

clipper-lib6.4.2 · 1×

eslint10.0.2 · 1×

globals17.4.0 · 1×

husky9.1.7 · 1×

js-yaml4.1.0 · 1×

jsdom26.1.0 · 1×

For agents

$ claude mcp add PaddleOCR \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact