
github.com/rednote-hilab/dots.ocr @main

175 symbols · 627 edges · 23 files · 83 documented (47%)
README
<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/logo.png" width="300"/>

dots.ocr

HuggingFace Arxiv

🖥️ Live Demo | 💬 WeChat | 📕 rednote | 🐦 X

Introduction

dots.ocr is designed for universal accessibility and can recognize virtually any human script. Beyond achieving state-of-the-art (SOTA) performance in standard multilingual document parsing among models of comparable size, dots.ocr-1.5 excels at converting structured graphics (e.g., charts and diagrams) directly into SVG code, parsing web screens, and spotting scene text.

News

  • 2026.03.19 We have rebranded dots.ocr-1.5 as dots.mocr. For technical details, please refer to our paper. The model weights are available on Hugging Face: dots.mocr and dots.mocr-svg.
  • 2025.10.31 🚀 We release dots.ocr.base, a foundation VLM focused on OCR tasks and the base model of dots.ocr. Try it out!
  • 2025.07.30 🚀 We release dots.ocr, a multilingual document parsing model based on a 1.7B-parameter LLM, with SOTA performance.

Evaluation

1. Document Parsing

1.1 Elo scores of recent models across benchmarks

| Model | olmOCR-Bench | OmniDocBench (v1.5) | XDocParse | Average |
|---|---|---|---|---|
| MonkeyOCR-pro-3B | 895.0 | 811.3 | 637.1 | 781.1 |
| GLM-OCR | 884.2 | 972.6 | 820.7 | 892.5 |
| PaddleOCR-VL-1.5 | 897.3 | 997.9 | 866.4 | 920.5 |
| HunyuanOCR | 997.6 | 1003.9 | 951.1 | 984.2 |
| dots.ocr | 1041.1 | 1027.2 | 1190.3 | 1086.2 |
| dots.mocr | 1104.4 | 1059.0 | 1210.7 | 1124.7 |
| Gemini 3 Pro | 1180.4 | 1128.0 | 1323.7 | 1210.7 |

Notes:
  • Results for Gemini 3 Pro, PaddleOCR-VL-1.5, and GLM-OCR were obtained via APIs, while HunyuanOCR results were generated using local inference.
  • The Elo score evaluation was conducted using Gemini 3 Flash. The prompt can be found at: Elo Score Prompt. These results are consistent with the findings on ocrarena.
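The Elo scores above are derived from pairwise comparisons judged by Gemini 3 Flash. As a rough illustration of how such ratings can be computed, the sketch below applies the standard Elo update to a list of pairwise outcomes. The K-factor of 32, the starting rating of 1000, and all function and model names are assumptions made for illustration; they are not the settings or code used for this evaluation.

```python
from collections import defaultdict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, model_a, model_b, score_a, k=32.0):
    """Apply one pairwise result in place.

    score_a is 1.0 if A wins, 0.5 for a tie, 0.0 if B wins.
    k is an assumed K-factor, not a value taken from the evaluation.
    """
    delta = k * (score_a - expected_score(ratings[model_a], ratings[model_b]))
    ratings[model_a] += delta
    ratings[model_b] -= delta


def run_matches(matches, start=1000.0, k=32.0):
    """Replay (model_a, model_b, score_a) tuples from an assumed start rating."""
    ratings = defaultdict(lambda: start)
    for model_a, model_b, score_a in matches:
        update_elo(ratings, model_a, model_b, score_a, k)
    return dict(ratings)


# Hypothetical judge verdicts for two placeholder models.
example = run_matches([("model_x", "model_y", 1.0)])
```

Because each update adds to one rating exactly what it subtracts from the other, the sum of the two ratings is preserved, so only rating differences carry meaning.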

1.2 olmOCR-bench

| Model | ArXiv | Old scans math | Tables | Old scans | Headers & footers | Multi column | Long tiny text | Base | Overall |
|---|---|---|---|---|---|---|---|---|---|
| Mistral OCR API | 77.2 | 67.5 | 60.6 | 29.3 | 93.6 | 71.3 | 77.1 | 99.4 | 72.0±1.1 |
| Marker 1.10.1 | 83.8 | 66.8 | 72.9 | 33.5 | 86.6 | 80.0 | 85.7 | 99.3 | 76.1±1.1 |
| MinerU 2.5.4* | 76.6 | 54.6 | 84.9 | 33.7 | 96.6 | 78.2 | 83.5 | 93.7 | 75.2±1.1 |
| DeepSeek-OCR | 77.2 | 73.6 | 80.2 | 33.3 | 96.1 | 66.4 | 79.4 | 99.8 | 75.7±1.0 |
| Nanonets-OCR2-3B | 75.4 | 46.1 | 86.8 | 40.9 | 32.1 | 81.9 | 93.0 | 99.6 | 69.5±1.1 |
| PaddleOCR-VL* | 85.7 | 71.0 | 84.1 | 37.8 | 97.0 | 79.9 | 85.7 | 98.5 | 80.0±1.0 |
| Infinity-Parser 7B* | 84.4 | 83.8 | 85.0 | 47.9 | 88.7 | 84.2 | 86.4 | 99.8 | 82.5±? |
| olmOCR v0.4.0 | 83.0 | 82.3 | 84.9 | 47.7 | 96.1 | 83.7 | 81.9 | 99.7 | 82.4±1.1 |
| Chandra OCR 0.1.0* | 82.2 | 80.3 | 88.0 | 50.4 | 90.8 | 81.2 | 92.3 | 99.9 | 83.1±0.9 |
| dots.ocr | 82.1 | 64.2 | 88.3 | 40.9 | 94.1 | 82.4 | 81.2 | 99.5 | 79.1±1.0 |
| dots.mocr | 85.9 | 85.5 | 90.7 | 48.2 | 94.0 | 85.3 | 81.6 | 99.7 | 83.9±0.9 |

Notes:
  • The metrics are taken from olmocr and from our own internal evaluations.
  • We remove the Page-header and Page-footer cells from the result markdown.
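The header and footer removal mentioned above can be pictured as a filter over layout cells before they are joined into markdown. The sketch below is only an illustration: it assumes each cell is a dict with `category` and `text` keys, and `cells_to_markdown` is a hypothetical helper, not a function from this repository. Only the category names `Page-header` and `Page-footer` come from the note itself.

```python
# Category names taken from the note above; the cell layout is an assumption.
SKIPPED_CATEGORIES = {"Page-header", "Page-footer"}


def cells_to_markdown(cells):
    """Join the text of layout cells, skipping page headers and footers.

    `cells` is assumed to be a list of dicts with "category" and "text" keys.
    Cells without text (for example, pictures) are skipped as well.
    """
    kept = [
        cell["text"]
        for cell in cells
        if cell.get("category") not in SKIPPED_CATEGORIES and cell.get("text")
    ]
    return "\n\n".join(kept)


# Hypothetical parser output for a single page.
page_cells = [
    {"category": "Page-header", "text": "Journal of Examples"},
    {"category": "Title", "text": "# A Title"},
    {"category": "Text", "text": "Body paragraph."},
    {"category": "Page-footer", "text": "Page 3"},
]
markdown = cells_to_markdown(page_cells)
```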

1.3 Other Benchmarks

| Model Type | Methods | Size | OmniDocBench (v1.5) Text Edit↓ | OmniDocBench (v1.5) Read Order Edit↓ | pdf-parse-bench |
|---|---|---|---|---|---|
| General VLMs | Gemini-2.5 Pro | - | 0.075 | 0.097 | 9.06 |
| | Qwen3-VL-235B-A22B-Instruct | 235B | 0.069 | 0.068 | 9.71 |
| | gemini3pro | - | 0.066 | 0.079 | 9.68 |
| Specialized VLMs | Mistral OCR | - | 0.164 | 0.144 | 8.84 |
| | Deepseek-OCR | 3B | 0.073 | 0.086 | 8.26 |
| | MonkeyOCR-3B | 3B | 0.075 | 0.129 | 9.27 |
| | OCRVerse | 4B | 0.058 | 0.071 | -- |
| | MonkeyOCR-pro-3B | 3B | 0.075 | 0.128 | - |
| | MinerU2.5 | 1.2B | 0.047 | 0.044 | - |
| | PaddleOCR-VL | 0.9B | 0.035 | 0.043 | 9.51 |
| | HunyuanOCR | 0.9B | 0.042 | - | - |
| | PaddleOCR-VL1.5 | 0.9B | 0.035 | 0.042 | - |
| | GLMOCR | 0.9B | 0.04 | 0.043 | - |
| | dots.ocr | 3B | 0.048 | 0.053 | 9.29 |
| | dots.mocr | 3B | 0.031 | 0.029 | 9.54 |

Notes:
  • Metrics are sourced from OmniDocBench and other model publications. pdf-parse-bench results are reproduced by Qwen3-VL-235B-A22B-Instruct.
  • Formula and Table metrics for OmniDocBench1.5 are omitted due to their high sensitivity to detection and matching protocols.
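The Text Edit and Read Order Edit columns are edit-distance metrics, where lower is better. A minimal sketch of a normalized Levenshtein distance follows, assuming normalization by the length of the longer string. The exact matching and normalization protocol used by OmniDocBench may differ, and the function names here are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance scaled to [0, 1]; 0.0 means the strings are identical."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 0.0
    return levenshtein(pred, ref) / longest


# One substitution in a four-character string.
score = normalized_edit_distance("abcd", "abce")
```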

2. Structured Graphics Parsing

Visual languages (e.g., charts, graphics, chemical formulas, logos) encapsulate dense human knowledge. dots.mocr unifies the interpretation of these elements by parsing them directly into SVG code.

| Methods | Unisvg Low-Level | Unisvg High-Level | Unisvg Score | Chartmimic | Design2Code | Genexam | SciGen | ChemDraw |
|---|---|---|---|---|---|---|---|---|
| OCRVerse | 0.632 | 0.852 | 0.763 | 0.799 | - | - | - | 0.881 |
| Gemini 3 Pro | 0.563 | 0.850 | 0.735 | 0.788 | 0.760 | 0.756 | 0.783 | 0.839 |
| dots.mocr | 0.850 | 0.923 | 0.894 | 0.772 | 0.801 | 0.664 | 0.660 | 0.790 |
| dots.mocr-svg | 0.860 | 0.931 | 0.902 | 0.905 | 0.834 | 0.8 | 0.797 | 0.901 |

Notes:
  • We use the ISVGEN metric from UniSVG to evaluate the parsing result. For benchmarks that do not natively support image parsing, we use the original images as input and calculate the ISVGEN score between the rendered output and the original image.
  • OCRVerse results are derived from various code formats (e.g., SVG, Python), whereas results for Gemini 3 Pro and dots.mocr are based specifically on SVG code.
  • Due to the capacity constraints of a 3B-parameter VLM, dots.mocr may not yet excel at every task, SVG generation among them. To complement it, we are simultaneously releasing dots.mocr-svg. We plan to address these limitations in future updates.

3. General Vision Tasks

Model CharXiv_descriptive CharXiv_reasoning OCR_Reasoning infovqa docvqa ChartQA OCRBench AI2D CountBenchQA refcoco
Qwen3vl-2b-instruct 62.3 26.8 -

Core symbols (most depended-on inside this repo)

| Symbol | Called by | File |
|---|---|---|
| export_selected_rids | 11 | demo/demo_gradio_batch.py |
| _default_ui_state | 11 | demo/demo_gradio_batch.py |
| finalize | 9 | demo/demo_gradio_batch.py |
| create_temp_session_dir | 8 | demo/demo_gradio_batch.py |
| smart_resize | 6 | dots_ocr/utils/image_utils.py |
| inference_with_vllm | 5 | dots_ocr/model/inference.py |
| ensure_export_ready | 5 | demo/demo_gradio_batch.py |
| _edited_filepath | 5 | demo/demo_gradio_batch.py |

Shape

Function 127
Method 40
Class 7
Route 1

Languages

Python 100%

Modules by API surface

demo/demo_gradio_batch.py · 80 symbols
dots_ocr/utils/output_cleaner.py · 18 symbols
demo/demo_gradio.py · 16 symbols
dots_ocr/parser.py · 12 symbols
demo/demo_gradio_annotion.py · 10 symbols
dots_ocr/utils/image_utils.py · 9 symbols
dots_ocr/utils/format_transformer.py · 7 symbols
dots_ocr/utils/layout_utils.py · 5 symbols
demo/demo_streamlit.py · 5 symbols
dots_ocr/utils/doc_utils.py · 4 symbols
dots_ocr/utils/demo_utils/display.py · 2 symbols
tools/elo_score_prompt.py · 1 symbol

Dependencies (from manifests, versioned)

transformers 4.56.1 · 1×

For agents

$ claude mcp add dots.ocr \
  -- python -m otcore.mcp_server <graph>
