github.com/rednote-hilab/dots.ocr @main

175 symbols · 627 edges · 23 files · 83 documented (47%)

README

dots.ocr

Introduction

dots.ocr is designed for universal accessibility and can recognize virtually any human script. Beyond achieving state-of-the-art (SOTA) performance in standard multilingual document parsing among models of comparable size, dots.ocr-1.5 excels at converting structured graphics (e.g., charts and diagrams) directly into SVG code, parsing web screens, and spotting scene text.

News

2026.03.19 We have rebranded dots.ocr-1.5 as dots.mocr. For technical details, please refer to our paper. The model weights are available on Hugging Face: dots.mocr and dots.mocr-svg.
2025.10.31 🚀 We release dots.ocr.base, a foundation VLM focused on OCR tasks and the base model of dots.ocr. Try it out!
2025.07.30 🚀 We release dots.ocr, a multilingual document parsing model based on a 1.7B LLM, with SOTA performance.

Evaluation

1. Document Parsing

1.1 Elo scores of recent models across benchmarks

models	olmOCR-Bench	OmniDocBench (v1.5)	XDocParse	Average
MonkeyOCR-pro-3B	895.0	811.3	637.1	781.1
GLM-OCR	884.2	972.6	820.7	892.5
PaddleOCR-VL-1.5	897.3	997.9	866.4	920.5
HunyuanOCR	997.6	1003.9	951.1	984.2
dots.ocr	1041.1	1027.2	1190.3	1086.2
dots.mocr	1104.4	1059.0	1210.7	1124.7
Gemini 3 Pro	1180.4	1128.0	1323.7	1210.7

Notes:
- Results for Gemini 3 Pro, PaddleOCR-VL-1.5, and GLM-OCR were obtained via APIs, while HunyuanOCR results were generated using local inference.
- The Elo score evaluation was conducted using Gemini 3 Flash. The prompt can be found at: Elo Score Prompt. These results are consistent with the findings on ocrarena.
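As a rough illustration of how pairwise verdicts turn into ratings like those in the table, here is a minimal sketch of the standard Elo update. The source does not state the exact procedure, so the starting rating of 1000, the step size `k=32`, the 400-point scale, and the names `expected_score`, `update_elo`, `model_a`, and `model_b` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings: Dict[str, float], winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one pairwise result in place; k is an assumed step size."""
    exp_win = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - exp_win)  # larger gain for an unexpected win
    ratings[winner] += delta
    ratings[loser] -= delta


# Hypothetical models, both starting from an assumed rating of 1000.
ratings = {"model_a": 1000.0, "model_b": 1000.0}

# Hypothetical judge verdicts, one (winner, loser) pair per compared page.
verdicts: List[Tuple[str, str]] = [
    ("model_a", "model_b"),
    ("model_a", "model_b"),
    ("model_b", "model_a"),
]
for winner, loser in verdicts:
    update_elo(ratings, winner, loser)

print(ratings)
```

Each update moves the same number of points from loser to winner, so the total rating mass stays constant and only the gaps between models carry meaning.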

1.2 olmOCR-bench

Model	ArXiv	Old scans math	Tables	Old scans	Headers & footers	Multi column	Long tiny text	Base	Overall
Mistral OCR API	77.2	67.5	60.6	29.3	93.6	71.3	77.1	99.4	72.0±1.1
Marker 1.10.1	83.8	66.8	72.9	33.5	86.6	80.0	85.7	99.3	76.1±1.1
MinerU 2.5.4*	76.6	54.6	84.9	33.7	96.6	78.2	83.5	93.7	75.2±1.1
DeepSeek-OCR	77.2	73.6	80.2	33.3	96.1	66.4	79.4	99.8	75.7±1.0
Nanonets-OCR2-3B	75.4	46.1	86.8	40.9	32.1	81.9	93.0	99.6	69.5±1.1
PaddleOCR-VL*	85.7	71.0	84.1	37.8	97.0	79.9	85.7	98.5	80.0±1.0
Infinity-Parser 7B*	84.4	83.8	85.0	47.9	88.7	84.2	86.4	99.8	82.5±?
olmOCR v0.4.0	83.0	82.3	84.9	47.7	96.1	83.7	81.9	99.7	82.4±1.1
Chandra OCR 0.1.0*	82.2	80.3	88.0	50.4	90.8	81.2	92.3	99.9	83.1±0.9
dots.ocr	82.1	64.2	88.3	40.9	94.1	82.4	81.2	99.5	79.1±1.0
dots.mocr	85.9	85.5	90.7	48.2	94.0	85.3	81.6	99.7	83.9±0.9

Note:
- The metrics are from olmocr and our own internal evaluations.
- We delete the Page-header and Page-footer cells in the result markdown.
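The header and footer removal mentioned above can be sketched as a simple filter. This is only an illustration: the cell layout (dicts with `category` and `text` keys) and the helper name `cells_to_markdown` are assumptions, not the repository's actual code.

```python
from typing import Dict, List

# Categories dropped before scoring, as named in the note above.
DROPPED_CATEGORIES = {"Page-header", "Page-footer"}


def cells_to_markdown(cells: List[Dict[str, str]]) -> str:
    """Join the text of all cells except page headers and footers."""
    kept = [
        cell["text"]
        for cell in cells
        if cell.get("category") not in DROPPED_CATEGORIES and cell.get("text")
    ]
    return "\n\n".join(kept)


# Hypothetical layout cells for a single page.
cells = [
    {"category": "Page-header", "text": "Journal of Examples"},
    {"category": "Text", "text": "Body paragraph."},
    {"category": "Page-footer", "text": "12"},
]
print(cells_to_markdown(cells))  # prints: Body paragraph.
```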

1.3 Other Benchmarks

Model Type	Methods	Size	OmniDocBench (v1.5) Text Edit↓	OmniDocBench (v1.5) Read Order Edit↓	pdf-parse-bench
General VLMs	Gemini-2.5 Pro	-	0.075	0.097	9.06
General VLMs	Qwen3-VL-235B-A22B-Instruct	235B	0.069	0.068	9.71
General VLMs	Gemini 3 Pro	-	0.066	0.079	9.68
Specialized VLMs	Mistral OCR	-	0.164	0.144	8.84
Specialized VLMs	DeepSeek-OCR	3B	0.073	0.086	8.26
Specialized VLMs	MonkeyOCR-3B	3B	0.075	0.129	9.27
Specialized VLMs	OCRVerse	4B	0.058	0.071	-
Specialized VLMs	MonkeyOCR-pro-3B	3B	0.075	0.128	-
Specialized VLMs	MinerU2.5	1.2B	0.047	0.044	-
Specialized VLMs	PaddleOCR-VL	0.9B	0.035	0.043	9.51
Specialized VLMs	HunyuanOCR	0.9B	0.042	-	-
Specialized VLMs	PaddleOCR-VL-1.5	0.9B	0.035	0.042	-
Specialized VLMs	GLM-OCR	0.9B	0.04	0.043	-
Specialized VLMs	dots.ocr	3B	0.048	0.053	9.29
Specialized VLMs	dots.mocr	3B	0.031	0.029	9.54

Note:
- Metrics are sourced from OmniDocBench and other model publications. pdf-parse-bench results are reproduced by Qwen3-VL-235B-A22B-Instruct.
- Formula and Table metrics for OmniDocBench1.5 are omitted due to their high sensitivity to detection and matching protocols.
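The Edit columns above are edit-distance scores, where lower is better. The following is a minimal sketch of a normalized Levenshtein distance, assuming normalization by the longer string's length. OmniDocBench's exact matching and normalization protocol may differ, and the function names here are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,         # delete ch_a
                curr[j - 1] + 1,     # insert ch_b
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance scaled to [0, 1]; 0.0 means the strings are identical."""
    longest = max(len(pred), len(ref))
    return 0.0 if longest == 0 else levenshtein(pred, ref) / longest


print(normalized_edit_distance("dots.ocr", "dots.mocr"))
```

Under this assumed normalization, a score of 0.031 would correspond to roughly three character edits per hundred characters of the longer text.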

2. Structured Graphics Parsing

Visual languages (e.g., charts, graphics, chemical formulas, logos) encapsulate dense human knowledge. dots.mocr unifies the interpretation of these elements by parsing them directly into SVG code.

Methods	UniSVG Low-Level	UniSVG High-Level	UniSVG Score	Chartmimic	Design2Code	Genexam	SciGen	ChemDraw
OCRVerse	0.632	0.852	0.763	0.799	-	-	-	0.881
Gemini 3 Pro	0.563	0.850	0.735	0.788	0.760	0.756	0.783	0.839
dots.mocr	0.850	0.923	0.894	0.772	0.801	0.664	0.660	0.790
dots.mocr-svg	0.860	0.931	0.902	0.905	0.834	0.8	0.797	0.901

Note:
- We use the ISVGEN metric from UniSVG to evaluate the parsing result. For benchmarks that do not natively support image parsing, we use the original images as input and calculate the ISVGEN score between the rendered output and the original image.
- OCRVerse results are derived from various code formats (e.g., SVG, Python), whereas results for Gemini 3 Pro and dots.mocr are based specifically on SVG code.
- Due to the capacity constraints of a 3B-parameter VLM, dots.mocr may not yet excel at every task, such as SVG generation. To complement it, we are also releasing dots.mocr-svg. We plan to address these limitations in future updates.

3. General Vision Tasks

Model	CharXiv_descriptive	CharXiv_reasoning	OCR_Reasoning	infovqa	docvqa	ChartQA	OCRBench	AI2D	CountBenchQA	refcoco
Qwen3vl-2b-instruct	62.3	26.8	-

Core symbols
Most depended-on inside this repo

Symbol	Called by	File
export_selected_rids	11	demo/demo_gradio_batch.py
_default_ui_state	11	demo/demo_gradio_batch.py
finalize	9	demo/demo_gradio_batch.py
create_temp_session_dir	8	demo/demo_gradio_batch.py
smart_resize	6	dots_ocr/utils/image_utils.py
inference_with_vllm	5	dots_ocr/model/inference.py
ensure_export_ready	5	demo/demo_gradio_batch.py
_edited_filepath	5	demo/demo_gradio_batch.py
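A "called by" count of this kind can be read as the in-degree of a symbol in the call graph. The sketch below assumes the graph is available as (caller, callee) pairs and that each distinct caller counts once. The page does not say how its counts are derived, and the caller names in the example are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def most_depended_on(edges: Iterable[Tuple[str, str]], top_n: int = 3) -> List[Tuple[str, int]]:
    """Rank callees by number of distinct callers (ties broken by name)."""
    # set() collapses repeated calls from the same caller into one edge.
    counts = Counter(callee for _caller, callee in set(edges))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical call edges; only the callee names appear on this page.
edges = [
    ("caller_one", "smart_resize"),
    ("caller_two", "smart_resize"),
    ("caller_one", "smart_resize"),  # duplicate edge, counted once
    ("caller_three", "finalize"),
]
print(most_depended_on(edges))  # [('smart_resize', 2), ('finalize', 1)]
```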

Shape

Function 127

Method 40

Class 7

Route 1

Languages

Python 100%

Modules by API surface

Module	Symbols
demo/demo_gradio_batch.py	80
dots_ocr/utils/output_cleaner.py	18
demo/demo_gradio.py	16
dots_ocr/parser.py	12
demo/demo_gradio_annotion.py	10
dots_ocr/utils/image_utils.py	9
dots_ocr/utils/format_transformer.py	7
dots_ocr/utils/layout_utils.py	5
demo/demo_streamlit.py	5
dots_ocr/utils/doc_utils.py	4
dots_ocr/utils/demo_utils/display.py	2
tools/elo_score_prompt.py	1

Dependencies from manifests, versioned

transformers 4.56.1 · 1×

For agents

$ claude mcp add dots.ocr \
  -- python -m otcore.mcp_server <graph>