<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/logo.png" width="300"/>
🖥️ Live Demo | 💬 WeChat | 📕 rednote | 🐦 X
dots.ocr is designed for universal accessibility and can recognize virtually any human script. Beyond achieving state-of-the-art (SOTA) performance in standard multilingual document parsing among models of comparable size, dots.ocr-1.5 (now rebranded as dots.mocr) excels at converting structured graphics (e.g., charts and diagrams) directly into SVG code, parsing web screens, and spotting scene text.

- 2026.03.19 We have rebranded dots.ocr-1.5 as dots.mocr. For technical details, please refer to our paper. The model weights are available on Hugging Face: dots.mocr and dots.mocr-svg.
- 2025.10.31 🚀 We released dots.ocr.base, a foundation VLM focused on OCR tasks and the base model of dots.ocr. Try it out!
- 2025.07.30 🚀 We released dots.ocr, a multilingual document parsing model based on a 1.7B LLM, with SOTA performance.

| Models | olmOCR-Bench | OmniDocBench (v1.5) | XDocParse | Average |
|---|---|---|---|---|
| MonkeyOCR-pro-3B | 895.0 | 811.3 | 637.1 | 781.1 |
| GLM-OCR | 884.2 | 972.6 | 820.7 | 892.5 |
| PaddleOCR-VL-1.5 | 897.3 | 997.9 | 866.4 | 920.5 |
| HunyuanOCR | 997.6 | 1003.9 | 951.1 | 984.2 |
| dots.ocr | 1041.1 | 1027.2 | 1190.3 | 1086.2 |
| dots.mocr | 1104.4 | 1059.0 | 1210.7 | 1124.7 |
| Gemini 3 Pro | 1180.4 | 1128.0 | 1323.7 | 1210.7 |

Notes:
- Results for Gemini 3 Pro, PaddleOCR-VL-1.5, and GLM-OCR were obtained via APIs, while HunyuanOCR results were generated using local inference.
- The Elo score evaluation was conducted using Gemini 3 Flash. The prompt can be found at: Elo Score Prompt. These results are consistent with the findings on ocrarena.

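
For readers unfamiliar with Elo scoring, the sketch below shows the textbook Elo update applied to a single pairwise comparison. It is illustrative only: the K-factor, the starting rating of 1000, and the model names are assumptions, and the way the judge's verdicts were actually collected and aggregated for the table above is not described here.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings: dict, model_a: str, model_b: str,
               score_a: float, k: float = 32.0) -> None:
    """Update both ratings in place after one comparison.

    score_a is 1.0 if A wins, 0.0 if B wins, and 0.5 for a tie.
    k=32 is an assumed K-factor, not a value taken from this README.
    """
    exp_a = expected_score(ratings[model_a], ratings[model_b])
    ratings[model_a] += k * (score_a - exp_a)
    ratings[model_b] += k * ((1.0 - score_a) - (1.0 - exp_a))


# Hypothetical example: two models start level and model_x wins once.
ratings = {"model_x": 1000.0, "model_y": 1000.0}
update_elo(ratings, "model_x", "model_y", score_a=1.0)
print(ratings)
```

Because each update adds to one rating exactly what it subtracts from the other, the total rating is conserved. Only the differences between models are meaningful.
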
| Model | ArXiv | Old scans math | Tables | Old scans | Headers & footers | Multi column | Long tiny text | Base | Overall |
|---|---|---|---|---|---|---|---|---|---|
| Mistral OCR API | 77.2 | 67.5 | 60.6 | 29.3 | 93.6 | 71.3 | 77.1 | 99.4 | 72.0±1.1 |
| Marker 1.10.1 | 83.8 | 66.8 | 72.9 | 33.5 | 86.6 | 80.0 | 85.7 | 99.3 | 76.1±1.1 |
| MinerU 2.5.4* | 76.6 | 54.6 | 84.9 | 33.7 | 96.6 | 78.2 | 83.5 | 93.7 | 75.2±1.1 |
| DeepSeek-OCR | 77.2 | 73.6 | 80.2 | 33.3 | 96.1 | 66.4 | 79.4 | 99.8 | 75.7±1.0 |
| Nanonets-OCR2-3B | 75.4 | 46.1 | 86.8 | 40.9 | 32.1 | 81.9 | 93.0 | 99.6 | 69.5±1.1 |
| PaddleOCR-VL* | 85.7 | 71.0 | 84.1 | 37.8 | 97.0 | 79.9 | 85.7 | 98.5 | 80.0±1.0 |
| Infinity-Parser 7B* | 84.4 | 83.8 | 85.0 | 47.9 | 88.7 | 84.2 | 86.4 | 99.8 | 82.5±? |
| olmOCR v0.4.0 | 83.0 | 82.3 | 84.9 | 47.7 | 96.1 | 83.7 | 81.9 | 99.7 | 82.4±1.1 |
| Chandra OCR 0.1.0* | 82.2 | 80.3 | 88.0 | 50.4 | 90.8 | 81.2 | 92.3 | 99.9 | 83.1±0.9 |
| dots.ocr | 82.1 | 64.2 | 88.3 | 40.9 | 94.1 | 82.4 | 81.2 | 99.5 | 79.1±1.0 |
| dots.mocr | 85.9 | 85.5 | 90.7 | 48.2 | 94.0 | 85.3 | 81.6 | 99.7 | 83.9±0.9 |

Note:
- The metrics are taken from olmOCR and from our own internal evaluations.
- We remove the Page-header and Page-footer cells from the resulting Markdown.

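
The header and footer removal mentioned in the note can be sketched as a simple filter over layout cells. This is a minimal illustration, not the evaluation script itself: it assumes each cell is a dict with `category` and `text` keys, and the helper name `cells_to_markdown` is hypothetical.

```python
# Categories dropped before scoring, as named in the note above.
DROPPED_CATEGORIES = {"Page-header", "Page-footer"}


def cells_to_markdown(cells: list) -> str:
    """Join the text of all cells except page headers and footers.

    Assumes each cell is a dict with "category" and "text" keys;
    the real output schema may differ.
    """
    kept = [c for c in cells if c.get("category") not in DROPPED_CATEGORIES]
    return "\n\n".join(c["text"] for c in kept if c.get("text"))


# Hypothetical page layout used only for illustration.
cells = [
    {"category": "Page-header", "text": "Annual Report"},
    {"category": "Section-header", "text": "## Quarterly results"},
    {"category": "Text", "text": "Revenue grew."},
    {"category": "Page-footer", "text": "Page 3"},
]
markdown = cells_to_markdown(cells)
print(markdown)
```
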
| Model Type | Methods | Size | OmniDocBench (v1.5) Text Edit↓ | OmniDocBench (v1.5) Read Order Edit↓ | pdf-parse-bench |
|---|---|---|---|---|---|
| General VLMs | Gemini-2.5 Pro | - | 0.075 | 0.097 | 9.06 |
| | Qwen3-VL-235B-A22B-Instruct | 235B | 0.069 | 0.068 | 9.71 |
| | Gemini 3 Pro | - | 0.066 | 0.079 | 9.68 |
| Specialized VLMs | Mistral OCR | - | 0.164 | 0.144 | 8.84 |
| | DeepSeek-OCR | 3B | 0.073 | 0.086 | 8.26 |
| | MonkeyOCR-3B | 3B | 0.075 | 0.129 | 9.27 |
| | OCRVerse | 4B | 0.058 | 0.071 | - |
| | MonkeyOCR-pro-3B | 3B | 0.075 | 0.128 | - |
| | MinerU2.5 | 1.2B | 0.047 | 0.044 | - |
| | PaddleOCR-VL | 0.9B | 0.035 | 0.043 | 9.51 |
| | HunyuanOCR | 0.9B | 0.042 | - | - |
| | PaddleOCR-VL-1.5 | 0.9B | 0.035 | 0.042 | - |
| | GLM-OCR | 0.9B | 0.04 | 0.043 | - |
| | dots.ocr | 3B | 0.048 | 0.053 | 9.29 |
| | dots.mocr | 3B | 0.031 | 0.029 | 9.54 |

Note:
- Metrics are sourced from OmniDocBench and other model publications. pdf-parse-bench results are reproduced by Qwen3-VL-235B-A22B-Instruct.
- Formula and Table metrics for OmniDocBench1.5 are omitted due to their high sensitivity to detection and matching protocols.

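
The Edit columns above are edit-distance metrics, where lower is better. As a rough guide to how such a score behaves, the sketch below computes a generic normalized Levenshtein distance between a predicted string and a reference. It is not the OmniDocBench implementation: the normalization by the longer string and the function names are assumptions, and the benchmark's own matching and preprocessing steps are not reproduced.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                  # delete ch_a
                curr[j - 1] + 1,              # insert ch_b
                prev[j - 1] + (ch_a != ch_b)  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance divided by the longer length, giving a value in [0, 1]."""
    if not pred and not ref:
        return 0.0
    return levenshtein(pred, ref) / max(len(pred), len(ref))


print(normalized_edit_distance("kitten", "sitting"))
```
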
Visual languages (e.g., charts, graphics, chemical formulas, logos) encapsulate dense human knowledge. dots.mocr unifies the interpretation of these elements by parsing them directly into SVG code.

| Methods | UniSVG (Low-Level) | UniSVG (High-Level) | UniSVG (Score) | Chartmimic | Design2Code | Genexam | SciGen | ChemDraw |
|---|---|---|---|---|---|---|---|---|
| OCRVerse | 0.632 | 0.852 | 0.763 | 0.799 | - | - | - | 0.881 |
| Gemini 3 Pro | 0.563 | 0.850 | 0.735 | 0.788 | 0.760 | 0.756 | 0.783 | 0.839 |
| dots.mocr | 0.850 | 0.923 | 0.894 | 0.772 | 0.801 | 0.664 | 0.660 | 0.790 |
| dots.mocr-svg | 0.860 | 0.931 | 0.902 | 0.905 | 0.834 | 0.8 | 0.797 | 0.901 |

Note:
- We use the ISVGEN metric from UniSVG to evaluate the parsing results. For benchmarks that do not natively support image parsing, we use the original images as input and compute the ISVGEN score between the rendered output and the original image.
- OCRVerse results are derived from various code formats (e.g., SVG, Python), whereas results for Gemini 3 Pro and dots.mocr are based specifically on SVG code.
- Due to the capacity constraints of a 3B-parameter VLM, dots.mocr does not yet excel at every task, SVG parsing among them. To complement it, we are simultaneously releasing dots.mocr-svg. We plan to address these limitations in future updates.

| Model | CharXiv_descriptive | CharXiv_reasoning | OCR_Reasoning | infovqa | docvqa | ChartQA | OCRBench | AI2D | CountBenchQA | refcoco |
|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3vl-2b-instruct | 62.3 | 26.8 | - |
$ claude mcp add dots.ocr \
-- python -m otcore.mcp_server <graph>