

Dolphin-v2 is an enhanced universal document parsing model that substantially improves upon the original Dolphin. It handles both digital-born and photographed documents through a document-type-aware two-stage architecture with scalable anchor prompting.

Document image parsing is challenging because document types are diverse and elements such as text paragraphs, figures, formulas, tables, and code blocks are tightly intertwined. Dolphin-v2 addresses these challenges with a document-type-aware two-stage approach.

Dolphin achieves promising performance across diverse page-level and element-level parsing tasks while ensuring superior efficiency through its lightweight architecture and parallel parsing mechanism.

| Model | Size | Overall↑ | Text (Edit↓) | Formula (CDM↑) | Table (TEDS↑) | Table (TEDS-S↑) | Read Order (Edit↓) |
|---|---|---|---|---|---|---|---|
| Dolphin | 0.3B | 74.67 | 0.125 | 67.85 | 68.70 | 77.77 | 0.124 |
| Dolphin-1.5 | 0.3B | 85.06 | 0.085 | 79.44 | 84.25 | 88.06 | 0.071 |
| Dolphin-v2 | 3B | 89.78 | 0.054 | 87.63 | 87.02 | 90.48 | 0.054 |
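
The Edit columns above are lower-is-better. As a rough illustration, and assuming they report an edit distance normalized by the length of the longer string (the table itself does not define the metric), such a score could be computed as follows. The function names are illustrative and are not part of the Dolphin codebase.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(prediction: str, reference: str) -> float:
    """Edit distance divided by the longer length: 0.0 means identical strings."""
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 0.0  # two empty strings are treated as a perfect match
    return levenshtein(prediction, reference) / longest
```

Under this assumption, a score of 0.054 would mean that roughly 5% of the characters need to change to turn the prediction into the reference.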

Clone the repository:

```bash
git clone https://github.com/ByteDance/Dolphin.git
cd Dolphin
```

Install the dependencies:

```bash
pip install -r requirements.txt
```

Download the pre-trained Dolphin-v2 model. Visit our Hugging Face model card, or download the model with either of the following:

```bash
# Download the model from the Hugging Face Hub with Git LFS
git lfs install
git clone https://huggingface.co/ByteDance/Dolphin-v2 ./hf_model

# Or use the Hugging Face CLI
pip install huggingface_hub
huggingface-cli download ByteDance/Dolphin-v2 --local-dir ./hf_model
```
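
The same download can be scripted from Python. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function; the helper name `download_dolphin` is hypothetical and not part of this repository.

```python
from pathlib import Path

MODEL_REPO = "ByteDance/Dolphin-v2"  # repository named in the commands above
DEFAULT_DIR = Path("./hf_model")     # same target directory as the CLI example


def download_dolphin(local_dir: Path = DEFAULT_DIR, repo_id: str = MODEL_REPO) -> Path:
    """Fetch the model snapshot into local_dir and return that directory."""
    # Imported lazily so this module loads even before huggingface_hub is installed.
    from huggingface_hub import snapshot_download

    local_dir.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return local_dir
```

Calling `download_dolphin()` should leave the weights in `./hf_model`, the directory that the inference commands pass as `--model_path`.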

Dolphin provides two inference frameworks with support for two parsing granularities:

- Page-level Parsing: Parse the entire document page into structured JSON and Markdown output
- Element-level Parsing: Parse individual document elements (text, table, formula, code)

Page-level parsing with `demo_page.py`:

```bash
# Process a single document image
python demo_page.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs/page_1.png

# Process a single PDF document
python demo_page.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs/page_6.pdf

# Process all documents in a directory
python demo_page.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs

# Process with a custom batch size for parallel element decoding
python demo_page.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs \
    --max_batch_size 8
```
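
One way to picture `--max_batch_size`: the elements detected on a page are grouped into batches of at most that size, and each batch is decoded together. The sketch below only illustrates that grouping. It is an assumption about the behavior, not code taken from `demo_page.py`, and `iter_batches` is a hypothetical helper.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], max_batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most max_batch_size items, preserving order."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(items), max_batch_size):
        yield list(items[start:start + max_batch_size])


# Hypothetical element crops from one page, listed in reading order.
elements = [f"element_{i}" for i in range(19)]
batch_sizes = [len(batch) for batch in iter_batches(elements, max_batch_size=8)]
print(batch_sizes)  # [8, 8, 3]
```

A larger batch size generally trades more GPU memory for fewer decoding passes, so the right value depends on the hardware available.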

Element-level parsing with `demo_element.py`:

```bash
# Process element images (specify element_type: table, formula, text, or code)
python demo_element.py --model_path ./hf_model --save_dir ./results \
    --input_path \
    --element_type [table|formula|text|code]
```
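
Because `--element_type` accepts a fixed set of values, a thin wrapper can validate it before launching the script. This is a minimal sketch: `build_element_command` is a hypothetical helper, the default paths mirror the commands in this README, and the image path is a placeholder.

```python
import shlex
from typing import List

# Element types listed in the command above.
ELEMENT_TYPES = ("table", "formula", "text", "code")


def build_element_command(input_path: str, element_type: str,
                          model_path: str = "./hf_model",
                          save_dir: str = "./results") -> List[str]:
    """Return the demo_element.py argument list, rejecting unknown element types."""
    if element_type not in ELEMENT_TYPES:
        raise ValueError(
            f"element_type must be one of {ELEMENT_TYPES}, got {element_type!r}"
        )
    return [
        "python", "demo_element.py",
        "--model_path", model_path,
        "--save_dir", save_dir,
        "--input_path", input_path,
        "--element_type", element_type,
    ]


# "./my_table.png" is a placeholder path, not a file shipped with the repository.
command = build_element_command("./my_table.png", "table")
print(shlex.join(command))
```

The returned list can be passed directly to `subprocess.run(command, check=True)` from the repository root.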

Layout parsing with `demo_layout.py`:

```bash
# Process a single document image
python demo_layout.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs/page_1.png

# Process a single PDF document
python demo_layout.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs/page_6.pdf

# Process all documents in a directory
python demo_layout.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs
```

Call for Bad Cases: If you encounter cases where the model performs poorly, we would greatly appreciate it if you could share them in an issue. We are continuously working to optimize and improve the model.

We would like to acknowledge the following open-source projects that provided inspiration and reference for this work:

- OmniDocBench
- Donut
- Nougat
- GOT
- MinerU
- Swin
- Hugging Face Transformers

If you find this code useful for your research, please use the following BibTeX entry.

```bibtex
@article{feng2025dolphin,
  title={Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting},
  author={Feng, Hao and Wei, Shu and Fei, Xiang and Shi, Wei and Han, Yingdong and Liao, Lei and Lu, Jinghui and Wu, Binghong and Liu, Qi and Lin, Chunhui and others},
  journal={arXiv preprint arXiv:2505.14059},
  year={2025}
}
```