DeepSpec is a full-stack codebase for training and evaluating draft models for speculative decoding. It contains data preparation utilities, draft model implementations, training code, and evaluation scripts.
Install the Python dependencies:
python -m pip install -r requirements.txt
Data preparation additionally requires an inference engine to serve the target model when regenerating answers; see scripts/data/README.md for details.
Run the stages in order — each stage's output feeds the next:
See scripts/data/README.md for the step-by-step data pipeline:
Qwen/Qwen3-4B setting).
bash scripts/train/train.sh
train.sh launches train.py, which spawns one worker per visible GPU. Select the algorithm and target model by pointing config_path at one of the configs under config/ (e.g. config/dspark/dspark_qwen3_4b.py); see the script header for the full list of configs, how to override config_path / target_cache_dir, and how to use --opts to override individual config fields. Checkpoints are written to ~/checkpoints/<project_name>/<exp_name>/step_*.
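Because checkpoints land in ~/checkpoints/<project_name>/<exp_name>/step_*, a small helper can locate the most recent numbered step for a later evaluation run. The sketch below is illustrative and not part of DeepSpec: the function names are hypothetical, and it assumes step entries are named step_<integer>. Any other entry, such as step_latest, is skipped.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical helper, not part of DeepSpec.
# Assumption: numbered checkpoints are named step_<integer>.
_STEP_NAME = re.compile(r"^step_(\d+)$")


def experiment_dir(project_name: str, exp_name: str) -> Path:
    """Build ~/checkpoints/<project_name>/<exp_name>."""
    return Path.home() / "checkpoints" / project_name / exp_name


def latest_numeric_step(exp_dir: Path) -> Optional[Path]:
    """Return the step_<N> entry with the largest N, or None if there is none."""
    if not exp_dir.is_dir():
        return None
    best_step, best_path = -1, None
    for child in exp_dir.iterdir():
        match = _STEP_NAME.match(child.name)
        if match is None:
            # Skip non-numeric entries such as step_latest.
            continue
        step = int(match.group(1))
        # Compare as integers so step_2000 ranks above step_100.
        if step > best_step:
            best_step, best_path = step, child
    return best_path
```

Comparing the parsed integers rather than the names avoids the lexicographic trap where step_900 would sort after step_2000.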
Hardware: the default configs and scripts assume a single node with 8 GPUs. For fewer GPUs, restrict CUDA_VISIBLE_DEVICES to the devices you want to use (e.g. CUDA_VISIBLE_DEVICES=0,1,2,3 for four GPUs).
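One way to launch training on a subset of GPUs from Python is to build an environment with CUDA_VISIBLE_DEVICES set and pass it to the launcher. This is a minimal sketch, not DeepSpec code: env_with_gpus is a hypothetical helper, and it only has an effect if train.sh does not set CUDA_VISIBLE_DEVICES itself, which should be checked in the script header.

```python
import os
from typing import Dict, Mapping, Optional, Sequence


def env_with_gpus(gpu_ids: Sequence[int],
                  base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and expose only the given GPU indices to CUDA."""
    if not gpu_ids:
        raise ValueError("gpu_ids must name at least one GPU")
    if len(set(gpu_ids)) != len(gpu_ids):
        raise ValueError("gpu_ids must not contain duplicates")
    # Start from the current process environment unless one is supplied.
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


# Usage (assumes the command is run from the repository root):
#   import subprocess
#   subprocess.run(["bash", "scripts/train/train.sh"],
#                  env=env_with_gpus([0, 1, 2, 3]), check=True)
```

Since train.py spawns one worker per visible GPU, the example above would start four workers.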
bash scripts/eval/eval.sh
eval.sh runs eval.py against a trained draft checkpoint over the speculative-decoding benchmarks in eval_datasets/ (gsm8k, math500, aime25, humaneval, mbpp, livecodebench, mt-bench, alpaca, arena-hard-v2). Set:
- target_name_or_path: the target model the draft was trained against (e.g. Qwen/Qwen3-4B),
- draft_name_or_path: the draft checkpoint, e.g. ~/checkpoints/deepspec/dspark_block7_qwen3_4b/step_latest, or one of the Hugging Face repo IDs listed in Released Checkpoints.

The checkpoints below are the ones used for Table 1 in the paper. Each checkpoint was trained on open-perfectblend data generated by its corresponding target model in non-thinking mode, and is the direct output of the corresponding training configuration under config/.
| Algorithm | Qwen/Qwen3-4B | Qwen/Qwen3-8B | Qwen/Qwen3-14B | google/gemma-4-12B-it |
|---|---|---|---|---|
| Eagle3 | deepseek-ai/eagle3_qwen3_4b_ttt7 | deepseek-ai/eagle3_qwen3_8b_ttt7 | deepseek-ai/eagle3_qwen3_14b_ttt7 | deepseek-ai/eagle3_gemma4_12b_ttt7 |
| DFlash | deepseek-ai/dflash_qwen3_4b_block7 | deepseek-ai/dflash_qwen3_8b_block7 | deepseek-ai/dflash_qwen3_14b_block7 | deepseek-ai/dflash_gemma4_12b_block7 |
| DSpark | deepseek-ai/dspark_qwen3_4b_block7 | deepseek-ai/dspark_qwen3_8b_block7 | deepseek-ai/dspark_qwen3_14b_block7 | deepseek-ai/dspark_gemma4_12b_block7 |
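The repo IDs in the table follow a regular pattern, so a value for draft_name_or_path can be derived from the algorithm and target model. The sketch below encodes only the twelve entries listed above. The helper name released_checkpoint and the lookup dictionaries are illustrative and not part of DeepSpec.

```python
# Name fragments copied from the released-checkpoint table.
_SUFFIX = {"eagle3": "ttt7", "dflash": "block7", "dspark": "block7"}
_TARGET_TAG = {
    "Qwen/Qwen3-4B": "qwen3_4b",
    "Qwen/Qwen3-8B": "qwen3_8b",
    "Qwen/Qwen3-14B": "qwen3_14b",
    "google/gemma-4-12B-it": "gemma4_12b",
}


def released_checkpoint(algorithm: str, target: str) -> str:
    """Return the Hugging Face repo ID for an (algorithm, target) pair in the table."""
    algo = algorithm.lower()
    if algo not in _SUFFIX:
        raise KeyError(f"no released checkpoint for algorithm {algorithm!r}")
    if target not in _TARGET_TAG:
        raise KeyError(f"no released checkpoint for target {target!r}")
    return f"deepseek-ai/{algo}_{_TARGET_TAG[target]}_{_SUFFIX[algo]}"
```

For example, released_checkpoint("DSpark", "Qwen/Qwen3-4B") yields the ID in the DSpark row under Qwen/Qwen3-4B. Pairs outside the table raise KeyError rather than producing a guessed name.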
> [!IMPORTANT]
> If you cite these results in a new paper, align your setup with the training settings in this repository; otherwise, the comparison is not meaningful. For domain-specific use, fine-tune the draft model again for better results, especially if the target model is expected to run in thinking mode.
Currently, DeepSpec includes three draft models: DSpark, DFlash and Eagle3.
DeepSpec is released under the MIT License. It includes code adapted from third-party projects under their own licenses; see NOTICE for the full attribution.
DeepSpec builds on the ideas and code of several excellent open-source projects:
We thank the authors and maintainers of these projects. Contributions of new algorithms are welcome.