github.com/LeapLabTHU/Absolute-Zero-Reasoner @main sqlite

955 symbols · 3,852 edges · 175 files · 130 documented · 14%
README

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Absolute Zero Paradigm

⚠️WARNING⚠️: The new Qwen3 base models have untrained token embeddings. We ran python absolute_zero_reasoner/utils/remove_think_qwen3_tokenizer.py --model_name <Qwen3ModelName> to remove these tokens; without this step the model produces nonsense.

🚧UNDER TESTING🚧 The new merge to main is still under testing. Use the paper branch to replicate the results from the original paper.

  • [2025/06/30] We now support Sandbox-Fusion as an executor: set azr.executor=sandboxfusion in the training configs. This officially completes our initial roadmap.
  • [2025/06/28] We now support the new version of veRL. Use the paper branch to reproduce the paper results with a static copy of veRL; the main branch will be regularly updated to the latest veRL versions.
  • [2025/06/01] We released the evaluation code.
  • [2025/05/06] We present the Absolute Zero Reasoner [Project Page | Paper | Code | Model(s) | Logs].

🔗 Links


📝 Roadmap


  • Release training code
  • Release evaluation code
  • Update veRL
  • Upgrade Python executor

⚙️ Algorithm Flow


Our approach iterates over the following two steps:

  1. PROPOSE: The model generates reasoning tasks of three types: abduction, deduction, and induction. Each task is validated by Python execution and assigned a learnability reward.

  2. SOLVE: The model then attempts to solve these self-generated tasks. Each solution is verified by Python execution and assigned an accuracy reward.

The model continuously improves through both phases using TRR++, creating a self-evolving loop that strengthens reasoning without external training data.
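The two reward signals can be illustrated with a small sketch. Everything here is hypothetical and not taken from the repository: the helper names (`run_program`, `validate_task`, `accuracy_reward`, `learnability_reward`), the convention that a task is a program defining `f(x)`, and the specific reward shape, which assumes the proposer earns nothing for tasks the solver always or never solves and otherwise earns one minus the solver's mean accuracy.

```python
import statistics


def run_program(program_src, arg):
    """Run a task program that defines f(x) and return f(arg).

    Uses plain exec for illustration only; this is not a sandbox.
    """
    namespace = {}
    exec(program_src, namespace)
    return namespace["f"](arg)


def validate_task(program_src, arg, trials=2):
    """PROPOSE-side check: the program must run and be deterministic.

    Returns (True, output) for a valid task, (False, None) otherwise.
    """
    try:
        outputs = [run_program(program_src, arg) for _ in range(trials)]
    except Exception:
        return False, None
    if any(out != outputs[0] for out in outputs):
        return False, None
    return True, outputs[0]


def accuracy_reward(predicted, expected):
    """SOLVE-side reward: 1.0 for an exact match with the executed output."""
    return 1.0 if predicted == expected else 0.0


def learnability_reward(solver_rewards):
    """Assumed PROPOSE-side reward: zero for trivial or unsolvable tasks,
    otherwise higher for tasks the solver gets right less often."""
    mean_acc = statistics.mean(solver_rewards)
    if mean_acc == 0.0 or mean_acc == 1.0:
        return 0.0
    return 1.0 - mean_acc


# A proposed deduction-style task: predict the output of f on a given input.
task_src = "def f(x):\n    return sorted(x)[::-1]"
is_valid, expected = validate_task(task_src, [3, 1, 2])

# Four hypothetical solver attempts at predicting the output.
attempts = [[3, 2, 1], [1, 2, 3], [3, 2, 1], [3, 2, 1]]
solve_rewards = [accuracy_reward(a, expected) for a in attempts]
propose_reward = learnability_reward(solve_rewards)
print(is_valid, solve_rewards, propose_reward)
```

In this sketch the solver is right three times out of four, so the task counts as learnable but not trivial and the proposer receives a small positive reward. A task that every attempt solves, or that none solves, would give the proposer zero.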

Absolute Zero Reasoner

📊 Results


Main Results

Our approach achieves strong performance across both code and math reasoning benchmarks without using any external data:

| Model | Base | #data | Code Avg | Math Avg | Total Avg |
|---|---|---|---|---|---|
| **Base Models** | | | | | |
| Qwen2.5-7B | - | - | 52.0 | 27.5 | 39.8 |
| Qwen2.5-7B-Ins | - | - | 56.3 | 37.0 | 46.7 |
| Qwen2.5-7B-Coder | - | - | 56.6 | 23.9 | 40.2 |
| **Reasoners Trained on Curated Code Data** | | | | | |
| AceCoder-RM | Ins | 22k | 58.3 | 37.4 | 47.9 |
| AceCoder-RM | Coder | 22k | 57.3 | 27.5 | 42.4 |
| AceCoder-Rule | Ins | 22k | 55.4 | 36.9 | 46.2 |
| AceCoder-Rule | Coder | 22k | 60.0 | 28.5 | 44.3 |
| CodeR1-LC2k | Ins | 2k | 60.5 | 35.6 | 48.0 |
| CodeR1-12k | Ins | 10k | 61.3 | 33.5 | 47.4 |
| **Reasoners Trained on Curated Math Data** | | | | | |
| PRIME-Zero | Coder | 484k | 37.2 | 45.8 | 41.5 |
| SimpleRL-Zoo | Base | 8.5k | 54.0 | 38.5 | 46.3 |
| Oat-Zero | Math | 8.5k | 45.4 | 44.3 | 44.9 |
| ORZ | Base | 57k | 55.6 | 41.6 | 48.6 |
| **Absolute Zero Training w/ No Curated Data (Ours)** | | | | | |
| AZR (Ours) | Base | 0 | 55.2 +3.2 | 38.4 +10.9 | 46.8 +7.0 |
| AZR (Ours) | Coder | 0 | 61.6 +5.0 | 39.1 +15.2 | 50.4 +10.2 |
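The "+" figures next to the AZR scores appear to be absolute gains over the matching base model (Qwen2.5-7B for Base, Qwen2.5-7B-Coder for Coder). The short check below recomputes them from the numbers in the table above. The variable and function names are illustrative, and the pairing of rows is an assumption inferred from the arithmetic rather than something the README states.

```python
# Scores copied from the table above as (Code Avg, Math Avg, Total Avg).
base_rows = {
    "Base": (52.0, 27.5, 39.8),   # Qwen2.5-7B
    "Coder": (56.6, 23.9, 40.2),  # Qwen2.5-7B-Coder
}
azr_rows = {
    "Base": (55.2, 38.4, 46.8),   # AZR (Ours), Base
    "Coder": (61.6, 39.1, 50.4),  # AZR (Ours), Coder
}


def gains(azr, base):
    """Per-column difference, rounded to one decimal like the table."""
    return tuple(round(a - b, 1) for a, b in zip(azr, base))


deltas = {name: gains(azr_rows[name], base_rows[name]) for name in azr_rows}
for name, delta in deltas.items():
    print(name, delta)
```

The recomputed differences match the reported gains for both rows, which supports reading each "+" figure as the improvement over the untrained model of the same variant.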

Scaling Results

AZR shows consistent improvements across model sizes and types:

| Model Family | Variant | Code Avg | Math Avg | Total Avg |
|---|---|---|---|---|
| Llama3.1-8b | | 28.5 | 3.4 | 16.0 |
| Llama3.1-8b | + AZR (Ours) | 31.6 +3.1 | 6.8 +3.4 | 19.2 +3.2 |
| Qwen2.5-3B Coder | | 51.2 | 18.8 | 35.0 |
| Qwen2.5-3B Coder | + AZR (Ours) | 54.9 +3.7 | 26.5 +7.7 | 40.7 +5.7 |
| Qwen2.5-7B Coder | | 56.6 | 23.9 | 40.2 |
| Qwen2.5-7B Coder | + AZR (Ours) | 61.6 +5.0 | 39.1 +15.2 | 50.4 +10.2 |
| Qwen2.5-14B Coder | | 60.0 | 20.2 | 40.1 |
| Qwen2.5-14B Coder | + AZR (Ours) | 63.6 +3.6 | 43.0 +22.8 | 53.3 +13.2 |

✨ Getting Started


🎄 Environment Setup

conda env create -f azr_env.yml
conda activate azr
pip install -r flashattn_requirements.txt

💾 Data Processing

Process the CruxEval / LiveCodeBench Execution evaluation data used during AZR self-play:

python -m absolute_zero_reasoner.data_construction.process_code_reasoning_data

🏋️ Training


⚠️WARNING⚠️: The Python executor in this repository is very raw and intended for research purposes only. It is not secure for production environments. We plan to update our executor to more secure implementations in the future. Your use of our code i

Core symbols most depended-on inside this repo

status · called by 100 · absolute_zero_reasoner/utils/logging_utils/stdout.py
check_id · called by 39 · evaluation/code_eval/coding/evalplus/tools/mbpp/fix_v010.py
section_header · called by 33 · absolute_zero_reasoner/utils/logging_utils/stdout.py
get_human_eval_plus · called by 28 · evaluation/code_eval/coding/evalplus/evalplus/data/humaneval.py
_style · called by 26 · absolute_zero_reasoner/utils/logging_utils/stdout.py
readlines · called by 22 · evaluation/code_eval/coding/evalplus/evalplus/eval/utils.py
get_mbpp_plus · called by 17 · evaluation/code_eval/coding/evalplus/evalplus/data/mbpp.py
start · called by 16 · absolute_zero_reasoner/utils/code_utils/sandboxfusion_executor.py

Shape

Function 565
Method 296
Class 85
Route 9

Languages

Python 100%

Modules by API surface

absolute_zero_reasoner/trainer/ppo/azr_ray_trainer.py · 44 symbols
evaluation/code_eval/coding/evalplus/tools/_experimental/type_mut_for_eff.py · 29 symbols
absolute_zero_reasoner/utils/code_utils/python_executor.py · 28 symbols
absolute_zero_reasoner/rewards/math_utils.py · 27 symbols
evaluation/math_eval/eval/custom_evaluate.py · 21 symbols
absolute_zero_reasoner/utils/code_utils/sandboxfusion_executor.py · 21 symbols
evaluation/code_eval/coding/LiveCodeBench/lcb_runner/evaluation/testing_util.py · 20 symbols
evaluation/code_eval/coding/evalplus/evalplus/gen/type_mut.py · 19 symbols
absolute_zero_reasoner/utils/code_utils/parsers.py · 19 symbols
evaluation/math_eval/eval/sh/collect_results.py · 18 symbols
evaluation/math_eval/eval/python_executor.py · 18 symbols
evaluation/math_eval/eval/parser.py · 17 symbols

Dependencies from manifests, versioned

GitPython 3.1.44 · 1×
Jinja2 3.1.6 · 1×
MarkupSafe 3.0.2 · 1×
Pebble 5.1.0 · 1×
PyYAML 6.0.2 · 1×
accelerate 1.4.0 · 1×
aiohappyeyeballs 2.5.0 · 1×
aiohttp 3.11.13 · 1×
aiosignal 1.3.2 · 1×
airportsdata 20250224 · 1×
annotated-types 0.7.0 · 1×
anthropic 0.49.0 · 1×

For agents

$ claude mcp add Absolute-Zero-Reasoner \
  -- python -m otcore.mcp_server <graph>