
github.com/LeapLabTHU/Absolute-Zero-Reasoner @main sqlite

955 symbols · 3,852 edges · 175 files · 130 documented (14%)

README

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

<a href="#news" style="text-decoration: none; font-weight: bold;">🎉 News</a> •
<a href="#links" style="text-decoration: none; font-weight: bold;">🔗 Links</a> •
<a href="#todo" style="text-decoration: none; font-weight: bold;">📝 Roadmap</a> •
<a href="#algorithm-flow" style="text-decoration: none; font-weight: bold;">⚙️ Algorithm Flow</a> •
<a href="#results" style="text-decoration: none; font-weight: bold;">📊 Results</a>

<a href="#getting-started" style="text-decoration: none; font-weight: bold;">✨ Getting Started</a> •
<a href="#training" style="text-decoration: none; font-weight: bold;">🏋️ Training</a> •
<a href="#usage" style="text-decoration: none; font-weight: bold;">🔧 Usage</a> •
<a href="#evaluation-code" style="text-decoration: none; font-weight: bold;">📃 Evaluation</a>

<a href="#citation" style="text-decoration: none; font-weight: bold;">🎈 Citation</a> •
<a href="#acknowledgement" style="text-decoration: none; font-weight: bold;">🌻 Acknowledgement</a> •
<a href="#contact" style="text-decoration: none; font-weight: bold;">📧 Contact</a> •
<a href="#star-history" style="text-decoration: none; font-weight: bold;">📈 Star History</a>

[Figure: Absolute Zero Paradigm]

⚠️WARNING⚠️: New Qwen3 base models have untrained token embeddings. We used python absolute_zero_reasoner/utils/remove_think_qwen3_tokenizer.py --model_name <Qwen3ModelName> to remove these tokens; otherwise the model produces nonsense.

🚧UNDER TESTING🚧 This new merge to main is still under testing. Use the paper branch to replicate the results from the original paper.

[2025/06/30] We now support Sandbox-Fusion as an executor; set azr.executor=sandboxfusion in the training config. This officially completes our initial roadmap.
[2025/06/28] We now support the new version of veRL. Use the paper branch, which keeps a static copy of veRL, to reproduce the paper results. The main branch will now be regularly updated with the latest veRL versions.
[2025/06/01] We released the evaluation code.
[2025/05/06] We present the Absolute Zero Reasoner [Project Page | Paper | Code | Model(s) | Logs].

🔗 Links

🏠 [Project Page]
📜 [Paper]
🤗 [Models]
💻 [Code]
📁 [Logs]

📝 Roadmap

✅ Release training code
✅ Release evaluation code
✅ Update veRL
✅ Upgrade Python executor

⚙️ Algorithm Flow

Our approach centers on an iterative loop of two steps:

PROPOSE: The model generates reasoning tasks of three types: abduction, deduction, and induction. Each task is validated by executing it in Python and is assigned a learnability reward.
SOLVE: The model then attempts to solve these self-generated tasks. Solutions are verified through Python execution and receive an accuracy reward.
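The two steps can be sketched in plain Python. This is a hypothetical illustration, not code from this repository. It assumes each task is derived from a (program, input, output) triplet whose program defines a function named f, it runs code with in-process exec (unsafe, for illustration only), and it follows the reward definitions given in the AZR paper: the proposer's learnability reward is 1 minus the solver's success rate over several attempts, set to 0 when the task is always or never solved, and the solver's accuracy reward is 1 for a verified answer and 0 otherwise. All function names are made up for this sketch.

```python
def run_program(program: str, inp):
    """Execute `program`, which must define a function f, and return f(inp).

    Illustration only: exec() runs the code inside this process with no sandbox.
    """
    namespace: dict = {}
    exec(program, namespace)
    return namespace["f"](inp)


def is_valid_task(program: str, inp) -> bool:
    """Keep a proposed (program, input) pair only if it runs and is deterministic."""
    try:
        return run_program(program, inp) == run_program(program, inp)
    except Exception:
        return False


def make_tasks(program: str, inp) -> dict:
    """Derive the three task types from one (program, input, output) triplet."""
    out = run_program(program, inp)
    return {
        "deduction": {"given": (program, inp), "answer": out},    # predict the output
        "abduction": {"given": (program, out), "answer": inp},    # infer an input
        "induction": {"given": [(inp, out)], "answer": program},  # recover the program from I/O examples
    }


def deduction_correct(program: str, inp, predicted_out) -> bool:
    return run_program(program, inp) == predicted_out


def abduction_correct(program: str, gold_out, predicted_inp) -> bool:
    # Any input that reproduces the gold output is accepted, since f need not be injective.
    try:
        return run_program(program, predicted_inp) == gold_out
    except Exception:
        return False


def learnability_reward(solve_results: list) -> float:
    """Proposer reward: 1 - solver success rate, or 0 if always or never solved."""
    rate = sum(solve_results) / len(solve_results)
    return 0.0 if rate in (0.0, 1.0) else 1.0 - rate


def accuracy_reward(correct: bool) -> float:
    """Solver reward: 1 for a verified answer, 0 otherwise."""
    return 1.0 if correct else 0.0


if __name__ == "__main__":
    prog = "def f(x):\n    return x * 2"
    assert is_valid_task(prog, 3)
    print(make_tasks(prog, 3)["deduction"]["answer"])       # 6
    print(abduction_correct(prog, 6, 3))                     # True
    print(learnability_reward([True, False, False, False]))  # 0.75
```

A task the solver gets right one time in four earns the proposer 0.75, while a task that is always or never solved earns 0, which pushes the proposer toward tasks near the edge of the solver's ability.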

The model improves in both roles through Task-Relative REINFORCE++ (TRR++), creating a self-evolving loop that strengthens reasoning without external training data.
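As described in the AZR paper, TRR++ normalizes rewards with a separate baseline for each of the six combinations of task type (abduction, deduction, induction) and role (propose, solve). The sketch below shows only that normalization step. The use of the population standard deviation and the eps term are assumptions, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

TASK_TYPES = ("abduction", "deduction", "induction")
ROLES = ("propose", "solve")


def task_relative_advantages(samples, eps: float = 1e-6):
    """Normalize rewards separately for each (task type, role) group.

    samples: list of (task_type, role, reward) tuples.
    Returns one advantage per sample, in the same order as the input.
    """
    groups = defaultdict(list)
    for task, role, reward in samples:
        if task not in TASK_TYPES or role not in ROLES:
            raise ValueError(f"unknown task/role: {task}/{role}")
        groups[(task, role)].append(reward)
    # One baseline (mean) and scale (std) per group: at most 3 x 2 = 6 groups.
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}
    return [
        (reward - stats[(task, role)][0]) / (stats[(task, role)][1] + eps)
        for task, role, reward in samples
    ]


if __name__ == "__main__":
    batch = [
        ("deduction", "solve", 1.0),
        ("deduction", "solve", 0.0),
        ("abduction", "propose", 0.5),
        ("abduction", "propose", 0.5),
    ]
    print([round(a, 3) for a in task_relative_advantages(batch)])  # [1.0, -1.0, 0.0, 0.0]
```

Keeping the groups separate stops one easy task type, or the proposer's differently scaled reward, from shifting the baseline used for every other sample.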

[Figure: Absolute Zero Reasoner]

📊 Results

Main Results

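Our approach achieves strong performance on both code and math reasoning benchmarks without using any external data. Total Avg is the mean of Code Avg and Math Avg; the +x values for AZR are gains over the corresponding base model (Qwen2.5-7B or Qwen2.5-7B-Coder):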

Model	Base	#data	Code Avg	Math Avg	Total Avg
Base Models
Qwen2.5-7B	-	-	52.0	27.5	39.8
Qwen2.5-7B-Ins	-	-	56.3	37.0	46.7
Qwen2.5-7B-Coder	-	-	56.6	23.9	40.2
Reasoners Trained on Curated Code Data
AceCoder-RM	Ins	22k	58.3	37.4	47.9
AceCoder-RM	Coder	22k	57.3	27.5	42.4
AceCoder-Rule	Ins	22k	55.4	36.9	46.2
AceCoder-Rule	Coder	22k	60.0	28.5	44.3
CodeR1-LC2k	Ins	2k	60.5	35.6	48.0
CodeR1-12k	Ins	10k	61.3	33.5	47.4
Reasoners Trained on Curated Math Data
PRIME-Zero	Coder	484k	37.2	45.8	41.5
SimpleRL-Zoo	Base	8.5k	54.0	38.5	46.3
Oat-Zero	Math	8.5k	45.4	44.3	44.9
ORZ	Base	57k	55.6	41.6	48.6
Absolute Zero Training w/ No Curated Data (Ours)
AZR (Ours)	Base	0	55.2 +3.2	38.4 +10.9	46.8 +7.0
AZR (Ours)	Coder	0	61.6 +5.0	39.1 +15.2	50.4 +10.2
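The Total Avg column and the gain annotations follow simple arithmetic, which this short check reproduces from the reported numbers. Because the inputs are already rounded to one decimal, a recomputed mean can differ from the reported total by up to 0.05 (for example, (56.6 + 23.9) / 2 = 40.25 is reported as 40.2). The dictionary keys and helper names are illustrative.

```python
# (Code Avg, Math Avg, Total Avg) copied from the Main Results table above.
REPORTED = {
    "Qwen2.5-7B": (52.0, 27.5, 39.8),
    "Qwen2.5-7B-Coder": (56.6, 23.9, 40.2),
    "AZR (Ours), Base": (55.2, 38.4, 46.8),
    "AZR (Ours), Coder": (61.6, 39.1, 50.4),
}


def total_matches(code_avg: float, math_avg: float, total_avg: float) -> bool:
    """True if Total Avg equals the mean of the two averages, up to 0.1 rounding."""
    return abs((code_avg + math_avg) / 2 - total_avg) <= 0.05 + 1e-9


def gain(model: str, base: str) -> tuple:
    """Per-column improvement of `model` over `base`, rounded to one decimal."""
    return tuple(round(m - b, 1) for m, b in zip(REPORTED[model], REPORTED[base]))


for name, row in REPORTED.items():
    assert total_matches(*row), name

print(gain("AZR (Ours), Base", "Qwen2.5-7B"))         # (3.2, 10.9, 7.0)
print(gain("AZR (Ours), Coder", "Qwen2.5-7B-Coder"))  # (5.0, 15.2, 10.2)
```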

Scaling Results

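AZR improves code, math, and total averages for every model tested. Within the Qwen2.5 Coder family, the math and total gains grow with model size (total: +5.7 at 3B, +10.2 at 7B, +13.2 at 14B):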

Model Family	Variant	Code Avg	Math Avg	Total Avg
Llama3.1-8b		28.5	3.4	16.0
Llama3.1-8b	+ AZR (Ours)	31.6 +3.1	6.8 +3.4	19.2 +3.2
Qwen2.5-3B Coder		51.2	18.8	35.0
Qwen2.5-3B Coder	+ AZR (Ours)	54.9 +3.7	26.5 +7.7	40.7 +5.7
Qwen2.5-7B Coder		56.6	23.9	40.2
Qwen2.5-7B Coder	+ AZR (Ours)	61.6 +5.0	39.1 +15.2	50.4 +10.2
Qwen2.5-14B Coder		60.0	20.2	40.1
Qwen2.5-14B Coder	+ AZR (Ours)	63.6 +3.6	43.0 +22.8	53.3 +13.2
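The size trend can be checked against the table directly. This snippet recomputes the per-column gains for the three Qwen2.5 Coder sizes from the reported numbers; the names are illustrative.

```python
# ((Code, Math, Total) before AZR, (Code, Math, Total) after AZR), from the Scaling Results table.
SCALING = {
    "Qwen2.5-3B Coder": ((51.2, 18.8, 35.0), (54.9, 26.5, 40.7)),
    "Qwen2.5-7B Coder": ((56.6, 23.9, 40.2), (61.6, 39.1, 50.4)),
    "Qwen2.5-14B Coder": ((60.0, 20.2, 40.1), (63.6, 43.0, 53.3)),
}


def gains(before: tuple, after: tuple) -> tuple:
    """Per-column improvement, rounded to one decimal like the table."""
    return tuple(round(a - b, 1) for b, a in zip(before, after))


# Dicts keep insertion order, so this runs from the smallest to the largest model.
per_size = {name: gains(*pair) for name, pair in SCALING.items()}
for name, g in per_size.items():
    print(name, g)

math_gains = [g[1] for g in per_size.values()]
total_gains = [g[2] for g in per_size.values()]
# Math and total gains increase with model size; code gains do not (3.7, 5.0, 3.6).
assert math_gains == sorted(math_gains)
assert total_gains == sorted(total_gains)
```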

✨ Getting Started

🎄 Environment Setup

conda env create -f azr_env.yml
conda activate azr
pip install -r flashattn_requirements.txt
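After these commands, a quick import check can confirm that the heavy dependencies are importable inside the azr environment. This is a minimal sketch; the module list (torch, flash_attn) is an assumption based on the requirement file names, not read from azr_env.yml, so adjust it to match your setup.

```python
import importlib.util

# Assumed top-level module names; edit to match azr_env.yml and flashattn_requirements.txt.
REQUIRED = ["torch", "flash_attn"]


def missing_modules(names):
    """Return the top-level modules from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("all present" if not missing else f"missing: {missing}")
```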

💾 Data Processing

Process the CruxEval and LiveCodeBench Execution data used for evaluation during AZR self-play:

python -m absolute_zero_reasoner.data_construction.process_code_reasoning_data

🏋️ Training

⚠️WARNING⚠️: The Python executor in this repository is rudimentary and intended for research purposes only; it is not secure for production environments. We plan to move to more secure executor implementations in the future. Your use of our code i
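Running model-generated code safely takes more than the minimal pattern below, which is a hypothetical sketch rather than this repository's executor. It runs each program in a separate Python process (isolated mode, -I) with a wall-clock timeout. That contains hangs and crashes, but it is not a security boundary: the child process keeps the parent's filesystem and network access.

```python
import subprocess
import sys


def run_untrusted(code: str, timeout_s: float = 5.0) -> tuple[bool, str]:
    """Run `code` in a fresh Python process; return (succeeded, stdout or 'timeout').

    Not a sandbox: this only bounds run time and keeps crashes out of the caller.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: ignore env vars and user site-packages
            capture_output=True,
            text=True,
            timeout=timeout_s,  # run() kills the child if this expires
        )
    except subprocess.TimeoutExpired:
        return False, "timeout"
    return proc.returncode == 0, proc.stdout


if __name__ == "__main__":
    print(run_untrusted("print(1 + 1)"))                      # (True, '2\n')
    print(run_untrusted("while True: pass", timeout_s=1.0))   # (False, 'timeout')
```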

Core symbols most depended-on inside this repo

status · called by 100 · absolute_zero_reasoner/utils/logging_utils/stdout.py
check_id · called by 39 · evaluation/code_eval/coding/evalplus/tools/mbpp/fix_v010.py
section_header · called by 33 · absolute_zero_reasoner/utils/logging_utils/stdout.py
get_human_eval_plus · called by 28 · evaluation/code_eval/coding/evalplus/evalplus/data/humaneval.py
_style · called by 26 · absolute_zero_reasoner/utils/logging_utils/stdout.py
readlines · called by 22 · evaluation/code_eval/coding/evalplus/evalplus/eval/utils.py
get_mbpp_plus · called by 17 · evaluation/code_eval/coding/evalplus/evalplus/data/mbpp.py
start · called by 16 · absolute_zero_reasoner/utils/code_utils/sandboxfusion_executor.py

Shape

Function 565 · Method 296 · Class 85 · Route 9

Languages

Python 100%

Modules by API surface

absolute_zero_reasoner/trainer/ppo/azr_ray_trainer.py · 44 symbols
evaluation/code_eval/coding/evalplus/tools/_experimental/type_mut_for_eff.py · 29 symbols
absolute_zero_reasoner/utils/code_utils/python_executor.py · 28 symbols
absolute_zero_reasoner/rewards/math_utils.py · 27 symbols
evaluation/math_eval/eval/custom_evaluate.py · 21 symbols
absolute_zero_reasoner/utils/code_utils/sandboxfusion_executor.py · 21 symbols
evaluation/code_eval/coding/LiveCodeBench/lcb_runner/evaluation/testing_util.py · 20 symbols
evaluation/code_eval/coding/evalplus/evalplus/gen/type_mut.py · 19 symbols
absolute_zero_reasoner/utils/code_utils/parsers.py · 19 symbols
evaluation/math_eval/eval/sh/collect_results.py · 18 symbols
evaluation/math_eval/eval/python_executor.py · 18 symbols
evaluation/math_eval/eval/parser.py · 17 symbols

Dependencies from manifests, versioned

GitPython 3.1.44 · 1×
Jinja2 3.1.6 · 1×
MarkupSafe 3.0.2 · 1×
Pebble 5.1.0 · 1×
PyYAML 6.0.2 · 1×
accelerate 1.4.0 · 1×
aiohappyeyeballs 2.5.0 · 1×
aiohttp 3.11.13 · 1×
aiosignal 1.3.2 · 1×
airportsdata 20250224 · 1×
annotated-types 0.7.0 · 1×
anthropic 0.49.0 · 1×

For agents

$ claude mcp add Absolute-Zero-Reasoner \
  -- python -m otcore.mcp_server <graph>
