MCPcopy
hub / github.com/areal-project/AReaL

github.com/areal-project/AReaL @v2.0.0 sqlite

repository ↗ · DeepWiki ↗ · release v2.0.0 ↗
10,046 symbols 43,505 edges 812 files 4,452 documented · 44%
README

AReaL: A Large-Scale Asynchronous Reinforcement Learning System

| Paper | Documentation | 中文文档 | Ask DeepWiki | 🤗 Models & Data | WeChat (微信) Group | gitcgr

ReaL

AReaL is a reinforcement learning (RL) infrastructure designed to bridge foundation model training with modern agent-based applications. It was originally developed by researchers and engineers from Tsinghua IIIS and the AReaL Team at Ant Group.

Built on a fully asynchronous RL training paradigm, AReaL is optimized for efficiency and scalability, making it particularly well-suited for training large-scale reasoning and agentic models.

AReaL’s mission is to make building AI agents accessible, efficient, and cost-effective for a broad community of developers and researchers.

Like milk tea - customizable, scalable, and enjoyable - we hope AReaL brings both flexibility and delight to your AI development experience. Cheers!

AReaL Highlights

  • Flexibility: Seamless customization for agentic RL and online RL training for black-box agent applications by simply replacing the base_url.
  • 📈 Scalability: Stable fully asynchronous RL training with industry-leading speed.
  • Cutting-Edge Performance: State-of-the-art math, coding, search, and customer service agents.

📰 News

[2026/06/17] 🔬 Introducing KPop — bidirectional binary KL divergence token masking. Configured via rejection_sampling.metric=binary_kl. Also adding an IcePop config (importance-ratio-based token masking). Check out gsm8k_kpop.yaml and gsm8k_icepop.yaml to get started!

[2026/04/23] 🚀 We’re excited to release our integration with Scaffoldings for agentic RL training - now live in our examples! Huge shoutout to @narutolhy and @WeiHaocheng for making this happen 🙌. The modular design of the Scaffoldings enables it to achieve a thorough decoupling of agent execution, reward calculation, and trajectory acquisition. This enables developers to reuse existing modules when implementing an agentic RL method, allowing them to focus on their own innovative modules.

[2026/04/18] We are thrilled to announce that AReaL's first Community Biweekly Meeting was successfully held! Thank you to everyone who joined us. Meeting materials are now available here. Our next meeting is scheduled for 2026/05/01 and will also be conducted in Chinese; English-language meetings will be scheduled in the future. We warmly welcome everyone to participate! See Community for more details.

[2026/03/02] We provide a complete example to train your own 🦞 OpenClaw agent by simply replacing the base_url and api_key with AReaL's RL service - no complicated dependencies, no code changes, works with any agentic runtime!

📋 Previous Releases

[2026/02/06] We are delighted to introduce AReaL-SEA, a self-evolving data synthesis engine. Combined with RL training on AReaL, the 235B MoE model surpasses GPT 5 and achieves comparable performance with Gemini 3.0 Pro on $\tau^2$-bench! Check out the paper, model, data, and code.

[2026/01/15] Congrats to our friends at CAMEL-AI for open-sourcing SETA, their terminal agent RL project trained with AReaL! Check out their training workflow and the announcement on X.

[2026/01/01] Happy New Year! Thanks to the outstanding contribution from @HwVanICI, we are excited to officially announce stable support for AReaL training on Ascend NPU devices! The code is actively maintained and continuously updated in the ascend branch. Check out our documentation to get started, and feel free to report any issues!

[2025/08/30] Introducing ASearcher, a state-of-the-art search agent built with AReaL's end-to-end asynchronous RL training. Check out the paper and the open-source repository!

[2025/07/31] (AReaL-lite) We introduce AReaL-lite, a lightweight version of AReaL designed specifically for AI researchers and rapid prototyping. AReaL-lite features an algorithm-first API design that prioritizes ease of use and algorithm development, while natively supporting fully asynchronous agentic RL. With 80% fewer lines of code, AReaL-lite maintains 90% of AReaL's performance and core functionality. Check out our AReaL-lite design documentation and the quickstart guide to begin your journey with AReaL-lite!

[2025/06/03] (v0.3, boba²) We release boba² (double-boba) for fully asynchronous RL training, which achieves 2.77× speedup while delivering comparable or superior training performance compared to synchronous systems. Furthermore, asynchronous RL significantly simplifies multi-turn agentic RL training setup! Check out our v0.3 overview blog and the research paper.

[2025/03/31] (v0.2, boba) Introducing our milestone release—boba! Please call it A-ReaL-boba! This release features significantly faster training with SGLang support and state-of-the-art 7B and 32B models for mathematical reasoning. Check out our v0.2 technical blog.

[2025/02/24] (v0.1) Our initial release includes reproducible results for 1.5B and 7B Large Reasoning Models (LRMs). Check out our v0.1 technical blog.

🚀 Getting Started

First, install the package:

git clone https://github.com/areal-project/AReaL
cd AReaL
pip install uv
# Install flash-attn pre-built wheel first to avoid compiling from source
# (pick the wheel matching your Python version; see https://github.com/mjun0812/flash-attention-prebuild-wheels/releases)
uv pip install "https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.16/flash_attn-2.8.3+cu128torch2.9-cp312-cp312-linux_x86_64.whl"
uv sync --extra cuda  # installs training packages + SGLang (default inference backend)
# For vLLM instead: cp pyproject.vllm.toml pyproject.toml && cp uv.vllm.lock uv.lock && uv sync --extra cuda

Our training scripts automatically download the required dataset (openai/gsm8k) and model (Qwen/Qwen2-1.5B-Instruct). To run on a single node:

python3 examples/math/gsm8k_rl.py --config examples/math/gsm8k_grpo.yaml scheduler.type=local

If you prefer to run experiments on a Ray cluster, update paths in the YAML file to point to your shared storage, and run:

python3 examples/math/gsm8k_rl.py --config examples/math/gsm8k_grpo.yaml \
  cluster.n_nodes=2 cluster.n_gpus_per_node=8 \
  cluster.fileroot=/path/to/nfs \
  scheduler.type=ray

For comprehensive setup instructions, see our quickstart guide.

📚 Examples

Math & Reasoning

Task Description Performance
Math GSM8K math reasoning with GRPO, PPO, DAPO, REINFORCE, RLOO, LitePPO, DR-GRPO, GSPO, and more -
Multi-Turn Math Multi-turn math agent with reward discounting across turns Training Curve
LoRA Math Parameter-efficient math training with LoRA (SGLang/vLLM backends) -
Countdown Countdown numbers game with custom rewards Training Curve

Agentic RL

Task Description Performance
General Agent General agentic training with any agentic frameworks Guide
Tau2 Customer Service Customer service agent on Tau2-Bench (retail, airline, telecom) Paper
Search Agent End-to-end search agent with Tongyi-DeepResearch workflow Training Curve
Tool-Integrated Reasoning Multi-turn tool calling during reasoning (Python executor, calculator) Training Curve
OpenAI Agents Integration Integration with OpenAI Agents SDK for agentic workflows -
CAMEL-AI Integration Integration with CAMEL-AI framework for agentic RL -

Vision-Language Models

Task Description Performance
VLM Geometry3K and CLEVR Count 70K visual reasoning with GRPO -
VLM on NPU VLM training on Huawei NPU hardware Benchmark Results

Alignment & Infrastructure

Task Description Performance
RLHF Reward Modeling Bradley-Terry reward modeling on Anthropic HH-RLHF Training Curve
SkyPilot Deployment Cloud deployment with SkyPilot (GCP, AWS, Kubernetes) Screenshots

🔧 Support Matrix

🧠 Algorithms

All RL algorithms support both asynchronous and synchronous versions by setting max_head_offpolicyness=0. See Asynchronous RL Guide.

Algorithm Documentation Paper Configuration
GRPO 📖 Docs 📄 Paper [🔗 GSM8K Example](exampl

Core symbols most depended-on inside this repo

append
called by 988
areal/experimental/engine/archon_runner.py
get
called by 630
areal/utils/timeutil.py
print_rank0
called by 490
tests/experimental/archon/torchrun/dist_utils.py
to
called by 266
areal/utils/data.py
set
called by 241
areal/infra/workflow_context.py
clone
called by 232
examples/scaffolding/core/controller.py
get
called by 225
areal/v2/inference_service/router/state.py
serialize_value
called by 166
areal/infra/rpc/serialization.py

Shape

Method 5,380
Function 3,003
Class 1,327
Route 336

Languages

Python100%
TypeScript1%

Modules by API surface

tests/test_local_scheduler.py139 symbols
areal/utils/perf_tracer.py110 symbols
areal/engine/megatron_engine.py109 symbols
tests/test_sglang_pp_unit.py108 symbols
areal/engine/fsdp_engine.py107 symbols
tests/test_seqpack.py97 symbols
tests/test_rollout_controller.py94 symbols
tests/test_megatron_engine_vlm.py94 symbols
areal/experimental/engine/archon_engine.py94 symbols
areal/api/cli_args.py92 symbols
areal/utils/name_resolve.py90 symbols
areal/api/alloc_mode.py85 symbols

Dependencies from manifests, versioned

anthropic
camel-ai0.2.85a0 · 1×
claude-agent-sdk
datasets3.0.0 · 1×
docker
dotenv0.9.9 · 1×
mistral-common1.11.1 · 1×
openai-agents
peft0.18.1 · 1×
qwen-agent0.0.31 · 1×

For agents

$ claude mcp add AReaL \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact