github.com/PRIME-RL/SimpleVLA-RL @main sqlite

2,071 symbols 7,259 edges 230 files 529 documented · 26%

SimpleVLA-RL: Open RL Framework for Vision–Language–Action Models

SimpleVLA-RL is an efficient RL framework for VLA models that improves long-horizon planning under data scarcity. Its reinforcement learning substantially outperforms SFT in simulation and real-world tasks, reveals a "pushcut" new-action phenomenon, and strengthens spatial, object, and goal generalization.

🎉News

  • [2026-01-01] Building upon SimpleVLA-RL, we have implemented real-world RL on long-horizon dexterous tasks and observed a non-trivial (roughly 300% relative) performance improvement over the SFT model, along with surprising auto-recovery capabilities. Blog coming soon.

https://github.com/user-attachments/assets/45fca289-39d4-4a42-8014-1ef7eff2d806

  • [2025-10-01] SimpleVLA-RL now supports RoboTwin2.0 Benchmark. Feel free to experiment with it!
  • [2025-09-12] Excited to release the SimpleVLA-RL paper! Check it out: Paper.
  • [2025-05-27] We release the code of SimpleVLA-RL.

📌Highlights

Efficient and Effective VLA Reinforcement Learning Framework

  • End-to-end VLA RL pipeline built on veRL with VLA-specific optimizations
  • Multi-environment parallel rendering significantly accelerates VLA trajectory sampling
  • Leverages veRL's state-of-the-art infrastructure: efficient distributed training (FSDP), hybrid communication patterns, and optimized memory management for fast training/inference

Model and Environment Support

Minimal Reward Engineering and Exploration Strategies

  • Binary (0/1) outcome rewards - no complex reward design needed
  • Exploration strategies: dynamic sampling, adaptive clipping, temperature tuning
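
The reward and exploration ideas above can be illustrated with a small, self-contained sketch. It is not the repository's code: it assumes a GRPO-style group-normalized advantage, treats dynamic sampling as dropping rollout groups whose rewards are all identical, and models adaptive clipping as a PPO clip range with separate lower and upper bounds. All function names and the default clip values are hypothetical.

```python
from statistics import mean, pstdev


def outcome_reward(task_succeeded: bool) -> float:
    """Binary outcome reward: 1.0 for a successful episode, 0.0 otherwise."""
    return 1.0 if task_succeeded else 0.0


def keep_group(rewards: list[float]) -> bool:
    """Dynamic sampling (assumed form): keep a group only if its rewards differ.

    With 0/1 rewards, a group that is all successes or all failures has
    zero advantage everywhere and contributes no learning signal.
    """
    return 0.0 < mean(rewards) < 1.0


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-normalized advantages (assumed GRPO-style normalization)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float,
                      eps_low: float = 0.2, eps_high: float = 0.28) -> float:
    """PPO-style clipped objective with an asymmetric clip range.

    eps_low and eps_high are illustrative values, not the project's settings.
    """
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    rewards = [outcome_reward(s) for s in (True, False, False, True)]
    if keep_group(rewards):
        print(group_advantages(rewards))
```

A wider upper bound lets low-probability actions with positive advantage grow faster, which is one common way to encourage exploration.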

🔧Key Implementations

SimpleVLA-RL extends veRL with VLA-specific components across the following modules:

verl/trainer/main_ppo.py
  • Main entry point with Ray initialization
  • RobRewardManager for reward distribution

verl/trainer/ppo/ray_trainer.py
  • Main RL training loop: data loading, VLA rollout, model updates, evaluation, checkpointing
  • RL algorithm-specific advantage computation

verl/workers/fsdp_workers.py
  • Source of core functions called in ray_trainer.py
  • VLA model/optimizer initialization, generate_sequences, compute_entropy, update_actor

verl/workers/actor/dp_rob.py
  • Specific implementation of functions in fsdp_workers.py
  • RL loss computation, policy updates, compute_log_prob, compute_entropy

verl/workers/rollout/rob_rollout.py
  • VLA rollout implementation: environment creation, multi-environment parallel rendering, VLA action generation, environment interaction, video saving, trajectory and 0/1 reward collection

verl/utils/dataset/rob_dataset.py
  • Dataset construction for training/testing across benchmarks

verl/utils/vla_utils/
  • VLA model implementations (OpenVLA-OFT/OpenVLA from official code)
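
To make the rollout stage more concrete, here is a toy sketch of stepping several environments together and collecting trajectories with 0/1 outcome rewards. It does not reproduce rob_rollout.py: ToyEnv, rollout, and the policy callable are hypothetical stand-ins, and rendering, video saving, and the real VLA model are omitted.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyEnv:
    """Hypothetical 1-D task: reach `goal` within `max_steps` moves."""
    goal: int = 3
    max_steps: int = 10
    position: int = 0
    steps: int = 0

    def reset(self) -> int:
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action: int) -> tuple[int, bool, bool]:
        self.position += action
        self.steps += 1
        success = self.position == self.goal
        done = success or self.steps >= self.max_steps
        return self.position, success, done


def rollout(envs: list[ToyEnv], policy: Callable[[int], int]):
    """Step all unfinished environments in lockstep until every episode ends."""
    obs = [env.reset() for env in envs]
    done = [False] * len(envs)
    success = [False] * len(envs)
    trajectories = [[] for _ in envs]
    while not all(done):
        for i, env in enumerate(envs):
            if done[i]:
                continue  # finished episodes are skipped
            action = policy(obs[i])
            trajectories[i].append((obs[i], action))
            obs[i], success[i], done[i] = env.step(action)
    # Binary outcome reward per trajectory: 1.0 on success, else 0.0.
    rewards = [1.0 if s else 0.0 for s in success]
    return trajectories, rewards


if __name__ == "__main__":
    trajs, rewards = rollout([ToyEnv(goal=3), ToyEnv(goal=20)], lambda obs: 1)
    print(rewards)
```

The second environment's goal lies beyond its step budget, so that episode times out with reward 0.0.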

✨Getting Started

1. Set Up the Environment

See SETUP.md for detailed instructions on setting up the conda environment.

2. Prepare the SFT Model

An SFT (Supervised Fine-Tuning) VLA model is required for RL training. Below are the available options:

  • OpenVLA-OFT SFT Models
    Download from the SimpleVLA-RL Collection. Available models include:
    • libero-10 traj1/trajall SFT
    • libero-goal/object/spatial traj1 SFT
    • RoboTwin2.0 tasks traj1000 SFT

  • OpenVLA SFT Models
    Download from here.

  • Other Models
    For other models, you may need to fine-tune them yourself.

3. Train with SimpleVLA-RL

Before running the training script, ensure the following configurations are properly set:

  • Set Your Weights and Biases (WandB) API Key
    Replace the WANDB_API_KEY field in SimpleVLA-RL/align.json with your own WandB API key.

  • Modify Key Variables
    Update the following variables in examples/run_openvla_oft_rl_libero.sh or examples/run_openvla_oft_rl_twin2.sh as needed:
    • WANDB_API_KEY: Your WandB API key.
    • EXPERIMENT_NAME: The name of your experiment. You can choose any name.
    • SFT_MODEL_PATH: Path to your SFT model.
    • CKPT_PATH: Path where your checkpoints will be saved.
    • DATASET_NAME: For the available options, refer to the comments in the same script.
    • ALIGN_PATH: Path to the SimpleVLA-RL/align.json file.
    • NUM_GPUS: Number of GPUs available per node (e.g., 8).
    • NUM_NODES: Number of nodes used for RL training (e.g., 1).
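
Before launching, it can help to sanity-check these values. The sketch below is a hypothetical pre-flight helper, not part of the repository: it takes the variables as a plain dictionary, reports any that are empty, and computes the total GPU count as NUM_GPUS × NUM_NODES. The example values and paths are placeholders.

```python
REQUIRED_KEYS = (
    "WANDB_API_KEY", "EXPERIMENT_NAME", "SFT_MODEL_PATH", "CKPT_PATH",
    "DATASET_NAME", "ALIGN_PATH", "NUM_GPUS", "NUM_NODES",
)


def check_settings(settings: dict) -> list[str]:
    """Return a list of human-readable problems; empty means all keys are set."""
    problems = [f"{key} is not set" for key in REQUIRED_KEYS
                if not settings.get(key)]
    for key in ("NUM_GPUS", "NUM_NODES"):
        value = settings.get(key)
        if value and int(value) < 1:
            problems.append(f"{key} must be at least 1")
    return problems


def total_gpus(settings: dict) -> int:
    """NUM_GPUS is per node, so the total is the product with NUM_NODES."""
    return int(settings["NUM_GPUS"]) * int(settings["NUM_NODES"])


if __name__ == "__main__":
    example = {
        "WANDB_API_KEY": "your-key",            # placeholder
        "EXPERIMENT_NAME": "my-experiment",     # placeholder
        "SFT_MODEL_PATH": "/path/to/sft",       # placeholder
        "CKPT_PATH": "/path/to/ckpt",           # placeholder
        "DATASET_NAME": "see-training-script",  # placeholder
        "ALIGN_PATH": "SimpleVLA-RL/align.json",
        "NUM_GPUS": "8",
        "NUM_NODES": "2",
    }
    print(check_settings(example), total_gpus(example))
```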

[!NOTE]

  • The script has been tested on the following configurations:
    • Single-node setup: NUM_NODES=1, NUM_GPUS=8 (1 node with 8 NVIDIA A800 GPUs, each having 80GB memory).
    • Multi-node setup: NUM_NODES=2, NUM_GPUS=8 (2 nodes with 16 NVIDIA A800 GPUs in total, each having 80GB memory).
  • The driver version used is 470.161.03 and the CUDA version is 12.4. These exact versions are not required.

  • Run RL Training
    Use one of the following commands to start RL training for OpenVLA-OFT on the LIBERO or RoboTwin2.0 benchmark:

bash examples/run_openvla_oft_rl_libero.sh
or
bash examples/run_openvla_oft_rl_twin2.sh

4. Run Evaluation

To evaluate your model, enable evaluation mode by setting trainer.val_only=True in examples/run_openvla_oft_rl_libero.sh or examples/run_openvla_oft_rl_twin2.sh. Then run the same script:

bash examples/run_openvla_oft_rl_libero.sh
or
bash examples/run_openvla_oft_rl_twin2.sh

📃 Main Results

We evaluate SimpleVLA-RL on LIBERO using OpenVLA-OFT. SimpleVLA-RL improves the performance of OpenVLA-OFT to 97.6 points on LIBERO-Long, setting a new state of the art. Remarkably, using only one trajectory per task for cold-start SFT, SimpleVLA-RL raises the performance of OpenVLA-OFT from 17.3 to 91.7, an improvement of 74.4 points (430.1% relative).
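
The absolute and relative gains quoted above follow directly from the two scores. A short check, using only the numbers in the paragraph (the function names are illustrative):

```python
def absolute_gain(before: float, after: float) -> float:
    """Improvement in points."""
    return after - before


def relative_gain_percent(before: float, after: float) -> float:
    """Improvement as a percentage of the starting score."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    sft_score, rl_score = 17.3, 91.7
    print(round(absolute_gain(sft_score, rl_score), 1))
    print(round(relative_gain_percent(sft_score, rl_score), 1))
```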

Main Results of SimpleVLA-RL.

Overview of SimpleVLA-RL.

🌻Acknowledgement

We develop this preview version of the code based on veRL, OpenVLA-OFT, RoboTwin2.0, and PRIME. We acknowledge their significant contributions! For further details and updates, please refer to the official documentation and repositories of the respective projects.

📝Roadmap

Expanding Model Support

  • [ ] Support advanced diffusion-based RL: pi0 and pi0.5 with flow-matching RL
  • [ ] Support more VLA models: especially for lightweight ones (e.g. VLA-Adapter, SmolVLA)

Expanding Environment Support

Expanding Framework

  • [ ] Additional online and offline RL algorithms
  • [ ] Modular environment and VLA interface for easy adaptation
  • [ ] Further optimize the RL framework to achieve more efficient training

📨Contact

  • Haozhan Li: zhan72426@gmail.com
  • Ning Ding: dingning@mail.tsinghua.edu.cn

🎈Citation

If you find SimpleVLA-RL helpful, please cite us:

@article{li2025simplevla,
  title={SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning},
  author={Li, Haozhan and Zuo, Yuxin and Yu, Jiale and Zhang, Yuhao and Yang, Zhaohui and Zhang, Kaiyan and Zhu, Xuekai and Zhang, Yuchen and Chen, Tianxing and Cui, Ganqu and others},
  journal={arXiv preprint arXiv:2509.09674},
  year={2025}
}

Core symbols most depended-on inside this repo

  • get (verl/utils/memory_buffer.py): called by 217
  • to (verl/protocol.py): called by 171
  • get (verl/protocol.py): called by 133
  • get_pose (modified_codes/robotwin2/envs/utils/transforms.py): called by 108
  • move (modified_codes/robotwin2/envs/_base_task.py): called by 88
  • named_parameters (verl/utils/memory_buffer.py): called by 73
  • log_gpu_memory_usage (verl/utils/debug/performance.py): called by 56
  • update (verl/trainer/ppo/core_algos.py): called by 44

Shape

Method 1,198
Function 618
Class 232
Route 23

Languages

Python 100%

Modules by API surface

  • modified_codes/robotwin2/envs/_base_task.py: 66 symbols
  • verl/utils/vla_utils/openvla_oft/modeling_prismatic.py: 58 symbols
  • verl/workers/fsdp_workers.py: 52 symbols
  • modified_codes/robotwin2/envs/robot/robot_curobo.py: 45 symbols
  • modified_codes/robotwin2/envs/robot/robot.py: 45 symbols
  • verl/workers/megatron_workers.py: 38 symbols
  • verl/single_controller/ray/base.py: 38 symbols
  • verl/models/llama/megatron/modeling_llama_megatron.py: 35 symbols
  • verl/utils/vla_utils/openvla/modeling_prismatic.py: 33 symbols
  • verl/third_party/vllm/vllm_v_0_3_1/config.py: 32 symbols
  • verl/protocol.py: 32 symbols
  • verl/workers/hybrid_engine/megatron_vllm.py: 31 symbols

For agents

$ claude mcp add SimpleVLA-RL \
  -- python -m otcore.mcp_server <graph>
