# NeMo RL: A Scalable and Efficient Post-Training Library
Previous News

- [7/25/2025] Release v0.3.0!
- [5/14/2025] Reproduce DeepscaleR with NeMo RL!
NeMo RL is an open-source post-training library under the NVIDIA NeMo Framework, designed to streamline and scale reinforcement learning methods for multimodal models (LLMs, VLMs, etc.). Built for flexibility, reproducibility, and scale, it supports both small-scale experiments and massive multi-GPU, multi-node deployments, enabling fast experimentation in research and production environments.

What you can expect:

- Flexibility with a modular design that allows easy integration and customization.
- Efficient resource management using Ray, enabling scalable and flexible deployment across different hardware configurations.
- Hackable with native PyTorch-only paths for quick research prototypes.
- High performance with Megatron Core, supporting various parallelism techniques for large models and large context lengths.
- Seamless integration with Hugging Face for ease of use, allowing users to leverage a wide range of pre-trained models and tools.
- Comprehensive documentation that is both detailed and user-friendly, with practical examples.

Please refer to our design documents for more details on the architecture and design philosophy.
NeMo RL supports multiple training backends to accommodate different model sizes and hardware configurations: native PyTorch (DTensor) and Megatron Core.
The training backend is automatically determined based on your YAML configuration settings. For detailed information on backend selection, configuration, and examples, see the Training Backends documentation.
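
To make the idea of config-driven backend selection concrete, here is a minimal Python sketch of how a loaded YAML configuration could be mapped to a backend. It is illustrative only: the function `select_training_backend` and the key names `dtensor_cfg`, `megatron_cfg`, and `enabled` are assumptions made for this example, not NeMo RL's confirmed schema or API. Consult the Training Backends documentation for the actual settings.

```python
from typing import Any, Mapping


def select_training_backend(policy_cfg: Mapping[str, Any]) -> str:
    """Pick a backend name from a policy config dict (hypothetical schema).

    Assumes each backend has its own sub-config with an `enabled` flag,
    and that exactly one of them may be enabled at a time.
    """
    # Missing sub-configs are treated as disabled.
    dtensor_on = bool(policy_cfg.get("dtensor_cfg", {}).get("enabled", False))
    megatron_on = bool(policy_cfg.get("megatron_cfg", {}).get("enabled", False))

    if dtensor_on and megatron_on:
        raise ValueError("Enable only one training backend at a time.")
    if megatron_on:
        return "megatron"
    if dtensor_on:
        return "dtensor"
    raise ValueError("No training backend is enabled in the config.")


# Example: a dict as it might look after parsing a YAML file.
example_cfg = {
    "dtensor_cfg": {"enabled": True},
    "megatron_cfg": {"enabled": False},
}
backend = select_training_backend(example_cfg)
```

Raising an error when both or neither flag is set keeps a misconfigured run from silently falling back to a backend the user did not intend.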
NeMo RL supports multiple generation/rollout backends to accommodate different model sizes and hardware configurations.
For detailed information on backend selection, configuration, and examples, see the Generation Backends documentation.
✅ Available now | 🔜 Coming in v0.6

- 🔜 Muon Optimizer: Emerging Optimizer support for SFT/RL.
- 🔜 Megatron Inference: Improved performance for Megatron Inference (avoid weight conversion).
- 🔜 SGLang Inference: SGLang rollout support for optimized inference.
- 🔜 Improved Native Performance: Improve training time for native PyTorch models.
- 🔜 Improved Large MoE Performance: Improve Megatron Core training performance and generation performance.
- 🔜 New Models: Qwen3-Next, Nemotron-Super.
- 🔜 Expand Algorithms: GDPO, LoRA support for RL (GRPO) and DPO.
- 🔜 Resiliency: Fault tolerance and auto-scaling support.
- 🔜 On-Policy Distillation: Multi-teacher and cross tokenizer distillation support.
- 🔜 Speculative Decoding: Speculative Decoding support for rollout acceleration.

Support Matrix
Use this quick start to get going with either the native PyTorch DTensor or Megatron Core training backends.
> [!NOTE]
> Both training backends are independent; you can install and use either one on its own.

For more examples and setup details, continue to the Prerequisites section.
| Native PyTorch (DTensor) | Megatron Core |
|---|---|
Clone and create the environment
Note: If you previously ran without checking out the submodules, you may need to rebuild virtual environments by setting `NRL_FORCE_REBUILD_VENVS=true`. See Tips and Tricks.
| Run GRP |