
github.com/NVIDIA-NeMo/RL @v0.6.0 sqlite

3,643 symbols · 15,124 edges · 312 files · 2,254 documented (62%)
README

# NeMo RL: A Scalable and Efficient Post-Training Library




Overview

NeMo RL is an open-source post-training library under the NVIDIA NeMo Framework, designed to streamline and scale reinforcement learning methods for multimodal models (LLMs, VLMs, etc.). Built for flexibility, reproducibility, and scale, NeMo RL supports both small-scale experiments and massive multi-GPU, multi-node deployments, enabling fast experimentation in research and production environments.

[Figure: NeMo RL architecture diagram]

What you can expect:

  • Flexibility with a modular design that allows easy integration and customization.
  • Efficient resource management using Ray, enabling scalable and flexible deployment across different hardware configurations.
  • Hackable with native PyTorch-only paths for quick research prototypes.
  • High performance with Megatron Core, supporting various parallelism techniques for large models and large context lengths.
  • Seamless integration with Hugging Face for ease of use, allowing users to leverage a wide range of pre-trained models and tools.
  • Comprehensive documentation that is both detailed and user-friendly, with practical examples.

Please refer to our design documents for more details on the architecture and design philosophy.

Training Backends

NeMo RL supports multiple training backends to accommodate different model sizes and hardware configurations:

  • DTensor - PyTorch's next-generation distributed training with improved memory efficiency (PyTorch-native TP, SP, PP, CP, and FSDP2).
  • Megatron - NVIDIA's high-performance training framework for scaling to large models with 6D parallelism.

The training backend is automatically determined based on your YAML configuration settings. For detailed information on backend selection, configuration, and examples, see the Training Backends documentation.
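As a rough illustration of config-driven backend selection, here is a minimal sketch that assumes the YAML file has already been loaded into a nested dict. The key names `policy.dtensor_cfg.enabled` and `policy.megatron_cfg.enabled` are assumptions modeled on the example configs; check the Training Backends documentation for the authoritative keys. `select_training_backend` is a hypothetical helper, not a NeMo RL API, and it treats "neither backend enabled" as an error, which the real library may handle differently.

```python
from typing import Any, Mapping


def _enabled(policy: Mapping[str, Any], section: str) -> bool:
    """True if policy[section]["enabled"] is truthy; a missing section counts as disabled."""
    sub = policy.get(section) or {}
    return bool(sub.get("enabled", False))


def select_training_backend(cfg: Mapping[str, Any]) -> str:
    """Pick a training backend from a loaded config dict (hypothetical helper).

    Assumed keys: policy.dtensor_cfg.enabled and policy.megatron_cfg.enabled.
    Exactly one of them is expected to be true.
    """
    policy = cfg.get("policy") or {}
    use_dtensor = _enabled(policy, "dtensor_cfg")
    use_megatron = _enabled(policy, "megatron_cfg")
    if use_dtensor and use_megatron:
        raise ValueError("Enable only one of policy.dtensor_cfg or policy.megatron_cfg")
    if use_megatron:
        return "megatron"
    if use_dtensor:
        return "dtensor"
    raise ValueError("No training backend enabled in the policy config")


if __name__ == "__main__":
    cfg = {"policy": {"dtensor_cfg": {"enabled": True}, "megatron_cfg": {"enabled": False}}}
    print(select_training_backend(cfg))  # dtensor
```

Rejecting the "both enabled" case up front turns a silent misconfiguration into an immediate error before any GPU workers start.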

Generation Backends

NeMo RL supports multiple generation/rollout backends to accommodate different model sizes and hardware configurations:

  • vLLM - A popular high-throughput, memory-efficient inference and serving engine.
  • Megatron - A high-performance, Megatron-native inference backend that eliminates weight conversion between training and inference.

For detailed information on backend selection, configuration, and examples, see the Generation Backends documentation.
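A small validation step can catch a misspelled generation backend before any rollout workers are launched. The sketch below assumes the backend name lives under a `policy.generation.backend` key; that key and the helper `resolve_generation_backend` are assumptions for illustration, and the two accepted names simply mirror the backends listed above.

```python
from typing import Any, Mapping

# Backend names from the list above; the config key that selects them
# (policy.generation.backend) is an assumption for this sketch.
SUPPORTED_GENERATION_BACKENDS = {
    "vllm": "high-throughput, memory-efficient inference engine",
    "megatron": "Megatron-native inference without train-to-inference weight conversion",
}


def resolve_generation_backend(cfg: Mapping[str, Any]) -> str:
    """Return the normalized generation backend name, or raise ValueError."""
    generation = (cfg.get("policy") or {}).get("generation") or {}
    name = str(generation.get("backend", "")).strip().lower()
    if name not in SUPPORTED_GENERATION_BACKENDS:
        valid = ", ".join(sorted(SUPPORTED_GENERATION_BACKENDS))
        raise ValueError(f"Unknown generation backend {name!r}; expected one of: {valid}")
    return name


if __name__ == "__main__":
    print(resolve_generation_backend({"policy": {"generation": {"backend": "vLLM"}}}))  # vllm
```

Normalizing case and whitespace keeps the check forgiving of hand-edited YAML while still failing fast on a genuine typo.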

Features

🔜 Coming in v0.6:

  • 🔜 Muon Optimizer - Emerging Optimizer support for SFT/RL.
  • 🔜 Megatron Inference - Improved performance for Megatron Inference (avoid weight conversion).
  • 🔜 SGLang Inference - SGLang rollout support for optimized inference.
  • 🔜 Improved Native Performance - Improve training time for native PyTorch models.
  • 🔜 Improved Large MoE Performance - Improve Megatron Core training performance and generation performance.
  • 🔜 New Models - Qwen3-Next, Nemotron-Super.
  • 🔜 Expand Algorithms - GDPO, LoRA support for RL (GRPO) and DPO.
  • 🔜 Resiliency - Fault tolerance and auto-scaling support.
  • 🔜 On-Policy Distillation - Multi-teacher and cross-tokenizer distillation support.
  • 🔜 Speculative Decoding - Speculative decoding support for rollout acceleration.

Available now:

  • Distributed Training - Ray-based infrastructure.
  • Environment Support and Isolation - Support for multi-environment training and dependency isolation between components.
  • Worker Isolation - Process isolation between RL Actors (no worries about global state).
  • Learning Algorithms - GRPO/GSPO/DAPO, SFT (with LoRA), DPO, and on-policy distillation.
  • Multi-Turn RL - Multi-turn generation and training for RL with tool use, games, etc.
  • Advanced Parallelism with DTensor - PyTorch FSDP2, TP, CP, and SP for efficient training (through NeMo AutoModel).
  • Larger Model Support with Longer Sequences - Performant parallelisms with Megatron Core (TP/PP/CP/SP/EP/FSDP) (through NeMo Megatron Bridge).
  • Sequence Packing - Sequence packing in both DTensor and Megatron Core for substantial training performance gains.
  • Fast Generation - vLLM backend for optimized inference.
  • Hugging Face Integration - Out-of-the-box support in the DTensor path; checkpoint conversion for the Megatron path is available through the Megatron Bridge middleware.
  • End-to-End FP8 Low-Precision Training - Support for Megatron Core FP8 training and FP8 vLLM generation.
  • Vision Language Models (VLM) - Support for SFT and GRPO on VLMs.
  • Megatron Inference - Megatron Inference for fast Day-0 support for new Megatron models (avoid weight conversion).
  • Async RL - Support for asynchronous rollouts and replay buffers for off-policy training, enabling fully asynchronous GRPO.
  • Nemo-Gym Integration - RL environment integration.
  • GB200 - Container support for GB200.
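Sequence packing is, at its core, a bin-packing problem: several variable-length sequences are concatenated into one fixed-capacity row so that fewer tokens are spent on padding. The sketch below uses the classic first-fit-decreasing heuristic as a generic illustration; it is not necessarily the algorithm NeMo RL uses, and `pack_sequences`, `padding_fraction`, and `max_tokens` are hypothetical names.

```python
from typing import List


def pack_sequences(lengths: List[int], max_tokens: int) -> List[List[int]]:
    """Group sequence indices into rows whose total length is <= max_tokens.

    First-fit decreasing: visit sequences from longest to shortest and put each
    one into the first row with enough free room, opening a new row if none fits.
    """
    bins: List[List[int]] = []  # sequence indices per packed row
    remaining: List[int] = []   # free token budget per packed row
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for i in order:
        n = lengths[i]
        if n > max_tokens:
            raise ValueError(f"sequence {i} has {n} tokens > max_tokens={max_tokens}")
        for b, free in enumerate(remaining):
            if n <= free:
                bins[b].append(i)
                remaining[b] -= n
                break
        else:  # no existing row had room
            bins.append([i])
            remaining.append(max_tokens - n)
    return bins


def padding_fraction(lengths: List[int], bins: List[List[int]], max_tokens: int) -> float:
    """Fraction of token slots in the packed rows that would be padding."""
    if not bins:
        return 0.0
    used = sum(lengths[i] for row in bins for i in row)
    return 1.0 - used / (len(bins) * max_tokens)


if __name__ == "__main__":
    lengths = [700, 300, 500, 200, 250]
    rows = pack_sequences(lengths, max_tokens=1024)
    print(rows)  # [[0, 1], [2, 4, 3]]
    print(round(padding_fraction(lengths, rows, 1024), 4))  # 0.0479
```

With these five lengths and `max_tokens=1024`, the sequences fit into two rows with under 5% padding, versus five rows and about 62% padding if each sequence were padded to 1024 on its own. A real implementation also has to record sequence boundaries inside each row so attention does not cross from one sequence into the next.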


Quick Start

Use this quick start to get going with either the native PyTorch DTensor or Megatron Core training backends.

Note: Both training backends are independent; you can install and use either one on its own.

For more examples and setup details, continue to the Prerequisites section.

Clone and create the environment
```sh
git clone git@github.com:NVIDIA-NeMo/RL.git nemo-rl --recursive
cd nemo-rl
uv venv
```
Note: If you previously ran without checking out the submodules, you may need to rebuild virtual environments by setting NRL_FORCE_REBUILD_VENVS=true. See Tips and Tricks.
Run GRP

Core symbols most depended on inside this repo

  • to (nemo_rl/data/multimodal_utils.py) - called by 164
  • time (nemo_rl/utils/timer.py) - called by 91
  • min (tests/check_metrics.py) - called by 86
  • max (tests/check_metrics.py) - called by 82
  • update (nemo_rl/data/packing/metrics.py) - called by 69
  • get_tokenizer (nemo_rl/algorithms/utils.py) - called by 63
  • state_dict (nemo_rl/utils/native_checkpoint.py) - called by 59
  • configure_generation_config (nemo_rl/models/generation/__init__.py) - called by 57
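Assuming "called by N" means the number of distinct callers with a call edge into a symbol, the ranking above is an in-degree count over the call graph. The sketch below computes that ranking with Python's built-in `sqlite3` module; the `edges(src, dst, kind)` table and the `'calls'` edge kind are a hypothetical schema, since the layout of the downloadable graph artifact is not documented here.

```python
import sqlite3


def top_callees(conn: sqlite3.Connection, limit: int = 8) -> list:
    """Return (symbol, distinct caller count) pairs, most-called first.

    Assumes a hypothetical table edges(src TEXT, dst TEXT, kind TEXT)
    in which call relationships have kind = 'calls'.
    """
    rows = conn.execute(
        """
        SELECT dst, COUNT(DISTINCT src) AS called_by
        FROM edges
        WHERE kind = 'calls'
        GROUP BY dst
        ORDER BY called_by DESC, dst ASC
        LIMIT ?
        """,
        (limit,),
    )
    return [(dst, n) for dst, n in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src TEXT, dst TEXT, kind TEXT)")
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?, ?)",
        [
            ("train", "get_tokenizer", "calls"),
            ("setup", "get_tokenizer", "calls"),
            ("save", "state_dict", "calls"),
            ("nemo_rl.algorithms", "nemo_rl.utils", "imports"),  # not a call edge
        ],
    )
    print(top_callees(conn))  # [('get_tokenizer', 2), ('state_dict', 1)]
```

Filtering on the edge kind matters: a graph like this also stores non-call edges (imports, containment), and counting those would inflate the "called by" numbers.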

Shape

  • Method: 1,629
  • Function: 1,325
  • Class: 473
  • Route: 216

Languages

Python 100%

Modules by API surface

  • tests/unit/models/automodel/test_automodel_setup.py - 137 symbols
  • tests/unit/utils/test_logger.py - 133 symbols
  • tests/unit/models/automodel/test_automodel_train.py - 100 symbols
  • tests/unit/algorithms/test_grpo.py - 99 symbols
  • tests/unit/models/megatron/test_megatron_setup.py - 82 symbols
  • tests/unit/models/megatron/test_train.py - 76 symbols
  • nemo_rl/utils/logger.py - 73 symbols
  • tests/unit/models/automodel/test_automodel_data.py - 66 symbols
  • tests/unit/models/automodel/test_automodel_checkpoint.py - 60 symbols
  • tests/unit/test_check_metrics.py - 55 symbols
  • tests/unit/distributed/test_worker_groups.py - 52 symbols
  • tests/unit/utils/test_nsys.py - 49 symbols

Dependencies from manifests, versioned

  • colored 2.2.3 · 1×
  • nemo-rl
  • ninja
  • pip
  • setuptools
  • torch 2.10.0 · 1×

For agents

```sh
$ claude mcp add RL \
  -- python -m otcore.mcp_server <graph>
```
