
github.com/NVIDIA/Isaac-GR00T @n1.7-release sqlite

1,188 symbols · 4,135 edges · 125 files · 621 documented (52%)
README

NVIDIA Isaac GR00T N1.7 Header

<a href="https://developer.nvidia.com/isaac/gr00t"><strong>Website</strong></a> |
<a href="https://huggingface.co/collections/nvidia/gr00t-n17"><strong>Model</strong></a> |
<a href="https://huggingface.co/collections/nvidia/physical-ai"><strong>Dataset</strong></a> |
<a href="https://arxiv.org/abs/2503.14734"><strong>Paper</strong></a> |
<a href="https://developer.nvidia.com/isaac"><strong>NVIDIA Isaac</strong></a> |
<a href="https://github.com/NVIDIA/Isaac-GR00T/raw/n1.7-release/FAQ.md"><strong>FAQ</strong></a>


NVIDIA Isaac GR00T

We just released GR00T N1.7 Early Access, the latest version of GR00T N1 with a new VLM backbone (Cosmos-Reason2-2B / Qwen3-VL) and improved performance.

This is an Early Access (EA) release. You are welcome to download the model, explore the codebase, and begin building on the stack, with the understanding that support and stability guarantees are limited until the GA release.

What's available:

  • Pre-trained GR00T N1.7 model weights and reference code
  • Fine-tuning and inference with custom robot data or demonstrations
  • Experimentation, prototyping, and research use cases

Available at GA:

  • Production deployment with commercial support
  • Complete benchmarks and a fully validated, stable feature set
  • Pull request contributions

We welcome feedback - please feel free to raise issues in this repository.

To use older versions: N1.6 | N1.5

NVIDIA Isaac GR00T N1.7 is an open vision-language-action (VLA) model for generalized humanoid robot skills. This cross-embodiment model takes multimodal input, including language and images, to perform manipulation tasks in diverse environments.

GR00T N1.7 is trained on a diverse mixture of robot data, including bimanual and semi-humanoid data and an expansive humanoid dataset. It is adaptable through post-training for specific embodiments, tasks, and environments.

GR00T N1.7 is fully commercially licensable under Apache 2.0. It delivers comparable performance to N1.6, with improved generalization and language-following capabilities driven by the inclusion of 20K hours of EgoScale human video data in pretraining.

The neural network architecture of GR00T N1.7 combines a vision-language foundation model with a diffusion transformer head that denoises continuous actions. Here is a schematic diagram of the architecture:

[Figure: model-architecture]
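To make "a head that denoises continuous actions" concrete, here is a toy sketch of iterative denoising over an action chunk. It is not the GR00T implementation: the sampler shown is a flow-matching-style Euler integration (chosen for brevity, and whether N1.7 uses this exact scheme is an assumption), the learned transformer is replaced by a hypothetical closed-form velocity function, and the vision-language conditioning is omitted entirely. All names (`denoise_action_chunk`, `make_toy_velocity`) are illustrative.

```python
import random


def denoise_action_chunk(velocity_fn, horizon, action_dim, num_steps=4, seed=0):
    """Integrate a velocity field from noise (t=0) to an action chunk (t=1)."""
    rng = random.Random(seed)
    # Start from Gaussian noise shaped [horizon][action_dim].
    actions = [[rng.gauss(0.0, 1.0) for _ in range(action_dim)]
               for _ in range(horizon)]
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        velocity = velocity_fn(actions, t)
        # One explicit Euler step: a <- a + dt * v(a, t).
        actions = [[a + dt * v for a, v in zip(a_row, v_row)]
                   for a_row, v_row in zip(actions, velocity)]
    return actions


def make_toy_velocity(target):
    """Hypothetical stand-in for the learned head: points at a fixed target."""
    def velocity_fn(actions, t):
        # On a straight noise-to-target path, (target - a) / (1 - t) is the
        # constant velocity that reaches the target exactly at t = 1.
        return [[(g - a) / (1.0 - t) for a, g in zip(a_row, g_row)]
                for a_row, g_row in zip(actions, target)]
    return velocity_fn


target = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.5]]
chunk = denoise_action_chunk(make_toy_velocity(target), horizon=2, action_dim=3)
```

In a real model the velocity (or noise) prediction would come from the transformer head conditioned on image, language, and state features. The loop structure is the only part this sketch is meant to convey.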

Workflow Overview

  1. Prepare data — Collect robot demonstrations (video, state, action) and convert them to the GR00T LeRobot format. Demo datasets are included for quick testing.
  2. Run inference — Try zero-shot inference with the base model on pretrain embodiments, or use a finetuned checkpoint for benchmark tasks.
  3. Fine-tune — Adapt the model to your robot using launch_finetune.py with your own data and modality config.
  4. Evaluate — Validate with open-loop evaluation, then test in simulation benchmarks or on real hardware via the Policy API.
  5. Deploy — Connect Gr00tPolicy to your robot controller, optionally accelerated with TensorRT.

What's New in GR00T N1.7

GR00T N1.7 builds on N1.6 with a new VLM backbone and code-level improvements.

Key Changes from N1.6

  • New VLM backbone: Cosmos-Reason2-2B (Qwen3-VL architecture), replacing the Eagle backbone used in N1.6. Supports flexible resolution and encodes images in their native aspect ratio without padding.
  • Simplified data processing pipeline (processing_gr00t_n1d7.py).
  • Added full pipeline export to ONNX and TensorRT with improved inference frequency.

Installation

Hardware Requirements

Inference: 1 GPU with 16 GB+ VRAM (e.g., RTX 4090, L40, H100, Jetson AGX Thor/Orin, DGX Spark).

Fine-tuning: 1 or more GPUs with 40 GB+ VRAM recommended. We recommend H100 or L40 nodes for optimal performance. Other hardware (e.g., A6000) works but may require longer training time. See the Hardware Recommendation Guide for detailed specs.
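As a quick pre-flight check, the thresholds above can be encoded in a few lines. This is a minimal sketch: the function name `supported_workloads` is hypothetical, and the 40 GB figure is the recommended value from the text rather than a hard limit.

```python
INFERENCE_MIN_VRAM_GB = 16  # "16 GB+ VRAM" for inference
FINETUNE_MIN_VRAM_GB = 40   # "40 GB+ VRAM recommended" for fine-tuning


def supported_workloads(vram_gb):
    """Return the workloads a single GPU with `vram_gb` of memory meets."""
    workloads = []
    if vram_gb >= INFERENCE_MIN_VRAM_GB:
        workloads.append("inference")
    if vram_gb >= FINETUNE_MIN_VRAM_GB:
        workloads.append("fine-tuning")
    return workloads


print(supported_workloads(24))  # a 24 GB card
print(supported_workloads(80))  # an 80 GB card
```

If PyTorch is installed, the memory of the first GPU can be read with `torch.cuda.get_device_properties(0).total_memory` (in bytes) and divided by `1024**3` before calling the function.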

CUDA / Python per platform: dGPU on CUDA 12.8 with Python 3.10; Jetson Orin on CUDA 12.6 with Python 3.10; Jetson Thor and DGX Spark on CUDA 13.0 with Python 3.12. The per-platform install scripts and Dockerfiles live under scripts/deployment/; see the Deployment & Inference Guide for the full matrix.
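The platform matrix above can also be written as a lookup table, which is handy for failing early when the wrong interpreter is active. This is a sketch only: the keys mirror the directory names under scripts/deployment/, and the helper `python_matches` is a hypothetical name, not part of the repository.

```python
import sys

# platform key -> (CUDA version, Python version), taken from the text above
PLATFORM_MATRIX = {
    "dgpu": ("12.8", "3.10"),
    "orin": ("12.6", "3.10"),
    "thor": ("13.0", "3.12"),
    "spark": ("13.0", "3.12"),
}


def python_matches(platform_key, version_info=None):
    """True if the interpreter's major.minor matches the platform's Python."""
    if version_info is None:
        version_info = sys.version_info
    _, python_version = PLATFORM_MATRIX[platform_key]
    expected = tuple(int(part) for part in python_version.split("."))
    return tuple(version_info[:2]) == expected


if not python_matches("dgpu"):
    print("Warning: this interpreter is not Python 3.10 (expected for dGPU).")
```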

Clone the Repository

Note: git-lfs is required to download parquet data files in /demo_data. Install it before cloning: sudo apt install git-lfs && git lfs install.

GR00T relies on submodules for certain dependencies. Include them when cloning:

git clone --recurse-submodules https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T

If you've already cloned without submodules, initialize them separately:

git submodule update --init --recursive

Set Up the Environment

GR00T uses uv for fast, reproducible dependency management. Install uv first:

curl -LsSf https://astral.sh/uv/install.sh | sh

dGPU (x86_64) — Default

Install FFmpeg (required by torchcodec, the default video backend):

sudo apt-get update && sudo apt-get install -y ffmpeg

Create the environment and install GR00T:

uv sync --python 3.10

GPU dependencies (flash-attn, TensorRT, etc.) are included in the default install.

Verify the installation:

uv run python -c "import gr00t; print('GR00T installed successfully')"
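For a slightly more informative check than the one-liner, the sketch below reports which packages cannot be found without actually importing them. The module list is an assumption about what a typical dGPU install provides (`flash_attn` is the import name of the flash-attn package), and `missing_modules` is a hypothetical helper.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed set of modules worth checking; adjust for your platform.
missing = missing_modules(["gr00t", "torch", "flash_attn"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All checked modules were found")
```

Save it to a file and run it with uv run python inside the project so it sees the same environment as the other commands.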

flash-attn message on every uv run: You may see Installing flash-attn... each time you run uv run. This is a known uv behavior with URL-pinned wheel sources: uv re-validates the cached wheel against the source URL on each invocation. It is not rebuilding from source; the wheel is already cached locally and the operation takes 2-3 seconds. This only affects x86_64 platforms. To suppress it, remove the flash-attn entries under [tool.uv.sources] in your local pyproject.toml after the initial install. Be aware that doing so breaks uv lock and causes flash-attn to build from source the next time the lock file is regenerated.

Alternative: pip install (without uv)

If you prefer pip/conda over uv, create a Python 3.10 virtualenv and install:

python3.10 -m venv .venv && source .venv/bin/activate
pip install -e .

Note: GPU dependencies (flash-attn, TensorRT) may require manual installation with pip. The uv workflow handles these automatically.

If fine-tuning fails with CUDA_HOME is unset: Run bash scripts/deployment/dgpu/install_deps.sh once to configure CUDA paths, or manually export CUDA_HOME=/usr/local/cuda.

CUDA 13.x Users (Thor, Spark, and other CUDA 13+ platforms): PyTorch 2.7 pins Triton to 3.3.1, which does not recognize CUDA major version 13+. This causes a RuntimeError in Triton's ptx_get_version(). Run the patch script to fix it:

uv run bash scripts/patch_triton_cuda13.sh

GB300 (sm_103) Users: Triton 3.3.1 (pinned by PyTorch 2.7) does not support the GB300 GPU architecture (sm_103). torch.compile will fail on GB300. Use PyTorch eager mode or TensorRT inference instead. Triton 3.5.1+ adds sm_103 support but is not yet compatible with the pinned PyTorch version.
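The two Triton caveats above reduce to simple version comparisons. The sketch below encodes only what the notes state. The helper names are hypothetical, and the parser assumes plain numeric versions such as 3.3.1 (an optional local suffix like +cu128 is ignored).

```python
def parse_version(text):
    """'3.3.1' -> (3, 3, 1); drops a local suffix such as '+cu128'."""
    core = text.split("+")[0]
    return tuple(int(part) for part in core.split(".")[:3])


def needs_cuda13_patch(triton_version, cuda_major):
    """Triton 3.3.1 does not recognize CUDA major version 13+."""
    return parse_version(triton_version) == (3, 3, 1) and cuda_major >= 13


def triton_supports_sm103(triton_version):
    """sm_103 (GB300) support arrives with Triton 3.5.1+."""
    return parse_version(triton_version) >= (3, 5, 1)


if needs_cuda13_patch("3.3.1", 13):
    print("Run scripts/patch_triton_cuda13.sh before using Triton")
if not triton_supports_sm103("3.3.1"):
    print("On GB300, use eager mode or TensorRT instead of torch.compile")
```

Note that a Triton version that supports sm_103 is, per the note above, not yet compatible with the pinned PyTorch version, so the second check is informational rather than an upgrade recommendation.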

aarch64 Video Backend: On aarch64 platforms (Thor, Orin, Spark), torchcodec is the required video backend. install_deps.sh prefers the prebuilt aarch64 wheel under scripts/deployment/dgpu/wheels/ (shared by Thor/Spark against FFmpeg 6; Orin uses a matching build against FFmpeg 4) and falls back to a source build only if the wheel is missing. If you encounter NotImplementedError from the video backend, ensure torchcodec was installed successfully during setup. Other backends (decord, pyav) are not supported on aarch64.
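A guard like the following can surface the aarch64 restriction before any video is decoded. It is a sketch under assumptions: the helper names are hypothetical, and treating decord and pyav as usable on non-aarch64 machines is inferred from the note above rather than confirmed by it.

```python
import platform

AARCH64_BACKENDS = ("torchcodec",)
# Assumption: the other backends named in the note are usable off aarch64.
OTHER_BACKENDS = ("torchcodec", "decord", "pyav")


def allowed_video_backends(machine=None):
    """Return the backends considered usable on the given machine type."""
    machine = (machine or platform.machine()).lower()
    if machine in ("aarch64", "arm64"):
        return AARCH64_BACKENDS
    return OTHER_BACKENDS


def check_backend(name, machine=None):
    """Raise NotImplementedError if `name` is not usable on this machine."""
    allowed = allowed_video_backends(machine)
    if name not in allowed:
        raise NotImplementedError(
            f"video backend {name!r} is not supported here; use one of {allowed}"
        )
    return name


print(allowed_video_backends())
```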

DGX Spark (tested with DGX Spark GB10)

bash scripts/deployment/spark/install_deps.sh
source .venv/bin/activate
source scripts/activate_spark.sh

See the Spark setup guide for Docker and bare metal details.

Jetson AGX Thor (tested with JetPack 7.1)

flash-attn on older systems (e.g., Ubuntu 20.04 with glibc < 2.35): The pre-built flash-attn wheel may fail with ImportError: glibc_compat.so: cannot open shared object file. To fix this, build from source:

uv pip install flash-attn==2.7.4.post1 --no-binary flash-attn --no-cache

This compiles locally (~10-30 minutes) and avoids the glibc compatibility issue.

bash scripts/deployment/thor/install_deps.sh
source .venv/bin/activate
source scripts/activate_thor.sh

See the Thor setup guide for Docker and bare metal details.

Jetson Orin (tested with JetPack 6.2)

bash scripts/deployment/orin/install_deps.sh
source .venv/bin/activate
source scripts/activate_orin.sh

See the Orin setup guide for Docker and bare metal details.

For a containerized setup that avoids system-level dependency conflicts, see our Docker Setup Guide.


Model Checkpoints & Embodiment Tags

Checkpoints

| Checkpoint | Type | Embodiment Tag | Description |
|---|---|---|---|
| nvidia/GR00T-N1.7-3B | Base | See pretrain tags | Base model (3B params): zero-shot inference on pretrain embodiments, or finetune for new tasks |
| nvidia/GR00T-N1.7-LIBERO | Finetuned | LIBERO_PANDA | Finetuned on LIBERO benchmark (Franka Panda) |
| nvidia/GR00T-N1.7-DROID | Finetuned | OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT | Finetuned on DROID dataset |
| nvidia/GR00T-N1.7-SimplerEnv-Bridge | Finetuned | SIMPLER_ENV_WIDOWX | Finetuned on SimplerEnv Bridge (WidowX) |
| nvidia/GR00T-N1.7-SimplerEnv-Fractal | Finetuned | SIMPLER_ENV_GOOGLE | Finetuned on SimplerEnv Fractal (Google Robot) |

Older versions: N1.6 checkpoints | N1.5 checkpoints

Embodiment Tags

Every inference or finetuning command requires an --embodiment-tag. The tag determines which modality config (state/action keys, normalization) the model uses. Tags are case-insensitive.

For the full list of pretrain and posttrain tags, see the Policy API Guide — Embodiment Tags.
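Because tags are case-insensitive, a caller can normalize user input before passing it on. The sketch below is illustrative only: it covers just the four finetuned tags listed in the checkpoint table, and `normalize_tag` is a hypothetical helper, not the repository's own resolution logic in gr00t/data/embodiment_tags.py.

```python
# Finetuned checkpoints and their tags, copied from the checkpoint table.
CHECKPOINT_TAGS = {
    "nvidia/GR00T-N1.7-LIBERO": "LIBERO_PANDA",
    "nvidia/GR00T-N1.7-DROID": "OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT",
    "nvidia/GR00T-N1.7-SimplerEnv-Bridge": "SIMPLER_ENV_WIDOWX",
    "nvidia/GR00T-N1.7-SimplerEnv-Fractal": "SIMPLER_ENV_GOOGLE",
}
KNOWN_TAGS = frozenset(CHECKPOINT_TAGS.values())


def normalize_tag(name):
    """Map any casing of a known tag to its canonical upper-case form."""
    key = name.strip().upper()
    if key not in KNOWN_TAGS:
        valid = ", ".join(sorted(KNOWN_TAGS))
        raise ValueError(f"unknown embodiment tag {name!r}; expected one of: {valid}")
    return key


print(normalize_tag("libero_panda"))
```

Rejecting unknown tags early gives a clearer error than letting a mismatched modality config fail later during data loading.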


Data Format

GR00T uses a flavor of the LeRobot v2 dataset format with an additional meta/modality.json file that describes state/action/video structure. A dataset looks like:

my_dataset/
  meta/
    info.json            # dataset metadata
    episodes.jsonl       # episode index and lengths
    tasks.jsonl          # language task descriptions
    modality.json        # state/action/video key mapping (GR00T-specific)
  data/chunk-000/        # parquet files (state, action per timestep)
  videos/chunk-000/      # mp4 video files per episode
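Before training on a converted dataset, it can help to confirm that the layout above is present. The following sketch checks only for the files and directories shown in the tree. The helper name `missing_entries` is hypothetical, and the temporary directory is there purely to demonstrate the check on an incomplete dataset.

```python
import tempfile
from pathlib import Path

REQUIRED_META = ("info.json", "episodes.jsonl", "tasks.jsonl", "modality.json")
REQUIRED_DIRS = ("data", "videos")


def missing_entries(root):
    """List the required files and directories absent under `root`."""
    root = Path(root)
    missing = [f"meta/{name}" for name in REQUIRED_META
               if not (root / "meta" / name).is_file()]
    missing += [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing


# Demonstration on a deliberately incomplete dataset.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my_dataset"
    (root / "meta").mkdir(parents=True)
    (root / "meta" / "info.json").write_text("{}")
    (root / "data" / "chunk-000").mkdir(parents=True)
    demo_missing = missing_entries(root)

print(demo_missing)
```

The check is structural only: it does not open the parquet or mp4 files, and it does not validate the contents of the JSON metadata.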

The modality.json maps how the concatenated state/action arrays split into named fields (e.g., x, y, z, gripper) and which video keys are available. This is what the embodiment tag uses to interpret the data.
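The splitting described above can be sketched as slicing a flat vector by per-field index ranges. The schema used here (a start and end index per named field) is an assumption made for illustration, so consult a real meta/modality.json from the demo datasets for the authoritative structure. `split_fields` is a hypothetical helper.

```python
import json

# Hypothetical modality.json fragment; field names follow the example above.
MODALITY_JSON = """
{
  "state": {
    "x": {"start": 0, "end": 1},
    "y": {"start": 1, "end": 2},
    "z": {"start": 2, "end": 3},
    "gripper": {"start": 3, "end": 4}
  }
}
"""


def split_fields(vector, field_ranges):
    """Slice a concatenated vector into named fields using [start, end)."""
    return {name: vector[span["start"]:span["end"]]
            for name, span in field_ranges.items()}


modality = json.loads(MODALITY_JSON)
state = [0.1, 0.2, 0.3, 1.0]  # one timestep of concatenated state
fields = split_fields(state, modality["state"])
print(fields)
```

The same slicing would apply to the action array, given a corresponding set of ranges.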

Included demo datasets (ready to use, no download needed):

| Dataset | Robot | Embodiment Tag | Use Case |
|---|---|---|---|
| demo_data/droid_sample | DROID (3 episodes) | OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT | Zero-shot or finetuned inference (DROID) |
| demo_data/libero_demo | LIBERO Panda (5 episodes) | LIBERO_PANDA | Inference with finetuned checkpoint |
| demo_data/simplerenv_bridge_sample | WidowX (SimplerEnv Bridge) | SIMPLER_ENV_WIDOWX | Inference with finetuned SimplerEnv Bridge checkpoint |
| demo_data/simplerenv_fractal_sample | Google Robot (SimplerEnv Fractal) | SIMPLER_ENV_GOOGLE | Inference with finetuned SimplerEnv Fractal checkpoint |
`demo_data/cube_to

Core symbols most depended-on inside this repo

check · called by 78 · scripts/validate_hf_config_alignment.py
to · called by 69 · gr00t/data/state_action/action_chunking.py
resolve · called by 40 · gr00t/data/embodiment_tags.py
load · called by 34 · gr00t/configs/base_config.py
set_runtime_tensor_shape · called by 30 · scripts/deployment/trt_torch.py
replace_once · called by 28 · tests/test_support/readme.py
find_block · called by 27 · tests/test_support/readme.py
close · called by 27 · gr00t/eval/sim/LIBERO/libero_env.py

Shape

Method 616
Function 386
Class 185
Route 1

Languages

Python 100%

Modules by API surface

gr00t/data/state_action/pose.py · 44 symbols
scripts/deployment/export_onnx_n1d7.py · 41 symbols
gr00t/model/gr00t_n1d7/image_augmentations.py · 34 symbols
examples/DROID/server_client.py · 32 symbols
tests/gr00t/data/state_action/test_state_action_processor.py · 30 symbols
gr00t/data/state_action/action_chunking.py · 28 symbols
gr00t/policy/server_client.py · 26 symbols
tests/gr00t/policy/test_policy_service.py · 23 symbols
tests/gr00t/data/test_embodiment_tags.py · 23 symbols
tests/getting_started/test_policy_md.py · 23 symbols
tests/test_support/runtime.py · 22 symbols
tests/gr00t/data/test_stats_pipeline.py · 22 symbols

Dependencies from manifests, versioned

albumentations 1.4.18 · 1×
av 16.1.0 · 1×
click 8.1.8 · 1×
cryptography 44.0.0 · 1×
datasets 3.6.0 · 1×
diffusers 0.36.0.dev0 · 1×
dm-tree 0.1.8 · 1×
draccus
einops 0.8.1 · 1×
flash-attn 2.8.3 · 1×
gitpython 3.1.46 · 1×
gymnasium 1.2.2 · 1×

For agents

$ claude mcp add Isaac-GR00T \
  -- python -m otcore.mcp_server <graph>
