
<a href="https://developer.nvidia.com/isaac/gr00t"><strong>Website</strong></a> |
<a href="https://huggingface.co/collections/nvidia/gr00t-n17"><strong>Model</strong></a> |
<a href="https://huggingface.co/collections/nvidia/physical-ai"><strong>Dataset</strong></a> |
<a href="https://arxiv.org/abs/2503.14734"><strong>Paper</strong></a> |
<a href="https://developer.nvidia.com/isaac"><strong>NVIDIA Isaac</strong></a> |
<a href="https://github.com/NVIDIA/Isaac-GR00T/raw/n1.7-release/FAQ.md"><strong>FAQ</strong></a>
We just released GR00T N1.7 Early Access, the latest version of GR00T N1 with a new VLM backbone (Cosmos-Reason2-2B / Qwen3-VL) and improved performance.
This is an Early Access (EA) release. You are welcome to download the model, explore the codebase, and begin building on the stack, with the understanding that support and stability guarantees are limited until the GA release.
What's available:
- Pre-trained GR00T N1.7 model weights and reference code
- Fine-tuning and inference with custom robot data or demonstrations
- Experimentation, prototyping, and research use cases

Available at GA:
- Production deployment with commercial support
- Complete benchmarks and a fully validated, stable feature set
- Pull request contributions
We welcome feedback - please feel free to raise issues in this repository.
NVIDIA Isaac GR00T N1.7 is an open vision-language-action (VLA) model for generalized humanoid robot skills. This cross-embodiment model takes multimodal input, including language and images, to perform manipulation tasks in diverse environments.
GR00T N1.7 is trained on a diverse mixture of robot data including bimanual, semi-humanoid and an expansive humanoid dataset. It is adaptable through post-training for specific embodiments, tasks and environments.
GR00T N1.7 is fully commercially licensable under Apache 2.0. It delivers comparable performance to N1.6, with improved generalization and language-following capabilities driven by the inclusion of 20K hours of EgoScale human video data in pretraining.
The neural network architecture of GR00T N1.7 combines a vision-language foundation model with a diffusion transformer head that denoises continuous actions. Here is a schematic diagram of the architecture:

`launch_finetune.py` with your own data and modality config.
`Gr00tPolicy` to your robot controller, optionally accelerated with TensorRT.
GR00T N1.7 builds on N1.6 with a new VLM backbone and code-level improvements.
`processing_gr00t_n1d7.py`).
Inference: 1 GPU with 16 GB+ VRAM (e.g., RTX 4090, L40, H100, Jetson AGX Thor/Orin, DGX Spark).
Fine-tuning: 1 or more GPUs with 40 GB+ VRAM recommended. We recommend H100 or L40 nodes for optimal performance. Other hardware (e.g., A6000) works but may require longer training time. See the Hardware Recommendation Guide for detailed specs.
CUDA / Python per platform: dGPU on CUDA 12.8 with Python 3.10; Jetson Orin on CUDA 12.6 with Python 3.10; Jetson Thor and DGX Spark on CUDA 13.0 with Python 3.12. The per-platform install scripts and Dockerfiles live under scripts/deployment/; see the Deployment & Inference Guide for the full matrix.
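The version matrix above can be written down as a small lookup table, which is handy for a pre-install sanity check. This is a minimal sketch: the platform labels (`dgpu`, `orin`, `thor`, `spark`) and both helper functions are illustrative names chosen here, not identifiers from the GR00T codebase.

```python
import sys

# Platform label -> (CUDA version, Python version), copied from the matrix above.
# The labels are illustrative; GR00T does not necessarily use these strings.
PLATFORM_MATRIX = {
    "dgpu": ("12.8", "3.10"),
    "orin": ("12.6", "3.10"),
    "thor": ("13.0", "3.12"),
    "spark": ("13.0", "3.12"),
}


def expected_versions(platform: str) -> tuple[str, str]:
    """Return (cuda, python) for a platform label, ignoring case."""
    key = platform.strip().lower()
    if key not in PLATFORM_MATRIX:
        raise KeyError(
            f"unknown platform {platform!r}; expected one of {sorted(PLATFORM_MATRIX)}"
        )
    return PLATFORM_MATRIX[key]


def python_matches(platform: str, version_info=sys.version_info) -> bool:
    """Check whether an interpreter's major.minor matches the expected Python."""
    _, expected_py = expected_versions(platform)
    return f"{version_info[0]}.{version_info[1]}" == expected_py
```

Calling `python_matches("thor")` inside the target environment reports whether the running interpreter is the Python version listed for that platform.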
GR00T relies on submodules for certain dependencies. Include them when cloning:
Note: git-lfs is required to download parquet data files in /demo_data. Install it before cloning: sudo apt install git-lfs && git lfs install.
git clone --recurse-submodules https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
If you've already cloned without submodules, initialize them separately:
git submodule update --init --recursive
GR00T uses uv for fast, reproducible dependency management. Install uv first:
curl -LsSf https://astral.sh/uv/install.sh | sh
Install FFmpeg (required by torchcodec, the default video backend):
sudo apt-get update && sudo apt-get install -y ffmpeg
Create the environment and install GR00T:
uv sync --python 3.10
GPU dependencies (flash-attn, TensorRT, etc.) are included in the default install.
Verify the installation:
uv run python -c "import gr00t; print('GR00T installed successfully')"
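If the verification command fails, it may help to first confirm that the command-line prerequisites named in this section (`git-lfs`, `uv`, `ffmpeg`) are on `PATH`. The following is a minimal sketch using only the Python standard library; `REQUIRED_TOOLS` and `missing_tools` are hypothetical names, not part of GR00T.

```python
import shutil

# Command-line tools this section asks for: git-lfs (demo data),
# uv (dependency management), and ffmpeg (used by torchcodec).
REQUIRED_TOOLS = ("git-lfs", "uv", "ffmpeg")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is sufficient.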
`flash-attn` message on every `uv run`: You may see `Installing flash-attn...` each time you run `uv run`. This is known `uv` behavior with URL-pinned wheel sources: `uv` re-validates the cached wheel against the source URL on each invocation. It is not rebuilding from source; the wheel is already cached locally and the operation takes 2-3 seconds. This only affects x86_64 platforms. To suppress it, remove the `flash-attn` entries under `[tool.uv.sources]` in your local `pyproject.toml` after the initial install. Be aware that doing so breaks `uv lock` and causes flash-attn to build from source the next time the lock file is regenerated.
Alternative: pip install (without uv)
If you prefer pip/conda over uv, create a Python 3.10 virtualenv and install:
python3.10 -m venv .venv && source .venv/bin/activate
pip install -e .
Note: GPU dependencies (flash-attn, TensorRT) may require manual installation with pip. The uv workflow handles these automatically.
If fine-tuning fails with `CUDA_HOME is unset`: Run `bash scripts/deployment/dgpu/install_deps.sh` once to configure CUDA paths, or manually `export CUDA_HOME=/usr/local/cuda`.
CUDA 13.x Users (Thor, Spark, and other CUDA 13+ platforms): PyTorch 2.7 pins Triton to 3.3.1, which does not recognize CUDA major version 13+. This causes a `RuntimeError` in Triton's `ptx_get_version()`. Run the patch script to fix: `uv run bash scripts/patch_triton_cuda13.sh`
GB300 (sm_103) Users: Triton 3.3.1 (pinned by PyTorch 2.7) does not support the GB300 GPU architecture (sm_103). `torch.compile` will fail on GB300. Use PyTorch eager mode or TensorRT inference instead. Triton 3.5.1+ adds sm_103 support but is not yet compatible with the pinned PyTorch version.
aarch64 Video Backend: On aarch64 platforms (Thor, Orin, Spark), `torchcodec` is the required video backend. `install_deps.sh` prefers the prebuilt aarch64 wheel under `scripts/deployment/dgpu/wheels/` (shared by Thor/Spark against FFmpeg 6; Orin uses a matching build against FFmpeg 4) and falls back to a source build only if the wheel is missing. If you encounter `NotImplementedError` from the video backend, ensure `torchcodec` was installed successfully during setup. Other backends (decord, pyav) are not supported on aarch64.
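The `CUDA_HOME` and CUDA 13.x notes above can be turned into a small pre-flight check. This is only a sketch under stated assumptions: both function names are hypothetical, and `needs_triton_patch` encodes just the one combination described above (Triton 3.3.1 with a CUDA major version of 13 or higher), not a general compatibility rule.

```python
import os


def resolve_cuda_home(env=None, default="/usr/local/cuda"):
    """Return CUDA_HOME from the environment, or the fallback path if unset or empty."""
    env = os.environ if env is None else env
    return env.get("CUDA_HOME") or default


def needs_triton_patch(triton_version: str, cuda_version: str) -> bool:
    """True for the case described above: Triton 3.3.1 on CUDA major version 13+."""
    cuda_major = int(cuda_version.split(".")[0])
    return triton_version == "3.3.1" and cuda_major >= 13


if __name__ == "__main__":
    print("CUDA_HOME would resolve to:", resolve_cuda_home())
```

On a Thor or Spark system, `needs_triton_patch("3.3.1", "13.0")` returning `True` indicates that the patch script mentioned above applies.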
DGX Spark (tested with DGX Spark GB10)
bash scripts/deployment/spark/install_deps.sh
source .venv/bin/activate
source scripts/activate_spark.sh
See the Spark setup guide for Docker and bare metal details.
Jetson AGX Thor (tested with JetPack 7.1)
flash-attn on older systems (e.g., Ubuntu 20.04 with glibc < 2.35): The pre-built `flash-attn` wheel may fail with `ImportError: glibc_compat.so: cannot open shared object file`. To fix this, build from source: `uv pip install flash-attn==2.7.4.post1 --no-binary flash-attn --no-cache`. This compiles locally (~10-30 minutes) and avoids the glibc compatibility issue.
bash scripts/deployment/thor/install_deps.sh
source .venv/bin/activate
source scripts/activate_thor.sh
See the Thor setup guide for Docker and bare metal details.
Jetson Orin (tested with JetPack 6.2)
bash scripts/deployment/orin/install_deps.sh
source .venv/bin/activate
source scripts/activate_orin.sh
See the Orin setup guide for Docker and bare metal details.
For a containerized setup that avoids system-level dependency conflicts, see our Docker Setup Guide.
| Checkpoint | Type | Embodiment Tag | Description |
|---|---|---|---|
| `nvidia/GR00T-N1.7-3B` | Base | See pretrain tags | Base model (3B params): zero-shot inference on pretrain embodiments, or finetune for new tasks |
| `nvidia/GR00T-N1.7-LIBERO` | Finetuned | `LIBERO_PANDA` | Finetuned on LIBERO benchmark (Franka Panda) |
| `nvidia/GR00T-N1.7-DROID` | Finetuned | `OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT` | Finetuned on DROID dataset |
| `nvidia/GR00T-N1.7-SimplerEnv-Bridge` | Finetuned | `SIMPLER_ENV_WIDOWX` | Finetuned on SimplerEnv Bridge (WidowX) |
| `nvidia/GR00T-N1.7-SimplerEnv-Fractal` | Finetuned | `SIMPLER_ENV_GOOGLE` | Finetuned on SimplerEnv Fractal (Google Robot) |
Older versions: N1.6 checkpoints | N1.5 checkpoints
Every inference or finetuning command requires an --embodiment-tag. The tag determines which modality config (state/action keys, normalization) the model uses. Tags are case-insensitive.
For the full list of pretrain and posttrain tags, see the Policy API Guide — Embodiment Tags.
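To illustrate how a case-insensitive tag lookup might behave, here is a minimal sketch that pairs the finetuned checkpoints from the table above with their embodiment tags. `CHECKPOINT_TAGS` and `normalize_tag` are hypothetical helpers written for this example; GR00T's own tag resolution is not shown in this README and may differ. Treating upper case as the canonical form is an assumption based on how the tags are printed in the table.

```python
# Finetuned checkpoint -> embodiment tag, copied from the checkpoint table above.
# The base checkpoint is omitted because its pretrain tags are listed elsewhere.
CHECKPOINT_TAGS = {
    "nvidia/GR00T-N1.7-LIBERO": "LIBERO_PANDA",
    "nvidia/GR00T-N1.7-DROID": "OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT",
    "nvidia/GR00T-N1.7-SimplerEnv-Bridge": "SIMPLER_ENV_WIDOWX",
    "nvidia/GR00T-N1.7-SimplerEnv-Fractal": "SIMPLER_ENV_GOOGLE",
}

KNOWN_TAGS = frozenset(CHECKPOINT_TAGS.values())


def normalize_tag(tag: str) -> str:
    """Map a tag written in any case to its upper-case form, or raise if unknown."""
    canonical = tag.strip().upper()
    if canonical not in KNOWN_TAGS:
        raise ValueError(f"unknown embodiment tag {tag!r}; known tags: {sorted(KNOWN_TAGS)}")
    return canonical


def tag_for_checkpoint(checkpoint: str) -> str:
    """Look up the embodiment tag listed for a finetuned checkpoint."""
    return CHECKPOINT_TAGS[checkpoint]
```

With this sketch, `normalize_tag("libero_panda")` and `tag_for_checkpoint("nvidia/GR00T-N1.7-LIBERO")` resolve to the same tag, which is the pairing a matching checkpoint and `--embodiment-tag` should have.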
GR00T uses a flavor of the LeRobot v2 dataset format with an additional meta/modality.json file that describes state/action/video structure. A dataset looks like:
    my_dataset/
      meta/
        info.json          # dataset metadata
        episodes.jsonl     # episode index and lengths
        tasks.jsonl        # language task descriptions
        modality.json      # state/action/video key mapping (GR00T-specific)
      data/chunk-000/      # parquet files (state, action per timestep)
      videos/chunk-000/    # mp4 video files per episode
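A quick way to catch a malformed dataset before training is to check that the entries in the layout above exist. The sketch below uses only `pathlib`; `REQUIRED_PATHS` and `missing_entries` are illustrative names, and the check assumes a dataset always has a first chunk named `chunk-000`, as in the layout shown.

```python
from pathlib import Path

# Relative paths taken from the dataset layout above.
REQUIRED_PATHS = (
    "meta/info.json",
    "meta/episodes.jsonl",
    "meta/tasks.jsonl",
    "meta/modality.json",
    "data/chunk-000",
    "videos/chunk-000",
)


def missing_entries(dataset_root):
    """Return the required relative paths that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [rel for rel in REQUIRED_PATHS if not (root / rel).exists()]
```

For example, `missing_entries("demo_data/libero_demo")` should return an empty list if that demo dataset follows the layout and was fully downloaded.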
The modality.json maps how the concatenated state/action arrays split into named fields (e.g., x, y, z, gripper) and which video keys are available. This is what the embodiment tag uses to interpret the data.
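The splitting described above can be sketched as follows. The JSON fragment is a hypothetical example written for illustration: it assumes each named field is described by a half-open `[start, end)` index range into the concatenated state vector. Check the `meta/modality.json` of a real dataset for the exact schema before relying on this.

```python
import json

# Hypothetical modality.json fragment: each named field covers the half-open
# index range [start, end) of the concatenated state vector.
MODALITY_JSON = """
{
  "state": {
    "x": {"start": 0, "end": 1},
    "y": {"start": 1, "end": 2},
    "z": {"start": 2, "end": 3},
    "gripper": {"start": 3, "end": 4}
  }
}
"""


def split_fields(vector, field_ranges):
    """Slice a flat vector into named fields using [start, end) ranges."""
    return {
        name: vector[bounds["start"]:bounds["end"]]
        for name, bounds in field_ranges.items()
    }


modality = json.loads(MODALITY_JSON)
state = [0.10, 0.25, 0.40, 1.0]  # one timestep of concatenated state
fields = split_fields(state, modality["state"])
print(fields)
```

The same helper would apply to the action array, given a corresponding `"action"` section with its own ranges.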
Included demo datasets (ready to use, no download needed):
| Dataset | Robot | Embodiment Tag | Use Case |
|---|---|---|---|
| `demo_data/droid_sample` | DROID (3 episodes) | `OXE_DROID_RELATIVE_EEF_RELATIVE_JOINT` | Zero-shot or finetuned inference (DROID) |
| `demo_data/libero_demo` | LIBERO Panda (5 episodes) | `LIBERO_PANDA` | Inference with finetuned checkpoint |
| `demo_data/simplerenv_bridge_sample` | WidowX (SimplerEnv Bridge) | `SIMPLER_ENV_WIDOWX` | Inference with finetuned SimplerEnv Bridge checkpoint |
| `demo_data/simplerenv_fractal_sample` | Google Robot (SimplerEnv Fractal) | `SIMPLER_ENV_GOOGLE` | Inference with finetuned SimplerEnv Fractal checkpoint |
| `demo_data/cube_to |