github.com/NVIDIA/gpu-operator @v26.3.3 sqlite

1,168 symbols · 3,670 edges · 107 files · 653 documented · 56%
README

NVIDIA GPU Operator

Kubernetes provides access to special hardware resources such as NVIDIA GPUs, NICs, InfiniBand adapters and other devices through the device plugin framework. However, configuring and managing nodes with these hardware resources requires setting up multiple software components such as drivers, container runtimes and other libraries, which is difficult and error-prone. The NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPUs. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Runtime, automatic node labelling, DCGM-based monitoring and others.

Audience and Use-Cases

The GPU Operator allows administrators of Kubernetes clusters to manage GPU nodes just like CPU nodes in the cluster. Instead of provisioning a special OS image for GPU nodes, administrators can use a standard OS image for both CPU and GPU nodes and let the GPU Operator provision the required software components for GPUs.

The GPU Operator is especially useful when the Kubernetes cluster needs to scale quickly, for example when provisioning additional GPU nodes in the cloud or on-prem and managing the lifecycle of the underlying software components. Because the GPU Operator runs everything as containers, including the NVIDIA drivers, administrators can swap components simply by starting or stopping containers.

Quick Start

This section provides a quick guide for deploying the GPU Operator with the data center driver.

Make sure your Kubernetes cluster meets the prerequisites and is listed on the platform support page.

Step 1: Add the NVIDIA Helm repository

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
    && helm repo update

Step 2: Deploy GPU Operator

helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator

After installation, the GPU Operator and its operands should be up and running.
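
One way to check that claim is to look at the pod statuses in the `gpu-operator` namespace used in Step 2. The following is a minimal sketch, not part of the official instructions: `count_not_ready` is a hypothetical helper name, and it assumes the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...). It only inspects the STATUS column, so a pod that is Running but not yet Ready is not flagged.

```shell
# Hypothetical helper: count pods whose STATUS is neither Running nor Completed.
# Reads `kubectl get pods --no-headers` output on stdin; STATUS is column 3.
count_not_ready() {
    awk 'NF >= 3 && $3 != "Running" && $3 != "Completed" { n++ } END { print n + 0 }'
}
```

To use it against a cluster, pipe the pod listing into the helper, for example `kubectl get pods -n gpu-operator --no-headers | count_not_ready`. A result of 0 suggests every pod has either started or finished; anything else is worth a closer look with `kubectl describe pod`.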

Note: To deploy the GPU Operator on OpenShift, follow the instructions in the official documentation.

Product Documentation

For information on platform support and getting started, visit the official documentation repository.

Roadmap

  • Support the latest NVIDIA Data Center GPUs, systems, and drivers.
  • Support RHEL 10.
  • Support KubeVirt with Ubuntu 24.04.
  • Promote the NVIDIADriver CRD to General Availability (GA).
  • Integrate NVIDIA’s DRA Driver for GPUs as a managed component of the GPU Operator.

Webinar

How to easily use GPUs on Kubernetes

Contributions

Read the document on contributions. You can contribute by opening a pull request.

Support and Getting Help

Please open an issue on the GitHub project for any questions. Your feedback is appreciated.

Extension points: exported contracts (how you extend this code)

Updater (Interface)
Updater interface [3 implementers]
internal/conditions/conditions.go
Filter (Interface)
A Filter applies a filter on a list of Nodes [3 implementers]
internal/nodeinfo/filter.go
ConfigWithName (Interface)
+kubebuilder:object:generate=false [3 implementers]
api/nvidia/v1/clusterpolicy_types.go
Validator (Interface)
Validator provides interface to validate NVIDIADriver fields [2 implementers]
internal/validator/validator.go
NvidiaV1Interface (Interface)
(no doc) [4 implementers]
api/versioned/typed/nvidia/v1/nvidia_client.go
InfoCatalog (Interface)
InfoCatalog is an information catalog to be used to retrieve infoSources. used for State implementation that require add [1 …
internal/state/info_source.go
Renderer (Interface)
Renderer renders k8s objects from a manifest source dir and TemplatingData used by the templating engine [1 implementers]
internal/render/render.go
Interface (Interface)
Interface to the clusterinfo package
controllers/clusterinfo/clusterinfo.go

Core symbols: most depended-on inside this repo

Error
called by 136
tests/e2e/framework/expect.go
setContainerEnv
called by 111
controllers/object_controls.go
Run
called by 94
cmd/nvidia-validator/metrics.go
String
called by 65
api/nvidia/v1/clusterpolicy_types.go
Get
called by 60
internal/state/info_source.go
IsEnabled
called by 48
api/nvidia/v1/clusterpolicy_types.go
List
called by 43
api/versioned/typed/nvidia/v1/clusterpolicy.go
Delete
called by 32
api/versioned/typed/nvidia/v1/clusterpolicy.go

Shape

Method 501
Function 445
Struct 179
Interface 21
TypeAlias 18
FuncType 4

Languages

Go 100%

Modules by API surface

api/nvidia/v1/zz_generated.deepcopy.go · 122 symbols
controllers/object_controls.go · 116 symbols
api/nvidia/v1/clusterpolicy_types.go · 114 symbols
cmd/nvidia-validator/main.go · 79 symbols
controllers/transforms_test.go · 69 symbols
controllers/object_controls_test.go · 42 symbols
controllers/state_manager.go · 35 symbols
api/nvidia/v1alpha1/nvidiadriver_types.go · 35 symbols
api/nvidia/v1alpha1/zz_generated.deepcopy.go · 34 symbols
internal/state/driver.go · 30 symbols
internal/state/driver_test.go · 24 symbols
internal/nodeinfo/filter.go · 20 symbols

Dependencies: from manifests, versioned

dario.cat/mergo v1.0.1 · 1×
github.com/Azure/go-ansiterm v0.0.0-2025010203350 · 1×
github.com/MakeNowJust/heredoc v1.0.0 · 1×
github.com/Masterminds/goutils v1.1.1 · 1×
github.com/Masterminds/semver/v3 v3.4.0 · 1×
github.com/Mellanox/maintenance-operator/api v0.3.0 · 1×
github.com/NVIDIA/go-nvlib v0.10.0 · 1×
github.com/NVIDIA/k8s-kata-manager v0.2.3 · 1×
github.com/NVIDIA/k8s-operator-libs v0.0.0-2026021518354 · 1×
github.com/beorn7/perks v1.0.1 · 1×

For agents

$ claude mcp add gpu-operator \
  -- python -m otcore.mcp_server <graph>