Container Device Interface (CDI) and Node Resource Interface (NRI) Plugin Support#
This page gives an overview of CDI and NRI Plugin support in the GPU Operator.
About Container Device Interface (CDI)#
The Container Device Interface (CDI) is an open specification for container runtimes that abstracts what it means to access a device, such as an NVIDIA GPU, and standardizes that access across container runtimes. Popular container runtimes can read and process the specification to ensure that a device is available in a container. CDI simplifies adding support for devices such as NVIDIA GPUs because the same specification applies to all container runtimes that support CDI.
Starting with GPU Operator v25.10.0, CDI is used by default for enabling GPU support in containers running on Kubernetes.
Specifically, CDI support in container runtimes, such as containerd and CRI-O, is used to inject GPUs into workload
containers. This differs from prior GPU Operator releases, where CDI was used via a CDI-enabled nvidia runtime class.
Use of CDI is transparent to cluster administrators and application developers. The main benefit of CDI is that it reduces the development and support effort otherwise required for runtime-specific plugins.
Enabling CDI#
CDI is enabled by default during installation in GPU Operator v25.10.0 and later. Follow the instructions for installing the Operator with Helm on the Installing the NVIDIA GPU Operator page.
CDI is also enabled by default during a Helm upgrade to GPU Operator v25.10.0 and later.
Enabling CDI After Installation#
CDI is enabled by default in GPU Operator v25.10.0 and later. Use the following procedure to enable CDI if you disabled CDI during installation.
Procedure
Enable CDI by modifying the cluster policy:
$ kubectl patch clusterpolicies.nvidia.com/cluster-policy --type='json' \
    -p='[{"op": "replace", "path": "/spec/cdi/enabled", "value":true}]'
Example Output
clusterpolicy.nvidia.com/cluster-policy patched
(Optional) Confirm that the container toolkit and device plugin pods restart:
$ kubectl get pods -n gpu-operator
Example Output
NAME                                                          READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-qnw2q                                   1/1     Running     0          47h
gpu-operator-6d59774ff-hznmr                                  1/1     Running     0          2d
gpu-operator-node-feature-discovery-master-6d6649d597-7l8bj   1/1     Running     0          2d
gpu-operator-node-feature-discovery-worker-v86vj              1/1     Running     0          2d
nvidia-container-toolkit-daemonset-2768s                      1/1     Running     0          2m11s
nvidia-cuda-validator-ls4vc                                   0/1     Completed   0          47h
nvidia-dcgm-exporter-fxp9h                                    1/1     Running     0          47h
nvidia-device-plugin-daemonset-dvp4v                          1/1     Running     0          2m26s
nvidia-device-plugin-validator-kvxbs                          0/1     Completed   0          47h
nvidia-driver-daemonset-m86r7                                 1/1     Running     0          2d
nvidia-operator-validator-xg98r                               1/1     Running     0          47h
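As a complement to watching the pods restart, the value written by the patch can be read back from the cluster policy. The following is a minimal sketch, assuming the field is exposed at `.spec.cdi.enabled` (mirroring the `/spec/cdi/enabled` path used in the patch). The `cdi_enabled` function name is hypothetical.

```shell
# Hypothetical helper: print the current value of spec.cdi.enabled.
# Assumes the ClusterPolicy is named cluster-policy, as in the patch command.
cdi_enabled() {
  kubectl get clusterpolicies.nvidia.com/cluster-policy \
    -o jsonpath='{.spec.cdi.enabled}'
}
```

After the patch is applied, calling `cdi_enabled` should print `true`.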
Disabling CDI#
While CDI is the default and recommended mechanism for injecting GPU support into containers, you can disable CDI and use the legacy NVIDIA Container Toolkit stack instead with the following procedure:
If your nodes use the CRI-O container runtime, then temporarily disable the GPU Operator validator:
$ kubectl label nodes \
    nvidia.com/gpu.deploy.operator-validator=false \
    -l nvidia.com/gpu.present=true \
    --overwrite
Tip
You can run kubectl get nodes -o wide and view the CONTAINER-RUNTIME column to determine if your nodes use CRI-O.
Disable CDI by modifying the cluster policy:
$ kubectl patch clusterpolicies.nvidia.com/cluster-policy --type='json' \
    -p='[{"op": "replace", "path": "/spec/cdi/enabled", "value":false}]'
Example Output
clusterpolicy.nvidia.com/cluster-policy patched
If you temporarily disabled the GPU Operator validator, re-enable the validator:
$ kubectl label nodes \
    nvidia.com/gpu.deploy.operator-validator=true \
    -l nvidia.com/gpu.present=true \
    --overwrite
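The CRI-O check from the tip above can also be scripted. The following sketch lists the nodes whose reported runtime starts with `cri-o://`, reading the `.status.nodeInfo.containerRuntimeVersion` field that backs the CONTAINER-RUNTIME column. The function names `node_runtimes` and `crio_nodes` are hypothetical.

```shell
# Print "<node><TAB><runtime>" for every node, for example
# "worker-1<TAB>cri-o://1.34.0".
node_runtimes() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'
}

# Hypothetical filter: read the lines above on stdin and keep only the
# names of nodes whose runtime string starts with "cri-o://".
crio_nodes() {
  awk -F '\t' '$2 ~ /^cri-o:\/\// { print $1 }'
}
```

Running `node_runtimes | crio_nodes` prints one node name per line. Empty output suggests that no node uses CRI-O, in which case the validator labeling steps do not apply.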
About the Node Resource Interface (NRI) Plugin#
Node Resource Interface (NRI) is a standardized interface for plugging extensions, called NRI Plugins, into OCI-compatible container runtimes such as CRI-O and containerd. NRI Plugins act as hooks that intercept pod and container lifecycle events and perform functions such as injecting devices (CDI devices, Linux device nodes, device mounts) into a container, applying topology-aware placement strategies, and more. For more details on NRI, refer to the NRI overview in the containerd repository.
When enabled in the GPU Operator, the NRI Plugin, managed by the NVIDIA Container Toolkit, provides an alternative to the nvidia runtime class to provision GPU workload pods.
It allows the GPU Operator to extend the container runtime behaviour without modifying the container runtime.
This feature also simplifies deployments on platforms like k3s, k0s, or RKE, because you no longer need to set values such as CONTAINERD_CONFIG, CONTAINERD_SOCKET, or RUNTIME_CONFIG_SOURCE for the GPU Operator.
Enabling the NRI Plugin#
The NRI Plugin requires the following:
CDI to be enabled in the GPU Operator.
CRI-O v1.34.0 or later, or containerd v1.7.30, v2.1.x, or v2.2.x. If you are not using the latest containerd version, check that both CDI and NRI are enabled in the containerd configuration file before deploying the GPU Operator.
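The runtime version requirement above can be encoded as a small pre-flight check. The following is a sketch only: it takes a version string as an argument rather than querying a node, it matches exactly the versions listed above, and both function names are hypothetical.

```shell
# Hypothetical check: succeed if a containerd version string matches the
# versions listed above (v1.7.30, v2.1.x, or v2.2.x).
containerd_supports_nri_plugin() {
  v="${1#v}"                      # accept "v2.1.4" as well as "2.1.4"
  case "$v" in
    1.7.30|2.1.*|2.2.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical check: succeed if a CRI-O version string is v1.34.0 or later.
crio_supports_nri_plugin() {
  v="${1#v}"
  major="${v%%.*}"                # text before the first dot
  rest="${v#*.}"
  minor="${rest%%.*}"             # text between the first and second dot
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 34 ]; }
}
```

The version string can be taken from the CONTAINER-RUNTIME column of `kubectl get nodes -o wide` after removing the `containerd://` or `cri-o://` prefix.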
To enable the NRI Plugin during installation, follow the instructions for installing the Operator with Helm on the Installing the NVIDIA GPU Operator page and include the --set cdi.nriPluginEnabled=true argument in your Helm command.
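As an illustration, a Helm invocation with this argument might look like the following sketch. It assumes that the NVIDIA Helm repository is already added under the alias `nvidia` and that the Operator is installed into the `gpu-operator` namespace. The wrapper function name is hypothetical, and the installation page remains the reference for the full set of options.

```shell
# Sketch: install the GPU Operator with the NRI Plugin enabled.
# Assumptions: the chart is reachable as nvidia/gpu-operator and the target
# namespace is gpu-operator. Extra arguments are passed through to Helm.
install_gpu_operator_with_nri() {
  helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator \
    --set cdi.nriPluginEnabled=true \
    "$@"
}
```

Any further options from the installation page can be appended when calling the function.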
Enabling the NRI Plugin After Installation#
Enable the NRI Plugin by modifying the cluster policy:
$ kubectl patch clusterpolicies.nvidia.com/cluster-policy --type='json' \
    -p='[{"op": "replace", "path": "/spec/cdi/nriPluginEnabled", "value":true}]'
Example Output
clusterpolicy.nvidia.com/cluster-policy patched
(Optional) Confirm that the container toolkit and device plugin pods restart:
$ kubectl get pods -n gpu-operator
Example Output
NAME                                                          READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-qnw2q                                   1/1     Running     0          47h
gpu-operator-6d59774ff-hznmr                                  1/1     Running     0          2d
gpu-operator-node-feature-discovery-master-6d6649d597-7l8bj   1/1     Running     0          2d
gpu-operator-node-feature-discovery-worker-v86vj              1/1     Running     0          2d
nvidia-container-toolkit-daemonset-2768s                      1/1     Running     0          2m11s
nvidia-cuda-validator-ls4vc                                   0/1     Completed   0          47h
nvidia-dcgm-exporter-fxp9h                                    1/1     Running     0          47h
nvidia-device-plugin-daemonset-dvp4v                          1/1     Running     0          2m26s
nvidia-device-plugin-validator-kvxbs                          0/1     Completed   0          47h
nvidia-driver-daemonset-m86r7                                 1/1     Running     0          2d
nvidia-operator-validator-xg98r                               1/1     Running     0          47h
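Instead of comparing the AGE column by hand, you can wait for the two daemon sets to finish rolling out. This is a sketch under one assumption: the daemon set names are inferred from the pod names in the example output and may differ in your cluster. The function name is hypothetical.

```shell
# Sketch: block until the container toolkit and device plugin daemon sets
# report a completed rollout, or fail after the timeout.
# The daemon set names are an assumption derived from the pod names.
wait_for_gpu_daemonsets() {
  for ds in nvidia-container-toolkit-daemonset nvidia-device-plugin-daemonset; do
    kubectl rollout status "daemonset/$ds" -n gpu-operator --timeout=5m || return 1
  done
}
```

The function returns a non-zero status as soon as one rollout fails or times out, which makes it usable as a gate in a script.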
Disabling the NRI Plugin#
Disable the NRI Plugin and use the nvidia runtime class instead with the following procedure:
Disable the NRI Plugin by modifying the cluster policy:
$ kubectl patch clusterpolicies.nvidia.com/cluster-policy --type='json' \
-p='[{"op": "replace", "path": "/spec/cdi/nriPluginEnabled", "value":false}]'
Example Output
clusterpolicy.nvidia.com/cluster-policy patched
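The patch commands on this page differ only in the field under `/spec/cdi` and the boolean value. If you toggle these settings often, a small wrapper can reduce copy-and-paste errors. The following is a sketch; both function names are hypothetical.

```shell
# Hypothetical helper: build the JSON patch used on this page.
#   $1: field under /spec/cdi (enabled or nriPluginEnabled)
#   $2: true or false
cdi_patch_json() {
  case "$2" in
    true|false) ;;
    *) echo "value must be true or false" >&2; return 1 ;;
  esac
  printf '[{"op": "replace", "path": "/spec/cdi/%s", "value":%s}]' "$1" "$2"
}

# Apply the generated patch to the cluster policy named cluster-policy.
patch_cdi_field() {
  patch="$(cdi_patch_json "$1" "$2")" || return 1
  kubectl patch clusterpolicies.nvidia.com/cluster-policy --type='json' -p="$patch"
}
```

For example, `patch_cdi_field nriPluginEnabled false` issues the same patch as the command above.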