Configuring Confidential Container Workloads

A Confidential Container workload is a standard Kubernetes pod that runs inside a TEE-protected virtual machine and requests one or more GPUs through the NVIDIA Kata sandbox device plugin. Its pod manifest differs from that of a traditional GPU pod in three ways:

- It selects a TEE-aware Kata runtime class instead of the default runc-based runtime.
- It requests GPU and NVSwitch resources using the resource types advertised by the NVIDIA Kata sandbox device plugin, which can be either default names or model-specific names.
- For NVSwitch-based HGX systems, it requests every GPU and NVSwitch on the node together so that all devices reside inside the same Confidential Container virtual machine.

This page describes each of these decisions and provides single-GPU and multi-GPU passthrough manifest examples that you can copy and adapt to your environment.

Before you begin, configure your cluster to run Confidential Container workloads by following the Confidential Containers deployment steps.

Select a Container Runtime Class

A Confidential Container workload must set spec.runtimeClassName to a TEE-aware Kata runtime that NVIDIA provides through the kata-deploy Helm chart. Select the runtime class based on the CPU TEE on the target worker node:

| Node TEE | Runtime class | Typical CPU |
| --- | --- | --- |
| AMD SEV-SNP | `kata-qemu-nvidia-gpu-snp` | AMD EPYC (Genoa or newer) |
| Intel TDX | `kata-qemu-nvidia-gpu-tdx` | Intel Xeon (Sapphire Rapids or newer) |

The kata-deploy chart also installs a kata-qemu-nvidia-gpu runtime class. That class is intended for non-confidential Kata workloads. You should not use it for Confidential Container workloads because it does not start the GPU in CC mode.
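
A script that targets both AMD and Intel nodes can capture the mapping from the table in a small shell helper. The following is a minimal sketch only: the function name `runtime_class_for_tee` and the short `snp` and `tdx` keywords are hypothetical conventions chosen for illustration, not something that kata-deploy provides.

```shell
# Hypothetical helper: map a CPU TEE keyword to the runtime class listed in the table.
# The keywords "snp" and "tdx" are illustrative assumptions, not kata-deploy options.
runtime_class_for_tee() {
  case "$1" in
    snp) echo "kata-qemu-nvidia-gpu-snp" ;;
    tdx) echo "kata-qemu-nvidia-gpu-tdx" ;;
    *)
      # Anything else, including the non-confidential class, is rejected.
      echo "unsupported TEE: $1 (expected snp or tdx)" >&2
      return 1
      ;;
  esac
}

# Example: runtime_class_for_tee snp
```

Running `kubectl get runtimeclass` after kata-deploy is installed lists the classes that exist on the cluster, which is a quick way to confirm a name before you put it in a manifest.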

Reference GPU and NVSwitch Resource Types

The NVIDIA Kata sandbox device plugin advertises GPUs and NVSwitches to Kubernetes as extended resources. Your pod manifest requests those resources under resources.limits. You can use either the default resource types or model-specific resource types.

By default, every passthrough GPU is advertised as nvidia.com/pgpu and every NVSwitch is advertised as nvidia.com/nvswitch. These names are stable across GPU models, which keeps manifests portable when every node in your cluster has the same GPU type.

A sample resource request using the default resource type is shown below:

resources:
  limits:
    nvidia.com/pgpu: "1"

In heterogeneous clusters, where worker nodes use different GPU models, you can configure the Kata sandbox device plugin to advertise model-specific resource names by setting P_GPU_ALIAS to an empty string (P_GPU_ALIAS="") and, optionally, NVSWITCH_ALIAS="" on the plugin. GPUs are then exposed under names such as nvidia.com/GH100_H200_141GB, which lets a workload pin itself to a specific accelerator model.

Refer to Configuring GPU or NVSwitch Resource Types Name for the GPU Operator install flags that enable this behavior.

Use the model-specific resource name in workloads that must target a specific accelerator:

resources:
  limits:
    nvidia.com/GH100_H200_141GB: "1"

To list the GPU and NVSwitch resource types advertised on a node, set NODE_NAME to the name of that node and run:

$ kubectl get node "$NODE_NAME" -o json | grep 'nvidia\.com'

Example Output:

"nvidia.com/GH100_H200_141GB": "1"

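The grep command above matches every line of the node JSON that mentions nvidia.com, so the same resource can appear under both capacity and allocatable, next to labels and annotations. As a rough sketch, the hypothetical filter below (the name `nvidia_resource_counts` is an assumption) keeps only entries whose value is a plain integer and prints each one once.

```shell
# Hypothetical filter: read `kubectl get node ... -o json` output on stdin and
# print each "nvidia.com/<name>": "<count>" pair once. Entries whose value is
# not a plain integer (most labels and annotations) are skipped.
nvidia_resource_counts() {
  grep -o '"nvidia\.com/[^"]*": *"[0-9][0-9]*"' | sort -u
}

# Example: kubectl get node "$NODE_NAME" -o json | nvidia_resource_counts
```

A label whose value happens to be a bare number still matches, so treat the result as a convenience view rather than an authoritative list of extended resources.
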
Single-GPU Passthrough

A single-GPU workload requests one GPU and runs inside its own Confidential Container virtual machine. This pattern is the recommended starting point for verifying a deployment and for most independent workloads that do not require NVLink between GPUs.

Create a file, such as cuda-vectoradd-kata.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd-kata
  namespace: default
spec:
  runtimeClassName: kata-qemu-nvidia-gpu-snp # or kata-qemu-nvidia-gpu-tdx
  restartPolicy: Never
  containers:
    - name: cuda-vectoradd
      image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04"
      resources:
        limits:
          nvidia.com/pgpu: "1"
          memory: 16Gi

Note

If you configured the Kata sandbox device plugin to use model-specific resource types, replace nvidia.com/pgpu with the appropriate model-specific name, for example nvidia.com/GH100_H200_141GB.

Create the pod:

$ kubectl apply -f cuda-vectoradd-kata.yaml

Verify that the workload completed successfully:

$ kubectl logs cuda-vectoradd-kata

Example Output:

[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
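
In automation, the same check can be scripted instead of read by eye. The sketch below waits for the pod to reach the Succeeded phase and then looks for the Test PASSED line in its log. The function name `verify_vectoradd` and the 300s default timeout are illustrative assumptions, and the sketch assumes a kubectl release that supports `kubectl wait --for=jsonpath`.

```shell
# Hypothetical helper: wait for a run-to-completion pod, then check its log.
# $1 = pod name, $2 = timeout passed to kubectl wait (default 300s is an assumption)
verify_vectoradd() {
  pod="$1"
  timeout="${2:-300s}"
  # Block until the pod phase is Succeeded, or give up after the timeout.
  kubectl wait --for=jsonpath='{.status.phase}'=Succeeded "pod/$pod" --timeout="$timeout" || return 1
  # Exit status 0 only when the sample printed its success marker.
  kubectl logs "$pod" | grep -q 'Test PASSED'
}

# Example: verify_vectoradd cuda-vectoradd-kata && echo "workload verified"
```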

Refer to Run a Sample Workload for the end-to-end verification flow, including deletion and troubleshooting tips.

Multi-GPU Passthrough

Multi-GPU passthrough assigns every GPU and NVSwitch on a node to a single Confidential Container virtual machine. This configuration is required for NVSwitch-based (NVLink) HGX systems that run confidential workloads.

Important

You must assign all the GPUs and NVSwitches on the node to the same Confidential Container virtual machine. Configuring only a subset of GPUs for Confidential Computing on a single node is not supported.

NVIDIA Hopper PPCIE Mode

For NVIDIA Hopper GPUs, multi-GPU passthrough requires protected PCIe (PPCIE) mode, which claims exclusive use of the NVSwitches for a single Confidential Container. The NVIDIA Confidential Computing Manager for Kubernetes transitions GPUs into the correct mode based on the nvidia.com/cc.mode node label that you set.

Set the NODE_NAME environment variable to the name of the node that you want to configure:
```
$ export NODE_NAME="<node-name>"
```

Apply the ppcie CC mode label to the node:

$ kubectl label node $NODE_NAME nvidia.com/cc.mode=ppcie --overwrite

Refer to Managing the Confidential Computing Mode for full details on setting the CC mode and verifying the change.
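
As a quick sanity check before moving on, you can read the label back from the node. The helper below is a sketch with a hypothetical name, `node_cc_mode`. It confirms only that the label is present; it does not confirm that the GPUs have finished transitioning, which the page referenced above covers.

```shell
# Hypothetical helper: print the nvidia.com/cc.mode label of a node (empty if unset).
# The backslashes escape the dots inside the label key for kubectl's JSONPath parser.
node_cc_mode() {
  kubectl get node "$1" -o jsonpath='{.metadata.labels.nvidia\.com/cc\.mode}'
}

# Example: [ "$(node_cc_mode "$NODE_NAME")" = "ppcie" ] && echo "label applied"
```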

NVIDIA Blackwell GPUs use NVLink encryption, which places the switches outside of the Trusted Computing Base (TCB). For these GPUs, the default CC mode, on, is sufficient and no additional configuration is required.

Run a Multi-GPU Workload

Create a file, such as multi-gpu-kata.yaml, with a pod manifest that requests every GPU and NVSwitch on the node:

apiVersion: v1
kind: Pod
metadata:
  name: multi-gpu-kata
  namespace: default
spec:
  runtimeClassName: kata-qemu-nvidia-gpu-snp # or kata-qemu-nvidia-gpu-tdx
  restartPolicy: Never
  containers:
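    - name: cuda-sample
      image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04"
      # Override the sample's entrypoint so the container stays up for the kubectl exec check;
      # the vectorAdd sample otherwise exits as soon as it finishes.
      command: ["sleep", "infinity"]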
      resources:
        limits:
          nvidia.com/pgpu: "8"
          nvidia.com/nvswitch: "4" # Only for NVIDIA Hopper GPUs with PPCIE mode
          memory: 128Gi

Note

If you configured P_GPU_ALIAS or NVSWITCH_ALIAS for heterogeneous clusters, replace nvidia.com/pgpu and nvidia.com/nvswitch with the corresponding model-specific resource types. Refer to Reference GPU and NVSwitch Resource Types for details.
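
The counts in the manifest (8 GPUs and 4 NVSwitches) depend on the node. The sketch below derives them from what the node reports as allocatable and prints a matching limits fragment. It assumes the default resource names, and the function name `full_node_limits` is hypothetical.

```shell
# Hypothetical helper: print a resources.limits fragment that requests every
# passthrough GPU and NVSwitch that the given node reports as allocatable.
# Assumes the default resource names nvidia.com/pgpu and nvidia.com/nvswitch.
full_node_limits() {
  node="$1"
  gpus=$(kubectl get node "$node" -o jsonpath='{.status.allocatable.nvidia\.com/pgpu}')
  switches=$(kubectl get node "$node" -o jsonpath='{.status.allocatable.nvidia\.com/nvswitch}')
  if [ -z "$gpus" ]; then
    echo "node $node advertises no nvidia.com/pgpu resources" >&2
    return 1
  fi
  printf 'limits:\n'
  printf '  nvidia.com/pgpu: "%s"\n' "$gpus"
  # Skip the NVSwitch line when the node advertises none.
  if [ -n "$switches" ] && [ "$switches" != "0" ]; then
    printf '  nvidia.com/nvswitch: "%s"\n' "$switches"
  fi
}

# Example: full_node_limits "$NODE_NAME"
```

The manifest comment limits the nvidia.com/nvswitch request to Hopper GPUs in PPCIE mode, so remove that line by hand on other systems even if the node advertises switches.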

Create the pod:

$ kubectl apply -f multi-gpu-kata.yaml

Example Output:

pod/multi-gpu-kata created

Verify that the pod is running:

$ kubectl get pod multi-gpu-kata

Example Output:

NAME             READY   STATUS    RESTARTS   AGE
multi-gpu-kata   1/1     Running   0          30s

Verify that all GPUs are visible inside the container:

$ kubectl exec multi-gpu-kata -- nvidia-smi -L

Example Output:

GPU 0: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
GPU 1: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
GPU 2: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
GPU 3: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
GPU 4: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
GPU 5: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
GPU 6: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
GPU 7: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
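
Rather than counting lines by eye, a script can compare the number of listed GPUs with the number requested in the manifest. This is a minimal sketch; the filter name `count_listed_gpus` is hypothetical, and it assumes that `nvidia-smi -L` prints one `GPU <index>:` line per device, as in the example output.

```shell
# Hypothetical filter: count the "GPU <index>:" lines in `nvidia-smi -L` output read from stdin.
count_listed_gpus() {
  grep -c '^GPU [0-9][0-9]*:'
}

# Example: kubectl exec multi-gpu-kata -- nvidia-smi -L | count_listed_gpus
```

For the manifest in this section, a result other than 8 suggests that some GPUs were not passed through to the virtual machine.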

Delete the pod:

$ kubectl delete -f multi-gpu-kata.yaml