Configuring Confidential Container Workloads#
A Confidential Container workload is a standard Kubernetes pod that runs inside a TEE-protected virtual machine and requests one or more GPUs through the NVIDIA Kata sandbox device plugin. Compared with a traditional GPU pod, a Confidential Container workload pod manifest differs in three ways:
It selects a TEE-aware Kata runtime class instead of the default runc-based runtime.
It requests GPU and NVSwitch resources using the resource types advertised by the NVIDIA Kata sandbox device plugin, which can be either default names or model-specific names.
For NVSwitch-based HGX systems, it requests every GPU and NVSwitch on the node together so that all devices reside inside the same Confidential Container virtual machine.
This page describes each of these decisions and provides single-GPU and multi-GPU passthrough manifest examples that you can copy and adapt to your environment.
Before beginning, you should configure your cluster to deploy Confidential Containers workloads using the Confidential Containers deployment steps.
Select a Container Runtime Class#
A Confidential Container workload must set spec.runtimeClassName to a TEE-aware Kata
runtime that NVIDIA provides through the kata-deploy Helm chart.
Select the runtime class based on the CPU TEE on the target worker node:
| Node TEE | Runtime class | Typical CPU vendor |
|---|---|---|
| AMD SEV-SNP | kata-qemu-nvidia-gpu-snp | AMD EPYC (Genoa or newer) |
| Intel TDX | kata-qemu-nvidia-gpu-tdx | Intel Xeon (Sapphire Rapids or newer) |
The kata-deploy chart also installs a kata-qemu-nvidia-gpu runtime class.
That class is intended for non-confidential Kata workloads. You should not use it for Confidential
Container workloads because it does not start the GPU in CC mode.
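Before writing a manifest, it can help to confirm that the runtime class you plan to use exists in the cluster. The following is a minimal sketch, not part of the kata-deploy chart: the function names `runtime_class_for_tee` and `check_runtime_class` and the short keys `snp` and `tdx` are hypothetical helpers introduced here for illustration. The runtime class names are the ones used in the manifests on this page, and `kubectl get runtimeclass` is a standard command.

```shell
# Hypothetical helper: map a short TEE key to the Kata runtime class name.
runtime_class_for_tee() {
  case "$1" in
    snp) echo "kata-qemu-nvidia-gpu-snp" ;;   # AMD SEV-SNP nodes
    tdx) echo "kata-qemu-nvidia-gpu-tdx" ;;   # Intel TDX nodes
    *)   echo "unsupported TEE: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: succeed only if the runtime class exists in the cluster.
check_runtime_class() {
  kubectl get runtimeclass "$1" >/dev/null
}
```

For example, `check_runtime_class "$(runtime_class_for_tee snp)"` returns a non-zero status if the SEV-SNP runtime class has not been installed.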
Reference GPU and NVSwitch Resource Types#
The NVIDIA Kata sandbox device plugin advertises GPUs and NVSwitches to Kubernetes as extended resources.
Your pod manifest requests those resources under resources.limits.
You can use either the default resource types or model-specific resource types.
By default, every passthrough GPU is advertised as nvidia.com/pgpu and every NVSwitch is advertised as nvidia.com/nvswitch.
These names are stable across GPU models, which keeps manifests portable when every node in your cluster has the same GPU type.
A sample resource request using the default resource type is shown below:
resources:
  limits:
    nvidia.com/pgpu: "1"
In heterogeneous clusters, where worker nodes use different GPU models, you can configure the Kata sandbox device plugin to advertise resources under model-specific names by setting
P_GPU_ALIAS="" (and optionally NVSWITCH_ALIAS="") on the plugin.
With this configuration, GPUs are exposed as resources such as nvidia.com/GH100_H200_141GB,
which lets a workload pin itself to a specific accelerator model.
Refer to Configuring GPU or NVSwitch Resource Types Name for the GPU Operator install flags that enable this behavior.
Use the model-specific resource name in workloads that must target a specific accelerator:
resources:
  limits:
    nvidia.com/GH100_H200_141GB: "1"
To list the GPU and NVSwitch resource types advertised on a node, run:
$ kubectl get node $NODE_NAME -o json | grep nvidia.com
Example Output:
"nvidia.com/GH100_H200_141GB": "1"
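Because `grep` matches every occurrence of the string in the node object, the command above can also print labels and annotations, and it lists each resource under both capacity and allocatable. The sketch below narrows the output to allocatable resources only, using a `kubectl` Go template. The function name `list_nvidia_allocatable` is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper: print allocatable nvidia.com/* resources as name=count lines.
list_nvidia_allocatable() {
  local node="$1"
  # Iterate over the allocatable map and emit one "name=quantity" line per entry,
  # then keep only the resources in the nvidia.com/ namespace.
  kubectl get node "$node" \
    -o go-template='{{range $name, $qty := .status.allocatable}}{{$name}}={{$qty}}{{"\n"}}{{end}}' \
    | grep '^nvidia\.com/'
}
```

Calling `list_nvidia_allocatable "$NODE_NAME"` prints one line per GPU or NVSwitch resource type, which you can compare against the values under `resources.limits` in your manifest.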
Single-GPU Passthrough#
A single-GPU workload requests one GPU and runs inside its own Confidential Container virtual machine. This pattern is the recommended starting point for verifying a deployment and for most independent workloads that do not require NVLink between GPUs.
Create a file, such as cuda-vectoradd-kata.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd-kata
  namespace: default
spec:
  runtimeClassName: kata-qemu-nvidia-gpu-snp  # or kata-qemu-nvidia-gpu-tdx
  restartPolicy: Never
  containers:
    - name: cuda-vectoradd
      image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04"
      resources:
        limits:
          nvidia.com/pgpu: "1"
          memory: 16Gi
Note
If you configured the Kata sandbox device plugin to use model-specific resource types, replace nvidia.com/pgpu with the appropriate model-specific name, for example nvidia.com/GH100_H200_141GB.
Create the pod:
$ kubectl apply -f cuda-vectoradd-kata.yaml
Verify the workload completes successfully:
$ kubectl logs cuda-vectoradd-kata
Example Output:
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
Refer to Run a Sample Workload for the end-to-end verification flow including deletion and troubleshooting tips.
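Because the sample pod runs to completion, its logs are only meaningful once the container has finished. The following sketch waits for the pod to reach the `Succeeded` phase and then checks the logs for the `Test PASSED` line shown in the example output. The function name `verify_vectoradd` and the 300-second timeout are assumptions made for illustration, and the `--for=jsonpath` form of `kubectl wait` requires kubectl 1.23 or newer.

```shell
# Hypothetical helper: wait for a run-to-completion pod, then check its logs.
verify_vectoradd() {
  local pod="${1:-cuda-vectoradd-kata}"
  # Block until the pod phase is Succeeded; the timeout value is an assumption.
  kubectl wait --for=jsonpath='{.status.phase}'=Succeeded "pod/$pod" --timeout=300s || return 1
  # The CUDA vectorAdd sample prints "Test PASSED" when the result is correct.
  kubectl logs "$pod" | grep -q 'Test PASSED'
}
```

Running `verify_vectoradd` with no arguments checks the `cuda-vectoradd-kata` pod and returns a non-zero status if the pod does not succeed in time or the logs lack the success line.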
Multi-GPU Passthrough#
Multi-GPU passthrough assigns every GPU and NVSwitch on a node to a single Confidential Container virtual machine. This configuration is required for NVSwitch (NVLink) based HGX systems running confidential workloads.
Important
You must assign all the GPUs and NVSwitches on the node to the same Confidential Container virtual machine. Configuring only a subset of GPUs for Confidential Computing on a single node is not supported.
NVIDIA Hopper PPCIE Mode#
For NVIDIA Hopper GPUs, multi-GPU passthrough requires protected PCIe (PPCIE) mode, which
claims exclusive use of the NVSwitches for a single Confidential Container.
The NVIDIA Confidential Computing Manager for Kubernetes transitions GPUs into the correct
mode based on the cc.mode label that you set.
Set the NODE_NAME environment variable to the node you want to configure:
$ export NODE_NAME="<node-name>"
Apply the ppcie CC mode label to the node:
$ kubectl label node $NODE_NAME nvidia.com/cc.mode=ppcie --overwrite
Refer to Managing the Confidential Computing Mode for full details on setting the CC mode and verifying the change.
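To confirm that the label was applied, you can read it back from the node object. This sketch only reports the requested mode stored in the `nvidia.com/cc.mode` label. It does not prove that the GPUs have finished transitioning, which is covered by the page referenced above. The function name `cc_mode_label` is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper: print the value of the nvidia.com/cc.mode label on a node.
cc_mode_label() {
  local node="$1"
  # Dots inside the label key are escaped so jsonpath treats it as a single key.
  kubectl get node "$node" -o jsonpath='{.metadata.labels.nvidia\.com/cc\.mode}'
}
```

After the labeling step, `cc_mode_label "$NODE_NAME"` is expected to print `ppcie`.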
NVIDIA Blackwell GPUs use NVLink encryption, which places the switches outside of the
Trusted Computing Base (TCB), so the default CC mode of on is sufficient and no additional
configuration is required.
Run a Multi-GPU Workload#
Create a file, such as multi-gpu-kata.yaml, with a pod manifest that requests every GPU and NVSwitch on the node:
apiVersion: v1
kind: Pod
metadata:
  name: multi-gpu-kata
  namespace: default
spec:
  runtimeClassName: kata-qemu-nvidia-gpu-snp  # or kata-qemu-nvidia-gpu-tdx
  restartPolicy: Never
  containers:
    - name: cuda-sample
      image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04"
      resources:
        limits:
          nvidia.com/pgpu: "8"
          nvidia.com/nvswitch: "4"  # Only for NVIDIA Hopper GPUs with PPCIE mode
          memory: 128Gi
Note
If you configured P_GPU_ALIAS or NVSWITCH_ALIAS for heterogeneous clusters, replace nvidia.com/pgpu and nvidia.com/nvswitch with the corresponding model-specific resource types. Refer to Reference GPU and NVSwitch Resource Types for details.
Create the pod:
$ kubectl apply -f multi-gpu-kata.yaml
Example Output:
pod/multi-gpu-kata created
Verify the pod is running:
$ kubectl get pod multi-gpu-kata
Example Output:
NAME             READY   STATUS    RESTARTS   AGE
multi-gpu-kata   1/1     Running   0          30s
Verify that all GPUs are visible inside the container:
$ kubectl exec multi-gpu-kata -- nvidia-smi -L
Example Output:
GPU 0: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) GPU 1: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) GPU 2: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) GPU 3: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) GPU 4: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) GPU 5: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) GPU 6: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) GPU 7: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
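Rather than counting the lines by eye, you can compare the number of GPUs visible in the container against the number requested in the manifest. The sketch below counts the `GPU <index>:` lines printed by `nvidia-smi -L`. The function name `check_gpu_count` is a hypothetical helper introduced here for illustration, and the expected count of 8 comes from the example manifest on this page.

```shell
# Hypothetical helper: compare the GPUs visible in a pod with an expected count.
check_gpu_count() {
  local pod="$1" expected="$2" actual
  # Each GPU appears as one line starting with "GPU <index>:" in `nvidia-smi -L`.
  # `|| true` keeps the substitution from failing when grep counts zero matches.
  actual="$(kubectl exec "$pod" -- nvidia-smi -L | grep -c '^GPU [0-9]' || true)"
  echo "visible GPUs: $actual (expected $expected)"
  [ "$actual" -eq "$expected" ]
}
```

For the example manifest, `check_gpu_count multi-gpu-kata 8` returns a zero status only when all eight GPUs are visible inside the container.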
Delete the pod:
$ kubectl delete -f multi-gpu-kata.yaml