Deploy Confidential Containers

This page describes deploying Kata Containers and the NVIDIA GPU Operator. These are key pieces of the NVIDIA Confidential Containers Reference Architecture used to manage GPU resources on your cluster and deploy workloads into Confidential Containers.

Before you begin, review the Confidential Containers Reference Architecture for an overview of the design, and check the Supported Platforms page to confirm that your platform is supported.

This guide assumes you are familiar with the NVIDIA GPU Operator, Kata Containers, and Kubernetes cluster administration. For background on these components, refer to the NVIDIA GPU Operator, Kata Containers, and Kubernetes documentation.

Overview

The high-level workflow for configuring Confidential Containers is as follows:

Configure the Prerequisites.
Label Nodes that you want to use with Confidential Containers.
Install the latest Kata Containers Helm chart. This installs the Kata Containers runtime binaries, UVM images and kernels, and TEE-specific shims (such as kata-qemu-nvidia-gpu-snp or kata-qemu-nvidia-gpu-tdx) onto the cluster’s worker nodes.
Install the NVIDIA GPU Operator configured for Confidential Containers. This installs the NVIDIA GPU Operator components that are required to deploy GPU passthrough workloads. The GPU Operator uses the node labels to determine what software components to deploy to a node.

After installation, you can run a sample GPU workload in a confidential container. You can also configure Attestation with the Trustee framework. The Trustee attestation service is typically deployed in a separate, trusted environment.

After configuration, you can schedule workloads that request GPU resources and use the kata-qemu-nvidia-gpu-tdx or kata-qemu-nvidia-gpu-snp runtime classes for secure deployment.

Label Nodes

Get a list of the nodes in your cluster:

$ kubectl get nodes

Example Output:

NAME          STATUS   ROLES           AGE   VERSION
node-01       Ready    <none>          10d   v1.34.0
node-02       Ready    <none>          10d   v1.34.0

Set the NODE_NAME environment variable to the name of the node you want to configure:
```
$ export NODE_NAME="<node-name>"
```
Note

Commands in this guide use the $NODE_NAME environment variable to reference this node.

Label the node for Confidential Containers:
```
$ kubectl label node $NODE_NAME nvidia.com/gpu.workload.config=vm-passthrough
```
The GPU Operator uses this label to determine what software components to deploy to a node. The nvidia.com/gpu.workload.config=vm-passthrough label specifies that the node should receive the software components to run Confidential Containers.

A node can run only one container runtime at a time, so a labeled node runs only Confidential Containers workloads and cannot run traditional GPU container workloads. Labeling individual nodes lets you run Confidential Containers workloads on some nodes and traditional GPU container workloads on other nodes in the same cluster. For more details on how the GPU Operator deploys components to your cluster, refer to the GPU Operator Cluster Topology Considerations section in the architecture overview.
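
If several nodes should run Confidential Containers, the labeling command can be wrapped in a loop. The following is a minimal sketch, not part of any NVIDIA tooling: the function name label_cc_nodes is hypothetical, and --overwrite is added on the assumption that replacing an existing value of the label is acceptable in your cluster.

```shell
# Hypothetical helper: apply the vm-passthrough workload label to each node
# named on the command line. Returns non-zero if labeling any node fails.
label_cc_nodes() {
  local rc=0 node
  for node in "$@"; do
    # --overwrite replaces an existing value of the label (assumed acceptable).
    kubectl label node "$node" \
      nvidia.com/gpu.workload.config=vm-passthrough --overwrite || rc=1
  done
  return "$rc"
}
```

For example, `label_cc_nodes node-01 node-02` labels both nodes from the example output above.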

Tip

Skip this section if you plan to use all nodes in your cluster to run Confidential Containers and instead set sandboxWorkloads.defaultWorkload=vm-passthrough when installing the GPU Operator.

Verify the node label was added:

$ kubectl describe node $NODE_NAME | grep nvidia.com/gpu.workload.config

Example Output:

nvidia.com/gpu.workload.config=vm-passthrough
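
When more than one node is labeled, a label selector may be easier to read than describing each node in turn. The sketch below assumes only standard kubectl options; the function name list_cc_nodes is hypothetical.

```shell
# Hypothetical helper: print the names of all nodes that carry the
# vm-passthrough workload label, one per line.
list_cc_nodes() {
  kubectl get nodes \
    -l nvidia.com/gpu.workload.config=vm-passthrough \
    -o custom-columns=NAME:.metadata.name --no-headers
}
```

An empty result suggests that no node has been labeled yet, or that the label value differs from vm-passthrough.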

After labeling the node, you can continue to the next steps to install Kata Containers and the NVIDIA GPU Operator.

Install the Kata Containers Helm Chart

Install Kata Containers using the kata-deploy Helm chart. The kata-deploy chart installs all required components from the Kata Containers project including the Kata Containers runtime binary, runtime configuration, UVM kernel, and images that NVIDIA uses for Confidential Containers and native Kata containers.

The minimum required version is 3.29.0.

Set the chart version and registry path:

$ export VERSION="3.29.0"
$ export CHART="oci://ghcr.io/kata-containers/kata-deploy-charts/kata-deploy"

Install the kata-deploy Helm chart:
```
$ helm install kata-deploy "${CHART}" \
   --namespace kata-system --create-namespace \
   --set nfd.enabled=false \
   --wait --timeout 10m \
   --version "${VERSION}"
```
Example Output:
```
LAST DEPLOYED: Wed Apr  1 17:03:00 2026
NAMESPACE: kata-system
STATUS: deployed
REVISION: 1
DESCRIPTION: Install complete
TEST SUITE: None
```
Note

The --wait flag in the install command instructs Helm to wait until the release is deployed before returning. It can take 2-3 minutes to return output.

There is a known Helm issue on single-node clusters that can cause the Helm command to finish before all deployed pods have finished initializing. If you are deploying to a single-node cluster, you might need to wait a few additional minutes after the Helm command completes for the kata-deploy pod to reach the Running state.
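
Rather than waiting a fixed amount of time, you can ask kubectl to block until the pods report Ready. This is a minimal sketch: the function name wait_for_pods_ready and the 300s default timeout are assumptions, not values from this guide.

```shell
# Hypothetical helper: block until every pod in a namespace reports Ready,
# or fail once the timeout (default 300s, an assumed value) expires.
wait_for_pods_ready() {
  local namespace="$1" timeout="${2:-300s}"
  kubectl wait --for=condition=Ready pods --all \
    -n "$namespace" --timeout="$timeout"
}
```

For example, `wait_for_pods_ready kata-system` waits on the kata-deploy pod. Note that kubectl wait returns an error if the namespace contains no pods yet, and that pods which have run to completion never report Ready, so this approach fits namespaces that hold only long-running pods.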

Note

Both kata-deploy and the GPU Operator deploy Node Feature Discovery (NFD) by default. The install command includes --set nfd.enabled=false to prevent kata-deploy from deploying NFD. The GPU Operator will deploy and manage NFD in the next step.

Optional: Verify that the kata-deploy pod is running:

$ kubectl get pods -n kata-system | grep kata-deploy

Example Output:

NAME                    READY   STATUS    RESTARTS      AGE
kata-deploy-b2lzs       1/1     Running   0             6m37s

Optional: Verify that the kata-qemu-nvidia-gpu, kata-qemu-nvidia-gpu-snp, and kata-qemu-nvidia-gpu-tdx runtime classes are available:
```
$ kubectl get runtimeclass | grep kata-qemu-nvidia-gpu
```
Example Output:
```
NAME                       HANDLER                    AGE
kata-qemu-nvidia-gpu       kata-qemu-nvidia-gpu       40s
kata-qemu-nvidia-gpu-snp   kata-qemu-nvidia-gpu-snp   40s
kata-qemu-nvidia-gpu-tdx   kata-qemu-nvidia-gpu-tdx   40s
```
The kata-deploy chart installs several runtime classes. The kata-qemu-nvidia-gpu runtime class is for Kata Containers workloads that do not use Confidential Containers. The kata-qemu-nvidia-gpu-snp and kata-qemu-nvidia-gpu-tdx runtime classes are for Confidential Containers workloads.

Optional: If the kata-deploy pod fails to deploy or the expected runtime classes are missing, get the pod name and view the logs:
```
$ kubectl get pods -n kata-system | grep kata-deploy
$ kubectl logs -n kata-system <pod-name>
```
Replace <pod-name> with the name of the kata-deploy pod from the first command’s output.
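
The runtime class verification earlier in this section can also be scripted, for example as a gate in an automated install. The following sketch checks for the three runtime class names listed in this guide; the function name check_kata_runtimeclasses is hypothetical.

```shell
# Hypothetical helper: confirm that each expected runtime class exists.
# Reports missing classes on stderr and returns non-zero if any are absent.
check_kata_runtimeclasses() {
  local missing=0 rc
  for rc in kata-qemu-nvidia-gpu kata-qemu-nvidia-gpu-snp kata-qemu-nvidia-gpu-tdx; do
    if ! kubectl get runtimeclass "$rc" >/dev/null 2>&1; then
      echo "missing runtime class: $rc" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A non-zero return code is a reasonable cue to inspect the kata-deploy pod logs as described above.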

Install the NVIDIA GPU Operator

Install the NVIDIA GPU Operator and configure it to deploy Confidential Containers components.

Add and update the NVIDIA Helm repository:

$ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
   && helm repo update

Example Output:

"nvidia" has been added to your repositories
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "nvidia" chart repository
Update Complete. ⎈Happy Helming!⎈

Install the GPU Operator with the following configuration:
```
$ helm install --generate-name \
   -n gpu-operator --create-namespace \
   nvidia/gpu-operator \
   --set sandboxWorkloads.enabled=true \
   --set sandboxWorkloads.mode=kata \
   --set nfd.enabled=true \
   --set nfd.nodefeaturerules=true \
   --version=v26.3.1
```
Example Output:
```
NAME: gpu-operator
LAST DEPLOYED: Tue Mar 10 17:58:12 2026
NAMESPACE: gpu-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
```
Tip

Add --set sandboxWorkloads.defaultWorkload=vm-passthrough if every worker node should deploy Confidential Containers by default.

Refer to the Common GPU Operator Configuration Settings section on this page for more details on the configuration options you can specify when installing the GPU Operator.

Refer to the Common chart customization options in Installing the NVIDIA GPU Operator for more details on the additional general configuration options you can specify when installing the GPU Operator.
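
If you prefer to keep the configuration in a file instead of repeating --set flags, the same settings can be expressed as a Helm values file. The sketch below writes one from the shell; the function name write_gpu_operator_values and the default file name are assumptions, while the keys mirror the --set flags in the install command in this section.

```shell
# Hypothetical helper: write a Helm values file equivalent to the --set flags
# used in the GPU Operator install command. The default file name is assumed.
write_gpu_operator_values() {
  local out="${1:-gpu-operator-values.yaml}"
  cat > "$out" <<'EOF'
sandboxWorkloads:
  enabled: true
  mode: kata
nfd:
  enabled: true
  nodefeaturerules: true
EOF
}
```

You could then pass the file to Helm with the standard -f option, for example `helm install --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator -f gpu-operator-values.yaml --version=v26.3.1`.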

Optional: Verify that all GPU Operator pods are running, especially the Confidential Computing Manager, Kata Device Plugin, and VFIO Manager operands:

$ kubectl get pods -n gpu-operator

Example Output:

NAME                                                              READY   STATUS    RESTARTS   AGE
gpu-operator-1766001809-node-feature-discovery-gc-75776475sxzkp   1/1     Running   0          86s
gpu-operator-1766001809-node-feature-discovery-master-6869lxq2g   1/1     Running   0          86s
gpu-operator-1766001809-node-feature-discovery-worker-mh4cv       1/1     Running   0          86s
gpu-operator-f48fd66b-vtfrl                                       1/1     Running   0          86s
nvidia-cc-manager-7z74t                                           1/1     Running   0          61s
nvidia-kata-sandbox-device-plugin-daemonset-d5rvg                 1/1     Running   0          30s
nvidia-sandbox-validator-6xnzc                                    1/1     Running   0          30s
nvidia-vfio-manager-h229x                                         1/1     Running   0          62s

For more details on each of the GPU Operator components, refer to the GPU Operator Cluster Topology Considerations section in the architecture overview.

Note

It can take several minutes for all GPU Operator pods to be in the Running state. If you are not seeing the expected output, you can view the logs for the GPU Operator pods:

$ kubectl logs -n gpu-operator <pod-name>

Replace <pod-name> with the name of the GPU Operator pod from kubectl get pods -n gpu-operator.
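
To find which pod to inspect, it can help to list only the pods that have not reached the Running phase. This is a minimal sketch using a standard kubectl field selector; the function name pods_not_running is hypothetical.

```shell
# Hypothetical helper: list pods in a namespace (default: gpu-operator)
# whose phase is anything other than Running.
pods_not_running() {
  kubectl get pods -n "${1:-gpu-operator}" \
    --field-selector='status.phase!=Running'
}
```

Pods that ran to completion are in the Succeeded phase and also appear in this list, so an entry here is not necessarily a failure.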

Optional: If you have host access to the worker node, confirm that the host uses the vfio-pci device driver for GPUs:
```
$ lspci -nnk -d 10de:
```
Example Output:
```
65:00.0 3D controller [0302]: NVIDIA Corporation xxxxxxx [xxx] [10de:xxxx] (rev xx)
        Subsystem: NVIDIA Corporation xxxxxxx [xxx] [10de:xxxx]
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau
```
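
On hosts with many GPUs, reading the lspci output by eye is error-prone. The sketch below parses the same lspci output and prints only the devices that are not bound to vfio-pci. The function name gpus_not_on_vfio is hypothetical, and the parsing assumes the output layout shown in the example above.

```shell
# Hypothetical helper: print the PCI address of every NVIDIA device
# (vendor ID 10de) whose "Kernel driver in use" is not vfio-pci.
# Returns non-zero if any such device is found.
gpus_not_on_vfio() {
  lspci -nnk -d 10de: | awk '
    /^[0-9a-f]/ { addr = $1; order[++n] = addr }   # unindented device header line
    /Kernel driver in use:/ { driver[addr] = $NF } # last field is the driver name
    END {
      bad = 0
      for (i = 1; i <= n; i++) {
        a = order[i]
        # A device with no driver line at all is also reported.
        if (driver[a] != "vfio-pci") { print a; bad = 1 }
      }
      exit bad
    }'
}
```

The filter matches every device with the NVIDIA vendor ID, so any non-GPU NVIDIA devices on the host are checked as well.
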
Tip

If you have an issue deploying the GPU Operator, refer to the NVIDIA GPU Operator troubleshooting guide for guidance on troubleshooting and resolving issues.

Common GPU Operator Configuration Settings

The following are the available GPU Operator configuration settings to enable Confidential Containers:

| Parameter | Description | Default |
| --- | --- | --- |
| `sandboxWorkloads.enabled` | Enables sandbox workload management in the GPU Operator for virtual machine-style workloads and related operands. | `false` |
| `sandboxWorkloads.defaultWorkload` | Specifies the default type of workload for the cluster, one of `container`, `vm-passthrough`, or `vm-vgpu`. Setting `vm-passthrough` or `vm-vgpu` can be helpful if you plan to run all or mostly virtual machines in your cluster. | `container` |
| `sandboxWorkloads.mode` | Specifies the sandbox mode to use when deploying sandbox workloads. Accepted values are `kubevirt` (default) and `kata`. | `kubevirt` |
| `kataSandboxDevicePlugin.env` | Optional list of environment variables passed to the NVIDIA Kata Device Plugin pod. Each list item is an `EnvVar` object with required `name` and optional `value` fields. Use the setting to configure `P_GPU_ALIAS` or `NVSWITCH_ALIAS` for the Kata sandbox device plugin. Refer to the Configuring GPU or NVSwitch Resource Types Name section for more details. | `[]` (empty list) |

Configuring GPU or NVSwitch Resource Types Name

By default, the NVIDIA GPU Operator creates one resource type for GPUs, nvidia.com/pgpu, and one for NVSwitches, nvidia.com/nvswitch. You can reference these names in your manifests to request GPU or NVSwitch resources for your workload. To use different names, set the P_GPU_ALIAS or NVSWITCH_ALIAS environment variable in the Kata device plugin to your preferred name. In clusters where all GPUs are the same model, a single resource type is typically sufficient.

In heterogeneous clusters, where you have different GPU types on your nodes, you might want to use specific GPU types for your workload. To do this, specify an empty P_GPU_ALIAS environment variable in the Kata sandbox device plugin by adding the following to your GPU Operator installation: --set kataSandboxDevicePlugin.env[0].name=P_GPU_ALIAS and --set kataSandboxDevicePlugin.env[0].value="".

When this variable is set to "", the Kata device plugin creates GPU model-specific resource types, for example nvidia.com/GH100_H200_141GB, instead of the default nvidia.com/pgpu type. To request a specific GPU model, specify the corresponding resource type in the resource limits of your pod spec.

Similarly, you can set NVSWITCH_ALIAS to "" to advertise model-specific NVSwitch resource types.
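
As an illustration of how a pod spec might reference these resource types, the sketch below prints a minimal manifest for a given runtime class and resource name. The function name render_gpu_pod, the pod name cc-gpu-example, and the `<your-image>` placeholder are all hypothetical. A working Confidential Containers workload may need additional fields that are not shown here, so treat the Run a Sample Workload page as the authoritative manifest.

```shell
# Hypothetical helper: print a pod manifest that requests one GPU through a
# given resource type and runs under a given Kata runtime class.
# The pod name and image are placeholders, not values from this guide.
render_gpu_pod() {
  local runtime_class="${1:-kata-qemu-nvidia-gpu-snp}"
  local resource="${2:-nvidia.com/pgpu}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: cc-gpu-example
spec:
  runtimeClassName: ${runtime_class}
  restartPolicy: Never
  containers:
  - name: app
    image: <your-image>
    resources:
      limits:
        "${resource}": 1
EOF
}
```

For example, `render_gpu_pod kata-qemu-nvidia-gpu-tdx nvidia.com/GH100_H200_141GB` prints a manifest that targets the model-specific resource type on a TDX node. After you replace the image placeholder, you can pipe the output to `kubectl apply -f -`.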

The following example installs the GPU Operator with both P_GPU_ALIAS and NVSWITCH_ALIAS configured:

$ helm install --wait --timeout 10m --generate-name \
     -n gpu-operator --create-namespace \
     nvidia/gpu-operator \
     --set sandboxWorkloads.enabled=true \
     --set sandboxWorkloads.mode=kata \
     --set nfd.enabled=true \
     --set nfd.nodefeaturerules=true \
     --set kataSandboxDevicePlugin.env[0].name=P_GPU_ALIAS \
     --set kataSandboxDevicePlugin.env[0].value="" \
     --set kataSandboxDevicePlugin.env[1].name=NVSWITCH_ALIAS \
     --set kataSandboxDevicePlugin.env[1].value="" \
     --version=v26.3.1

After installing the GPU Operator, you can view the GPU or NVSwitch resource types available on a node by running the following command:

$ kubectl get node $NODE_NAME -o json | grep nvidia.com

Note

The NODE_NAME environment variable was set in the Label Nodes section. If you want to view the resource types for a different node, you can update the NODE_NAME environment variable and run the command again.

Example Output:

"nvidia.com/GH100_H200_141GB": "1"

Next Steps

Run a Sample Workload to verify your deployment.
Configure additional options for your environment, including attestation, the confidential computing mode, and multi-GPU passthrough.
To help manage the lifecycle of Kata Containers, install the Kata Lifecycle Manager. This Argo Workflows-based tool manages Kata Containers upgrades and day-two operations.