Prerequisites#
As a Kubernetes Cluster Administrator, prepare hosts and the Kubernetes cluster before you install Kata Containers and the NVIDIA GPU Operator. You perform most steps in this section. If you do not have access to host firmware, coordinate with your Hardware IT Administrator or Host OS Administrator to confirm or implement hardware prerequisites.
For validated hardware and software versions, refer to Supported Platforms. Use the checklists below for an at-a-glance summary, then follow each linked section for verification steps.
Hardware prerequisites
| Prerequisite | Details |
|---|---|
| CPU, GPU, and host OS match Supported Platforms | Supported Platform |
| Hardware virtualization and ACS enabled in host BIOS | Hardware Virtualization and ACS Enabled |
| IOMMU enabled on each host through the kernel command line ( | IOMMU Enabled |
| No NVIDIA GPU drivers installed or loaded on worker hosts | Ensure No Host NVIDIA GPU Drivers Are Present |
Cluster prerequisites
| Prerequisite | Details |
|---|---|
| Cluster administrator access to a Kubernetes cluster running a supported version (refer to Supported Software Components) | Kubernetes Cluster and Cluster Administrator Access |
| containerd 2.2.2 installed on each GPU worker node | containerd 2.2.2 |
| Helm installed on your cluster administration system | Helm |
| Enable | Kubelet Configured |
Hardware and BIOS#
Supported Platform#
Your hosts must use a platform validated for Confidential Computing in Supported Platforms. Confirm with your Hardware IT Administrator and Host OS Administrator that any platform-specific BIOS, firmware, or OS steps are in place before continuing.
Hardware Virtualization and ACS Enabled#
Confirm with your Hardware IT Administrator that your hosts are configured to enable hardware virtualization and Access Control Services (ACS). With some AMD CPUs and BIOSes, ACS might be grouped under Advanced Error Reporting (AER). Enable these features in the host BIOS if they are not already enabled.
IOMMU Enabled#
IOMMU must be enabled on all hosts that will run Confidential Containers workloads.
Check whether IOMMU is already enabled:
$ ls /sys/kernel/iommu_groups
If the output lists numbered groups (0, 1, and so on), IOMMU is enabled.
If the output is empty or the directory is missing, IOMMU is not enabled.
If IOMMU is not enabled, add the appropriate kernel command-line argument to /etc/default/grub:
amd_iommu=on for AMD CPUs
intel_iommu=on for Intel CPUs
Example for AMD CPUs:
...
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on modprobe.blacklist=nouveau"
...
Example for Intel CPUs:
...
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on modprobe.blacklist=nouveau"
...
Update the bootloader configuration:
$ sudo update-grub
Example Output:
Sourcing file `/etc/default/grub'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.15.0-generic
Found initrd image: /boot/initrd.img-5.15.0-generic
done
Reboot the host.
Note
After configuring IOMMU, you might see QEMU warnings about PCI P2P DMA when running GPU workloads. These are expected and can be safely ignored. Refer to Limitations and Restrictions for details.
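The IOMMU check and the choice of kernel argument can be scripted when you have many hosts to prepare. The following is a minimal sketch, not part of the validated procedure: the function names `iommu_kernel_arg` and `count_iommu_groups` and their optional path arguments are hypothetical helpers, and the vendor strings assume an x86 `/proc/cpuinfo` layout.

```shell
# Sketch: suggest the IOMMU kernel argument for this CPU and count IOMMU groups.
# Function names and optional path arguments are hypothetical helpers.

# Print the kernel argument that matches the CPU vendor found in a cpuinfo file.
iommu_kernel_arg() {
    cpuinfo="${1:-/proc/cpuinfo}"
    vendor=$(awk -F': *' '/^vendor_id/ {print $2; exit}' "$cpuinfo" 2>/dev/null)
    case "$vendor" in
        AuthenticAMD) echo "amd_iommu=on" ;;
        GenuineIntel) echo "intel_iommu=on" ;;
        *)            echo "unknown" ;;
    esac
}

# Print the number of entries under the IOMMU groups directory (0 if it is missing).
count_iommu_groups() {
    dir="${1:-/sys/kernel/iommu_groups}"
    if [ -d "$dir" ]; then
        ls "$dir" | wc -l | tr -d ' '
    else
        echo 0
    fi
}

echo "Suggested kernel argument: $(iommu_kernel_arg)"
groups_found=$(count_iommu_groups)
if [ "$groups_found" -gt 0 ]; then
    echo "IOMMU is enabled ($groups_found groups)"
else
    echo "IOMMU is not enabled"
fi
```

Run the script once before editing /etc/default/grub and again after the reboot. A nonzero group count on the second run indicates that the kernel picked up the new argument.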
Ensure No Host NVIDIA GPU Drivers Are Present#
Confidential Containers pass GPUs to the confidential virtual machine through VFIO. Host-installed NVIDIA drivers prevent VFIO from binding the devices and must not be present on those hosts. In this architecture, the NVIDIA GPU Operator handles GPU driver installation and lifecycle management when you follow the Detailed Install Guide.
On each host, check whether NVIDIA GPU drivers are loaded:
$ lsmod | grep nvidia
If the command produces no output, no NVIDIA GPU drivers are loaded.
If drivers are installed or loaded on any host, remove them.
Refer to Removing the Driver in the NVIDIA Driver Installation Guide.
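To run the same check across several hosts, the `lsmod` test can be wrapped in a small script. This sketch reads /proc/modules, which is the file `lsmod` formats, and matches only module names that begin with `nvidia`, so it is slightly stricter than `grep nvidia`. The function name `loaded_nvidia_modules` and its optional file argument are hypothetical, and the script checks loaded modules only, not installed packages.

```shell
# Sketch: list loaded kernel modules whose names start with "nvidia".
# The function name and the optional file argument are hypothetical.
loaded_nvidia_modules() {
    modules_file="${1:-/proc/modules}"
    # Nothing to report if the modules file cannot be read.
    [ -r "$modules_file" ] || return 0
    # The first column of /proc/modules is the module name.
    awk '$1 ~ /^nvidia/ {print $1}' "$modules_file"
}

found=$(loaded_nvidia_modules)
if [ -z "$found" ]; then
    echo "No NVIDIA kernel modules are loaded"
else
    echo "Loaded NVIDIA kernel modules:"
    echo "$found"
fi
```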
Kubernetes Cluster#
The following sections describe requirements for worker nodes and for the system you use for cluster administration.
Kubernetes Cluster and Cluster Administrator Access#
You must have cluster administrator access to a Kubernetes cluster running a supported Kubernetes version. Refer to the Supported Software Components section in Supported Platforms for supported Kubernetes and component versions.
containerd 2.2.2#
Verify the installed version on each GPU worker node:
$ containerd --version
Example Output:
containerd containerd.io 2.2.2 ...
If you are running a different version on any worker node, refer to the containerd Getting Started guide for installation instructions.
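The version comparison can also be automated per node. The sketch below assumes that the version is the third whitespace-separated field of the `containerd --version` output, optionally prefixed with `v`, which matches the example output above but may differ between builds. The function name `containerd_version_from` is a hypothetical helper.

```shell
# Sketch: extract the version from `containerd --version` output and compare it
# with the version this guide requires.
# Assumption: the version is the third field, optionally prefixed with "v".
containerd_version_from() {
    echo "$1" | awk '{print $3}' | sed 's/^v//'
}

required="2.2.2"
if command -v containerd >/dev/null 2>&1; then
    installed=$(containerd_version_from "$(containerd --version)")
    if [ "$installed" = "$required" ]; then
        echo "containerd $installed matches the required version"
    else
        echo "containerd $installed found, expected $required"
    fi
else
    echo "containerd is not installed on this node"
fi
```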
Helm#
Helm is used to install the NVIDIA GPU Operator and Kata Containers.
Verify that Helm is installed on the system you use for cluster administration:
$ helm version
Example Output:
version.BuildInfo{Version:"v3.14.0", GitCommit:"...", GitTreeState:"clean", GoVersion:"go1.21.6"}
Your exact version details may vary.
If Helm is not installed or the command is not found, refer to the Helm documentation for installation instructions.
Kubelet Configured#
On GPU worker nodes, the kubelet configuration (typically /var/lib/kubelet/config.yaml) must include the required feature gates and an extended image pull timeout.
Confidential Containers require these kubelet feature gates:
KubeletPodResourcesGet: Allows the Kata runtime to query the kubelet Pod Resources API and discover GPUs allocated to a sandbox.
RuntimeClassInImageCriApi: Alpha since Kubernetes v1.29; required for pods that use multiple snapshotters side by side.
On Kubernetes v1.34 and later, KubeletPodResourcesGet is enabled by default.
On versions before v1.34, enable it explicitly.
RuntimeClassInImageCriApi must be enabled explicitly on all supported versions.
Increase the runtimeRequestTimeout from the 2-minute default to 20m to avoid timeouts when pulling large GPU workload images.
If a pull exceeds the timeout before the container is running, the kubelet de-allocates the pod.
Actual pull duration varies with image size and network throughput, so this guide uses 20 minutes as a conservative ceiling that accommodates most workload images.
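A rough estimate shows how image size and throughput relate to the timeout. The sketch below is illustrative arithmetic only: the 30 GB image size and the throughput values are hypothetical, the calculation uses decimal units (1 GB = 8000 megabits), and it ignores unpacking time and registry latency, so real pulls can take longer.

```shell
# Sketch: estimate image pull time from image size and sustained throughput.
# Assumptions: decimal units (1 GB = 8000 megabits); unpacking and registry
# latency are ignored. The function name is a hypothetical helper.
estimate_pull_seconds() {
    size_gb="$1"
    mbps="$2"
    echo $(( size_gb * 8000 / mbps ))
}

timeout_seconds=1200   # 20m, the value used in this guide
for throughput in 100 400 1000; do
    secs=$(estimate_pull_seconds 30 "$throughput")   # hypothetical 30 GB image
    if [ "$secs" -le "$timeout_seconds" ]; then
        verdict="fits within"
    else
        verdict="exceeds"
    fi
    echo "30 GB at ${throughput} Mbit/s: about ${secs}s ($verdict the 20m timeout)"
done
```

If the estimate for your largest image exceeds 1200 seconds, plan for a longer timeout.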
Apply these settings as follows:
Open the kubelet configuration file:
$ sudo nano /var/lib/kubelet/config.yaml
The file is typically located at /var/lib/kubelet/config.yaml, but your configuration file may be in a different location.
Add the required settings to the kubelet configuration file. Use the example that matches your Kubernetes version.
Kubernetes v1.34 and later:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  RuntimeClassInImageCriApi: true
runtimeRequestTimeout: 20m
Kubernetes versions before v1.34:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  KubeletPodResourcesGet: true
  RuntimeClassInImageCriApi: true
runtimeRequestTimeout: 20m
If your kubelet configuration already defines featureGates or runtimeRequestTimeout, merge these settings into the existing file instead of replacing it.
Restart the kubelet service:
$ sudo systemctl restart kubelet
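After the restart, a quick text check can confirm that the configuration file contains the settings required on all supported versions. This is a minimal sketch that uses `grep` rather than a YAML parser, so it only detects lines written exactly as in the examples above. The function name `check_kubelet_config` is hypothetical. On Kubernetes versions before v1.34, you would add `KubeletPodResourcesGet: true` to the list of settings.

```shell
# Sketch: check a kubelet configuration file for the settings described above.
# Plain-text check only; it does not parse YAML. The function name is hypothetical.
check_kubelet_config() {
    config="${1:-/var/lib/kubelet/config.yaml}"
    missing=0
    for setting in "RuntimeClassInImageCriApi: true" "runtimeRequestTimeout: 20m"; do
        # Match the setting on its own line, allowing leading indentation.
        if grep -Eq "^[[:space:]]*${setting}[[:space:]]*$" "$config" 2>/dev/null; then
            echo "found:   $setting"
        else
            echo "missing: $setting"
            missing=1
        fi
    done
    return "$missing"
}

check_kubelet_config || echo "Update the kubelet configuration, then restart the kubelet"
```

Pass a different path as the first argument if your kubelet configuration is not in the default location.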
Note
If you need a timeout longer than 1200 seconds (20 minutes), also adjust the Kata Agent image_pull_timeout.
This setting controls the Confidential Data Hub image pull API timeout in seconds.
Add the agent.image_pull_timeout kernel parameter to your shim configuration, or pass a value in the pod annotation io.katacontainers.config.hypervisor.kernel_params.
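For the annotation route, the following sketch prints a pod metadata fragment that you could merge into a manifest. It assumes that the kernel parameter takes the form `agent.image_pull_timeout=<seconds>`. The 1800-second value and the function name `image_pull_annotation` are hypothetical, so confirm the exact syntax against the Kata Containers documentation for your release.

```shell
# Sketch: print a pod metadata fragment that raises the Kata Agent image pull timeout.
# Assumption: the kernel parameter is written as agent.image_pull_timeout=<seconds>.
# The function name is a hypothetical helper.
image_pull_annotation() {
    seconds="$1"
    cat <<EOF
metadata:
  annotations:
    io.katacontainers.config.hypervisor.kernel_params: "agent.image_pull_timeout=${seconds}"
EOF
}

image_pull_annotation 1800   # hypothetical 30-minute timeout
```

If you raise this value, set the kubelet runtimeRequestTimeout to at least the same duration so the two timeouts stay consistent.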
Next Steps#
After completing the prerequisites, proceed to Quickstart Install for a minimal install, or Detailed Install Guide for full configuration details.