Run a Sample Workload

As a Kubernetes Cluster Administrator, use this page to verify your installation and run a sample workload. Container User personas can also run the sample workload to confirm the cluster is ready before deploying applications. For persona responsibilities and documentation structure, refer to Personas.

Verify your Confidential Container setup by running a basic single-GPU sample workload inside a Confidential Container.

This page assumes that you have completed Prerequisites and either the Quickstart Install or the Detailed Install Guide. Your cluster should have the kata-qemu-nvidia-gpu-snp and kata-qemu-nvidia-gpu-tdx runtime classes installed, and the GPU Operator operands (including the Confidential Computing Manager, Kata Sandbox Device Plugin, and VFIO Manager) should be running on your nodes.
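
Before creating the pod, it can help to confirm that both runtime classes are present. The following is a minimal sketch rather than part of the documented procedure. The helper name missing_runtime_classes is hypothetical, and the helper only inspects a list of names read from standard input.

```shell
# Hypothetical helper: reads runtime class names (one per line) on stdin and
# prints each expected runtime class that is not in the list.
missing_runtime_classes() {
  installed="$(cat)"
  for rc in kata-qemu-nvidia-gpu-snp kata-qemu-nvidia-gpu-tdx; do
    printf '%s\n' "$installed" | grep -qxF "$rc" || printf '%s\n' "$rc"
  done
}

# Usage against a live cluster (prints nothing when both classes are present):
#   kubectl get runtimeclass -o name | sed 's|.*/||' | missing_runtime_classes
```

To check the operands, list the pods in the namespace where the GPU Operator was installed. That namespace depends on your installation, so it is not hard-coded in this sketch.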

This page intentionally uses the simplest possible manifest so that you can confirm the deployment end-to-end. It is not a production workload template. For runtime class selection, resource type naming, multi-GPU passthrough, and additional manifest patterns, refer to Configuring Workloads.

  1. Create a file named cuda-vectoradd-kata.yaml that contains the sample manifest for your system. The two manifests below differ only in runtimeClassName, so use only one of them.

    AMD-based systems (kata-qemu-nvidia-gpu-snp):

    apiVersion: v1
    kind: Pod
    metadata:
      name: cuda-vectoradd-kata
      namespace: default
    spec:
      runtimeClassName: kata-qemu-nvidia-gpu-snp
      restartPolicy: Never
      containers:
        - name: cuda-vectoradd
          image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04"
          resources:
            limits:
              nvidia.com/pgpu: "1" # for single GPU passthrough
              memory: 16Gi

    Intel-based systems (kata-qemu-nvidia-gpu-tdx):

    apiVersion: v1
    kind: Pod
    metadata:
      name: cuda-vectoradd-kata
      namespace: default
    spec:
      runtimeClassName: kata-qemu-nvidia-gpu-tdx
      restartPolicy: Never
      containers:
        - name: cuda-vectoradd
          image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04"
          resources:
            limits:
              nvidia.com/pgpu: "1" # for single GPU passthrough
              memory: 16Gi
    

    The manifest has two settings that you might need to adjust:

    • Runtime class: Use kata-qemu-nvidia-gpu-snp on AMD-based systems or kata-qemu-nvidia-gpu-tdx on Intel-based systems.

    • GPU resource type: The sample requests nvidia.com/pgpu, which is the default resource name advertised by the NVIDIA Kata Sandbox Device Plugin. If your cluster was installed with the P_GPU_ALIAS="" setting, replace it with the model-specific name advertised on your node, for example nvidia.com/GH100_H200_141GB.

    Refer to Configuring Confidential Container Workloads for additional guidance on each option.
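
    To see which nvidia.com resource names your nodes advertise, you can filter the allocatable resources. This is a minimal sketch. The helper name list_nvidia_resources is hypothetical, and the helper only extracts names from whatever text it reads on standard input.

    ```shell
    # Hypothetical helper: extracts the distinct nvidia.com/* resource names from
    # text read on stdin, for example the allocatable resources of your nodes.
    list_nvidia_resources() {
      grep -oE 'nvidia\.com/[A-Za-z0-9_.-]+' | sort -u
    }

    # Usage against a live cluster:
    #   kubectl get nodes -o jsonpath='{.items[*].status.allocatable}' | list_nvidia_resources
    ```

    The name printed here is the one to request under resources.limits in the manifest.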

  2. Create the pod:

    $ kubectl apply -f cuda-vectoradd-kata.yaml
    

    Example Output:

    pod/cuda-vectoradd-kata created
    
  3. Verify the pod is running:

    $ kubectl get pod cuda-vectoradd-kata
    

    Example Output:

    NAME                  READY   STATUS    RESTARTS   AGE
    cuda-vectoradd-kata   1/1     Running   0          10s
    

    The STATUS column might show Completed instead of Running if the container has already finished successfully.

    If the pod stays Pending for more than a few minutes, refer to Pod Stuck in Pending State with Insufficient nvidia.com/pgpu Error in Troubleshooting before continuing.
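
    To confirm that a Pending pod matches that troubleshooting entry, you can look for the scheduler message in the pod description. This is a minimal sketch. The helper name reports_insufficient_pgpu is hypothetical, and it assumes the default nvidia.com/pgpu resource name. If your cluster advertises a model-specific name, the message contains that name instead.

    ```shell
    # Hypothetical helper: succeeds when the text on stdin (for example the
    # output of kubectl describe pod) reports insufficient nvidia.com/pgpu.
    reports_insufficient_pgpu() {
      grep -qF 'Insufficient nvidia.com/pgpu'
    }

    # Usage against a live cluster:
    #   kubectl describe pod -n default cuda-vectoradd-kata | reports_insufficient_pgpu \
    #     && echo "Scheduler reports insufficient nvidia.com/pgpu"
    ```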

  4. View the logs from the pod after the container starts:

    $ kubectl logs -n default cuda-vectoradd-kata
    

    Example Output:

      [Vector addition of 50000 elements]
      Copy input data from the host memory to the CUDA device
      CUDA kernel launch with 196 blocks of 256 threads
      Copy output data from the CUDA device to the host memory
      Test PASSED
      Done
    
    The output includes Test PASSED if the container completed successfully.

    If you do not see any log output, make sure that the pod is running and that the container has started.
    
  5. Delete the pod:

    $ kubectl delete -f cuda-vectoradd-kata.yaml
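
The create, verify, log, and delete steps above can be combined into one script. The following is a minimal sketch under stated assumptions, not an official tool: the function name run_sample_workload and the 300-second timeout are hypothetical choices, and the jsonpath form of kubectl wait requires kubectl 1.23 or later. If the pod fails instead of completing, the wait runs until the timeout expires.

```shell
# Hypothetical wrapper around steps 2 through 5. Succeeds only if the pod
# completes and its log contains "Test PASSED".
run_sample_workload() {
  manifest="${1:-cuda-vectoradd-kata.yaml}"
  pod="cuda-vectoradd-kata"
  status=1

  kubectl apply -f "$manifest" || return 1

  # Wait for the pod phase to reach Succeeded (restartPolicy is Never).
  # The 300s timeout is an arbitrary choice for this sketch.
  if kubectl wait --for=jsonpath='{.status.phase}'=Succeeded \
       "pod/$pod" -n default --timeout=300s; then
    if kubectl logs -n default "$pod" | grep -qF 'Test PASSED'; then
      status=0
    fi
  fi

  # Clean up whether or not the check passed.
  kubectl delete -f "$manifest"
  return "$status"
}

# Usage against a live cluster:
#   run_sample_workload cuda-vectoradd-kata.yaml && echo "Sample workload verified"
```

The function deletes the pod even when the check fails, so inspect the pod manually with the individual steps if you need to debug a failure.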
    

Next Steps

  • Refer to the Advanced Setup Overview section for more information on managing the Confidential Computing mode and configuring workloads.