Managing the Confidential Computing Mode#
As a Kubernetes Cluster Administrator, use this page to configure the confidential computing mode of NVIDIA GPUs in your cluster.
After installing the NVIDIA GPU Operator, you can use the GPU Operator to configure the confidential computing mode of the NVIDIA GPUs in your cluster. You can set a cluster-wide default Confidential Computing mode, and you can set the mode on individual nodes.
Set the cluster-wide default mode using the ccManager.defaultMode=<on|off> option.
The default value of ccManager.defaultMode is on.
Set a node-level mode by applying the nvidia.com/cc.mode=<on|off|ppcie> label on the node.
If you set a specific mode on a node, it has higher precedence than the cluster-wide default mode.
The supported modes are:
Mode |
Description |
Configuration Method |
|---|---|---|
|
Enable Confidential Computing. |
cluster-wide default, node-level override |
|
Disable Confidential Computing. |
cluster-wide default, node-level override |
|
Enable Confidential Computing on NVIDIA Hopper GPUs.
On the NVIDIA Hopper architecture, multi-GPU passthrough
uses protected PCIe (PPCIE), which claims exclusive use of the NVSwitches for a single
Confidential Container virtual machine.
If you use NVIDIA Hopper GPUs for multi-GPU passthrough, set the mode to |
node-level override |
When you change the mode, the manager performs the following actions:
Evicts the other GPU Operator operands from the node. However, the manager does not drain user workloads. You must make sure that no user workloads are running on the node before you change the mode.
Changes the mode and resets the GPU.
Reschedules the other GPU Operator operands.
Setting a Cluster-Wide Default Mode#
Note
Before changing the mode, make sure that no user workloads are running on the node.
To set a cluster-wide mode, specify the ccManager.defaultMode field like the following example:
$ kubectl patch clusterpolicies.nvidia.com/cluster-policy \
--type=merge \
-p '{"spec": {"ccManager": {"defaultMode": "on"}}}'
Example Output:
clusterpolicy.nvidia.com/cluster-policy patched
Note
The ppcie mode cannot be set as a cluster-wide default, it can only be set as a node label value.
Setting a Node-Level Mode#
To set a node-level mode, apply the nvidia.com/cc.mode=<on|off|ppcie> label on the node.
Set the NODE_NAME environment variable to the name of the node you want to configure:
$ export NODE_NAME="<node-name>"
Then apply the label:
$ kubectl label node $NODE_NAME nvidia.com/cc.mode=on --overwrite
The mode that you set on a node has higher precedence than the cluster-wide default mode.
Verifying a Mode Change#
To verify that a mode change was successful, view the nvidia.com/cc.mode,
nvidia.com/cc.mode.state, and nvidia.com/cc.ready.state node labels:
$ kubectl get node $NODE_NAME -o json | \
jq '.metadata.labels | with_entries(select(.key | startswith("nvidia.com/cc")))'
Example Output (CC mode disabled):
{
"nvidia.com/cc.mode": "off",
"nvidia.com/cc.mode.state": "off",
"nvidia.com/cc.ready.state": "false"
}
Example Output (CC mode enabled):
{
"nvidia.com/cc.mode": "on",
"nvidia.com/cc.mode.state": "on",
"nvidia.com/cc.ready.state": "true"
}
When you disable CC mode after enabling it, wait one to two minutes for
nvidia.com/cc.mode.state and nvidia.com/cc.ready.state to match the desired off state.
A mode change is complete and successful when nvidia.com/cc.mode and
nvidia.com/cc.mode.state have the same value.
If nvidia.com/cc.mode.state does not match nvidia.com/cc.mode, refer to nvidia.com/cc.mode.state Not Matching nvidia.com/cc.mode in the troubleshooting guide.
If nvidia.com/cc.mode.state is failed, refer to nvidia.com/cc.mode.state is failed.
Understanding Confidential Computing Mode Labels#
The following labels are used to manage the Confidential Computing mode on a node.
You only need to update the nvidia.com/cc.mode label, the other labels are managed by the Confidential Computing Manager to represent the current state of the Confidential Computing mode on the node.
Label Name |
Label Values |
Details |
|---|---|---|
|
|
The desired Confidential Computing mode. You update this node label to trigger a mode change. |
|
|
Reflects the mode that was last successfully applied to the GPU hardware by the Confidential Computing Manager.
Its value mirrors the applied mode after the transition is complete on the node.
A value of |
|
|
Indicates whether the node is ready to run Confidential Container workloads.
Set to |
Note
The ppcie mode is only supported on NVIDIA Hopper GPUs.