Multi-Process Service in Kubernetes

About Multi-Process Service

NVIDIA Multi-Process Service (MPS) provides the ability to share a GPU with multiple containers.

The NVIDIA GPU Operator enables configuring MPS on a node by using options for the NVIDIA Kubernetes Device Plugin. Using MPS, you can configure the number of replicas to create for each GPU on a node. Each replica is allocatable by the kubelet to a container.

You can apply a cluster-wide default MPS configuration and you can apply node-specific configurations. For example, a cluster-wide configuration could create two replicas for each GPU on each node. A node-specific configuration could be to create two replicas on some nodes and four replicas on other nodes.

You can combine the two approaches by applying a cluster-wide default configuration and then label nodes so that those nodes receive a node-specific configuration.

Refer to Comparison: Time-Slicing, Multi-Process Service, and Multi-Instance GPU for information about the available GPU sharing technologies.

Supported Platforms and Resource Types

MPS is supported on bare-metal systems, virtual machines with GPU passthrough, and virtual machines with NVIDIA vGPU.

The only supported resource type is nvidia.com/gpu.

Limitations

  • DCGM-Exporter does not support associating metrics to containers when MPS is enabled with the NVIDIA Kubernetes Device Plugin.

  • The Operator does not monitor changes to the config map that configures the device plugin.

  • The maximum number of replicas that you can request is 16 for pre-Volta devices and 48 for newer devices.

  • MPS is not supported on GPU instances from Multi-Instance GPU (MIG) devices.

  • MPS does not support requesting more than one GPU device. Only one device resource request is supported:

    ...
      spec:
        containers:
          - resources:
              limits:
                nvidia.com/gpu: 1
    

Changes to Node Labels

In addition to the standard node labels that GPU Feature Discovery (GFD) applies to nodes, the following label is also applied after you configure MPS for a node:

nvidia.com/<resource-name>.replicas = <replicas-count>

Where <replicas-count> is the factor by which each resource of <resource-name> is equally divided.

Additionally, by default, the nvidia.com/<resource-name>.product label is modified:

nvidia.com/<resource-name>.product = <product-name>-SHARED

For example, on an NVIDIA DGX A100 system, the labels can be similar to the following, depending on the MPS configuration:

nvidia.com/gpu.replicas = 8
nvidia.com/gpu.product = A100-SXM4-40GB-SHARED

Using these labels, you can request access to a GPU replica or exclusive access to a GPU in the same way that you traditionally specify a node selector to request one GPU model over another. The -SHARED product name suffix ensures that you can specify a node selector to assign pods to nodes with GPU replicas.
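
For example, the following minimal pod sketch uses a node selector with the -SHARED suffix to request a GPU replica on an MPS-configured node. The pod and container names are hypothetical, and the product label value is an assumption that depends on the GPU model in your nodes:

apiVersion: v1
kind: Pod
metadata:
  name: shared-gpu-pod              # hypothetical name for illustration
spec:
  nodeSelector:
    # Assumes Tesla T4 nodes configured for MPS; use the product label
    # that kubectl describe node reports in your cluster.
    nvidia.com/gpu.product: Tesla-T4-SHARED
  containers:
    - name: cuda-app                # hypothetical container name
      image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
      resources:
        limits:
          nvidia.com/gpu: 1         # MPS supports a request of exactly one device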

The migStrategy configuration option affects the node label for the product name. When renameByDefault=false (the default) and migStrategy=single, both the MIG profile name and the -SHARED suffix are appended to the product name, as in the following example:

nvidia.com/gpu.product = A100-SXM4-40GB-MIG-1g.5gb-SHARED

If you set renameByDefault=true, then the value of the nvidia.com/gpu.product node label is not modified.
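
One way to review these labels on a node is to filter the kubectl output, as in the following example. The node name is a placeholder:

$ kubectl get node <node-name> --show-labels | tr ',' '\n' | grep nvidia.com/gpu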

Configuration

About Configuring Multi-Process Service

You configure Multi-Process Service (MPS) by performing the following high-level steps:

  • Add a config map to the namespace that is used by the GPU Operator.

  • Configure the cluster policy so that the device plugin uses the config map.

  • Apply a label to the nodes that you want to configure for MPS.

On a machine with one GPU, the following config map configures Kubernetes so that the node advertises four nvidia.com/gpu resources.

Sample Config Map

apiVersion: v1
kind: ConfigMap
metadata:
  name: mps-config-all
data:
  mps-any: |-
    version: v1
    sharing:
      mps:
        resources:
        - name: nvidia.com/gpu
          replicas: 4

The following list describes the key fields in the config map.

  • data.<key> (string)

    Specifies the MPS configuration name.

    You can specify multiple configurations if you want to assign node-specific configurations. In the preceding example, the value for the key is mps-any. The node-specific example later on this page uses the keys mps-two and mps-four.

  • flags.migStrategy (string)

    Specifies how to label MIG devices for the nodes that receive the MPS configuration. Specify one of none, single, or mixed.

    The default value is none.

  • renameByDefault (boolean)

    When set to true, each resource is advertised under the name <resource-name>.shared instead of <resource-name>.

    For example, if this field is set to true and the resource is typically nvidia.com/gpu, the nodes that are configured for MPS advertise the resource as nvidia.com/gpu.shared. Setting this field to true can be helpful if you want to schedule pods on GPUs with shared access by specifying <resource-name>.shared in the resource request, as shown in the example pod specification after this list.

    When this field is set to false, the advertised resource name, such as nvidia.com/gpu, is not modified. However, the label for the product name is suffixed with -SHARED. For example, if the output of kubectl describe node shows the node label nvidia.com/gpu.product=Tesla-T4, then after the node is configured for MPS, the label becomes nvidia.com/gpu.product=Tesla-T4-SHARED. In this case, you can specify a node selector that includes the -SHARED suffix to schedule pods on GPUs with shared access.

    The default value is false.

  • failRequestsGreaterThanOne (boolean)

    This field applies to time-slicing and is ignored for MPS.

    For MPS, resource requests for GPUs must be set to 1. Refer to the manifest examples or Limitations.

  • resources.name (string)

    Specifies the resource type to share with MPS. The only supported value is nvidia.com/gpu.

  • resources.replicas (integer)

    Specifies the number of MPS GPU replicas to make available for shared access to each GPU of the specified resource type.
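
The following minimal sketch shows a pod that requests a shared replica when renameByDefault is set to true. The pod and container names are hypothetical, and the image is reused from the verification example later on this page:

apiVersion: v1
kind: Pod
metadata:
  name: shared-gpu-request          # hypothetical name for illustration
spec:
  containers:
    - name: cuda-app                # hypothetical container name
      image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
      resources:
        limits:
          # The renamed resource that MPS-configured nodes advertise
          # when renameByDefault is true.
          nvidia.com/gpu.shared: 1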

Applying One Cluster-Wide Configuration

Perform the following steps to configure GPU sharing with MPS if you already installed the GPU operator and want to apply the same MPS configuration on all nodes in the cluster.

  1. Create a file, such as mps-config-all.yaml, with contents like the following example:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: mps-config-all
    data:
      mps-any: |-
        version: v1
        sharing:
          mps:
            resources:
            - name: nvidia.com/gpu
              replicas: 4
    
  2. Add the config map to the same namespace as the GPU operator:

    $ kubectl create -n gpu-operator -f mps-config-all.yaml
    
  3. Configure the device plugin with the config map and set the default GPU sharing configuration:

    $ kubectl patch clusterpolicies.nvidia.com/cluster-policy \
        -n gpu-operator --type merge \
        -p '{"spec": {"devicePlugin": {"config": {"name": "mps-config-all", "default": "mps-any"}}}}'
    
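    You can optionally confirm that the cluster policy now references the config map and the default configuration. This quick check assumes the names used in the preceding steps:

    $ kubectl get clusterpolicies.nvidia.com/cluster-policy \
        -o jsonpath='{.spec.devicePlugin.config}{"\n"}'
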
  4. Optional: Confirm that the gpu-feature-discovery and nvidia-device-plugin-daemonset pods restart:

    $ kubectl get events -n gpu-operator --sort-by='.lastTimestamp'
    

    Example Output

    LAST SEEN   TYPE     REASON             OBJECT                                              MESSAGE                                                                               
    38s         Normal   SuccessfulDelete   daemonset/nvidia-device-plugin-daemonset            Deleted pod: nvidia-device-plugin-daemonset-l86fw                                     
    38s         Normal   SuccessfulDelete   daemonset/gpu-feature-discovery                     Deleted pod: gpu-feature-discovery-shj2m
    38s         Normal   Killing            pod/gpu-feature-discovery-shj2m                     Stopping container gpu-feature-discovery                                              
    38s         Normal   Killing            pod/nvidia-device-plugin-daemonset-l86fw            Stopping container nvidia-device-plugin
    37s         Normal   Scheduled          pod/nvidia-device-plugin-daemonset-lcklx            Successfully assigned gpu-operator/nvidia-device-plugin-daemonset-lcklx to worker-1
    37s         Normal   SuccessfulCreate   daemonset/gpu-feature-discovery                     Created pod: gpu-feature-discovery-pgx9l
    37s         Normal   Scheduled          pod/gpu-feature-discovery-pgx9l                     Successfully assigned gpu-operator/gpu-feature-discovery-pgx9l to worker-0            
    37s         Normal   SuccessfulCreate   daemonset/nvidia-device-plugin-daemonset            Created pod: nvidia-device-plugin-daemonset-lcklx                                     
    36s         Normal   Created            pod/nvidia-device-plugin-daemonset-lcklx            Created container config-manager-init                                                 
    36s         Normal   Pulled             pod/nvidia-device-plugin-daemonset-lcklx            Container image "nvcr.io/nvidia/cloud-native/gpu-operator-validator:v24.3.0" already present on machine 
    
  5. Optional: After a few minutes, confirm that the Operator starts an MPS control daemon pod for each node in the cluster that has a GPU.

    $ kubectl get pods -n gpu-operator -l app=nvidia-device-plugin-mps-control-daemon
    

    Example Output

    NAME                                            READY   STATUS    RESTARTS   AGE
    nvidia-device-plugin-mps-control-daemon-9pq7z   2/2     Running   0          4m20s
    nvidia-device-plugin-mps-control-daemon-kbwgp   2/2     Running   0          4m20s
    

Refer to Verifying the MPS Configuration.

Applying Multiple Node-Specific Configurations

An alternative to applying one cluster-wide configuration is to specify multiple MPS configurations in the config map and to apply labels node-by-node to control which configuration is applied to which nodes.

  1. Create a file, such as mps-config-fine.yaml, with contents like the following example:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: mps-config-fine
    data:
      mps-four: |-
        version: v1
        sharing:
          mps:
            renameByDefault: false
            resources:
            - name: nvidia.com/gpu
              replicas: 4
      mps-two: |-
        version: v1
        sharing:
          mps:
            renameByDefault: false
            resources:
            - name: nvidia.com/gpu
              replicas: 2
    
  2. Add the config map to the same namespace as the GPU operator:

    $ kubectl create -n gpu-operator -f mps-config-fine.yaml
    
  3. Configure the device plugin with the config map:

    $ kubectl patch clusterpolicies.nvidia.com/cluster-policy \
        -n gpu-operator --type merge \
        -p '{"spec": {"devicePlugin": {"config": {"name": "mps-config-fine"}}}}'
    

    Because the specification does not include the devicePlugin.config.default field, when the device plugin pods redeploy, they do not automatically apply the MPS configuration to all nodes.

  4. Optional: Confirm that the gpu-feature-discovery and nvidia-device-plugin-daemonset pods restart.

    $ kubectl get events -n gpu-operator --sort-by='.lastTimestamp'
    

    Example Output

    LAST SEEN   TYPE     REASON             OBJECT                                              MESSAGE                                                                               
    38s         Normal   SuccessfulDelete   daemonset/nvidia-device-plugin-daemonset            Deleted pod: nvidia-device-plugin-daemonset-l86fw                                     
    38s         Normal   SuccessfulDelete   daemonset/gpu-feature-discovery                     Deleted pod: gpu-feature-discovery-shj2m
    38s         Normal   Killing            pod/gpu-feature-discovery-shj2m                     Stopping container gpu-feature-discovery                                              
    38s         Normal   Killing            pod/nvidia-device-plugin-daemonset-l86fw            Stopping container nvidia-device-plugin
    37s         Normal   Scheduled          pod/nvidia-device-plugin-daemonset-lcklx            Successfully assigned gpu-operator/nvidia-device-plugin-daemonset-lcklx to worker-1
    37s         Normal   SuccessfulCreate   daemonset/gpu-feature-discovery                     Created pod: gpu-feature-discovery-pgx9l
    37s         Normal   Scheduled          pod/gpu-feature-discovery-pgx9l                     Successfully assigned gpu-operator/gpu-feature-discovery-pgx9l to worker-0            
    37s         Normal   SuccessfulCreate   daemonset/nvidia-device-plugin-daemonset            Created pod: nvidia-device-plugin-daemonset-lcklx                                     
    36s         Normal   Created            pod/nvidia-device-plugin-daemonset-lcklx            Created container config-manager-init                                                 
    36s         Normal   Pulled             pod/nvidia-device-plugin-daemonset-lcklx            Container image "nvcr.io/nvidia/cloud-native/gpu-operator-validator:v24.3.0" already present on machine 
    
  5. Optional: After a few minutes, confirm that the Operator starts an MPS control daemon pod for each node in the cluster that has a GPU.

    $ kubectl get pods -n gpu-operator -l app=nvidia-device-plugin-mps-control-daemon
    

    Example Output

    NAME                                            READY   STATUS    RESTARTS   AGE
    nvidia-device-plugin-mps-control-daemon-9pq7z   2/2     Running   0          4m20s
    nvidia-device-plugin-mps-control-daemon-kbwgp   2/2     Running   0          4m20s
    
  6. Apply a label to the nodes by running one or more of the following commands:

    • Apply a label to nodes one-by-one by specifying the node name:

      $ kubectl label node <node-name> nvidia.com/device-plugin.config=mps-two
      
    • Apply a label to several nodes at one time by specifying a label selector:

      $ kubectl label node \
          --selector=nvidia.com/gpu.product=Tesla-T4 \
          nvidia.com/device-plugin.config=mps-two
      
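    After labeling nodes, you can confirm which nodes received a particular configuration by selecting on the label value from the preceding example:

    $ kubectl get nodes -l nvidia.com/device-plugin.config=mps-two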

Refer to Verifying the MPS Configuration.

Configuring Multi-Process Service Before Installing the NVIDIA GPU Operator

You can enable MPS with the NVIDIA GPU Operator by passing the devicePlugin.config.name=<config-map-name> parameter during installation.

Perform the following steps to configure MPS before installing the Operator:

  1. Create the namespace for the Operator:

    $ kubectl create namespace gpu-operator
    
  2. Create a file, such as mps-config.yaml, with the config map contents.

    Refer to the Applying One Cluster-Wide Configuration or Applying Multiple Node-Specific Configurations sections.

  3. Add the config map to the same namespace as the Operator:

    $ kubectl create -f mps-config.yaml -n gpu-operator
    
  4. Install the operator with Helm:

    $ helm install gpu-operator nvidia/gpu-operator \
        -n gpu-operator \
        --set devicePlugin.config.name=mps-config
    
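    If you are applying a single cluster-wide configuration, you can optionally set the default configuration key at install time instead of patching the cluster policy afterward. The following sketch assumes a config map named mps-config that contains an mps-any key, as in the cluster-wide example:

    $ helm install gpu-operator nvidia/gpu-operator \
        -n gpu-operator \
        --set devicePlugin.config.name=mps-config \
        --set devicePlugin.config.default=mps-any
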
  5. Refer to either Applying One Cluster-Wide Configuration or Applying Multiple Node-Specific Configurations and perform the following tasks:

    • Configure the device plugin by running the kubectl patch command.

    • Apply labels to nodes if you added a config map with node-specific configurations.

After installation, refer to Verifying the MPS Configuration.

Updating an MPS Config Map

The Operator does not monitor the config map with the MPS configuration. As a result, if you modify a config map, the device plugin pods do not restart and do not apply the modified configuration.
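
For example, to change the replica count in the cluster-wide example, you can edit the config map before restarting the pods. The config map name assumes the earlier mps-config-all example:

$ kubectl edit configmap mps-config-all -n gpu-operator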

  1. To apply the modified config map, manually restart the device plugin pods:

    $ kubectl rollout restart -n gpu-operator daemonset/nvidia-device-plugin-daemonset
    
  2. Manually restart the MPS control daemon pods:

    $ kubectl rollout restart -n gpu-operator daemonset/nvidia-device-plugin-mps-control-daemon
    

Currently running workloads are not affected and continue to run, though NVIDIA recommends performing the restart during a maintenance period.

Verifying the MPS Configuration

Perform the following steps to verify that the MPS configuration is applied successfully:

  1. Confirm that the node advertises additional GPU resources:

    $ kubectl describe node <node-name>
    

    Example Output

    The example output varies according to the GPU in your node and the configuration that you apply.

    The following output applies when renameByDefault is set to false, the default value. The key considerations are as follows:

    • The nvidia.com/gpu.count label reports the number of physical GPUs in the machine.

    • The nvidia.com/gpu.product label has the -SHARED suffix appended to the product name.

    • The nvidia.com/gpu.replicas label reports the number of replicas for each GPU. The reported nvidia.com/gpu capacity equals the number of physical GPUs multiplied by the replica count (4 x 4 = 16 in this example).

    • The nvidia.com/gpu.sharing-strategy label is set to mps.

    ...
    Labels:
                      nvidia.com/gpu.count=4
                      nvidia.com/gpu.product=Tesla-T4-SHARED
                      nvidia.com/gpu.replicas=4
                      nvidia.com/gpu.sharing-strategy=mps
    Capacity:
      nvidia.com/gpu: 16
      ...
    Allocatable:
      nvidia.com/gpu: 16
      ...
    

    The following output applies when renameByDefault is set to true. The key considerations are as follows:

    • The nvidia.com/gpu.count label reports the number of physical GPUs in the machine.

    • The nvidia.com/gpu capacity reports 0.

    • The nvidia.com/gpu.shared capacity equals the number of physical GPUs multiplied by the specified number of GPU replicas to create.

    • The nvidia.com/gpu.sharing-strategy label is set to mps.

    ...
    Labels:
                      nvidia.com/gpu.count=4
                      nvidia.com/gpu.product=Tesla-T4
                      nvidia.com/gpu.replicas=4
                      nvidia.com/gpu.sharing-strategy=mps
    Capacity:
      nvidia.com/gpu:        0
      nvidia.com/gpu.shared: 16
      ...
    Allocatable:
      nvidia.com/gpu:        0
      nvidia.com/gpu.shared: 16
      ...
    
  2. Optional: Deploy a workload to validate GPU sharing:

    • Create a file, such as mps-verification.yaml, with contents like the following:

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: mps-verification
        labels:
          app: mps-verification
      spec:
        replicas: 5
        selector:
          matchLabels:
            app: mps-verification
        template:
          metadata:
            labels:
              app: mps-verification
          spec:
            tolerations:
              - key: nvidia.com/gpu
                operator: Exists
                effect: NoSchedule
            hostPID: true
            containers:
              - name: cuda-sample-vector-add
                image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
                command: ["/bin/bash", "-c", "--"]
                args:
                  - while true; do /cuda-samples/vectorAdd; done
                resources:
                  limits:
                    nvidia.com/gpu: 1
            nodeSelector:
              nvidia.com/gpu.sharing-strategy: mps
      
    • Create the deployment with multiple replicas:

      $ kubectl apply -f mps-verification.yaml
      
    • Verify that all five replicas are running:

      $ kubectl get pods
      

      Example Output

      NAME                                READY   STATUS    RESTARTS   AGE
      mps-verification-86c99b5666-hczcn   1/1     Running   0          3s
      mps-verification-86c99b5666-sj8z5   1/1     Running   0          3s
      mps-verification-86c99b5666-tnjwx   1/1     Running   0          3s
      mps-verification-86c99b5666-82hxj   1/1     Running   0          3s
      mps-verification-86c99b5666-9lhh6   1/1     Running   0          3s
      
    • View the logs from one of the pods:

      $ kubectl logs deploy/mps-verification
      

      Example Output

      Found 5 pods, using pod/mps-verification-86c99b5666-tnjwx
      [Vector addition of 50000 elements]
      Copy input data from the host memory to the CUDA device
      CUDA kernel launch with 196 blocks of 256 threads
      Copy output data from the CUDA device to the host memory
      Test PASSED
      Done
      [Vector addition of 50000 elements]
      Copy input data from the host memory to the CUDA device
      CUDA kernel launch with 196 blocks of 256 threads
      Copy output data from the CUDA device to the host memory
      Test PASSED
      ...
      
    • View the default active thread percentage from one of the pods:

      $ kubectl exec deploy/mps-verification -- bash -c "echo get_default_active_thread_percentage | nvidia-cuda-mps-control"
      

      Example Output

      25.0
      
    • View the default pinned memory limit from one of the pods:

      $ kubectl exec deploy/mps-verification -- bash -c "echo get_default_device_pinned_mem_limit | nvidia-cuda-mps-control"
      

      Example Output

      3G
      
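    • Optional: View the MPS control daemon logs while the workload runs. The label selector is the same one used earlier on this page; --all-containers prints logs from every container in the daemon pods so that you do not need to know the individual container names:

      $ kubectl logs -n gpu-operator \
          -l app=nvidia-device-plugin-mps-control-daemon \
          --all-containers --tail=20
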
    • Stop the deployment:

      $ kubectl delete -f mps-verification.yaml
      

      Example Output

      deployment.apps "mps-verification" deleted
      

References