Quickstart¶
Note that the features described below are currently in beta.
Description and Requirements¶
NVIDIA driver container images are available through the NVIDIA public hub repository.
They allow the NVIDIA driver to be provisioned through containers, which provides several benefits over a standard driver installation, for example:
Ease of deployment
Fast installation
Reproducibility
For more information about its internals, check out this presentation.
The list of prerequisites for running a driver container is described below.
1. Ubuntu 16.04, Ubuntu 18.04, or CentOS 7 with the IPMI driver enabled and the Nouveau driver disabled
2. An NVIDIA GPU with an architecture newer than Fermi (compute capability > 2.1)
3. A supported version of Docker
4. The NVIDIA Container Runtime for Docker configured with the root option
5. If you are running Ubuntu 18.04 with an AWS kernel, the i2c_core kernel module must also be enabled
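Before pulling a driver image, it can help to confirm the host meets the list above. A minimal pre-flight sketch (illustrative only; these checks and messages are not part of any NVIDIA tooling):

```shell
#!/bin/sh
# Pre-flight sketch: verify Nouveau is not loaded and Docker is present.
nouveau_status="not-loaded"
if lsmod 2>/dev/null | grep -q '^nouveau'; then
  nouveau_status="loaded"
fi
echo "nouveau: $nouveau_status"

docker_status="missing"
if command -v docker >/dev/null 2>&1; then
  docker_status="present"
fi
echo "docker: $docker_status"
```

A `loaded` Nouveau module means the blacklist steps in the Quickstart sections below have not taken effect yet (a reboot or initramfs update is usually still pending).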
Configuration¶
You will need to update the NVIDIA Container Toolkit config file so that the root
directive points to the driver container:
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
[nvidia-container-cli]
root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
#no-cgroups = false
#user = "root:video"
ldconfig = "@/sbin/ldconfig.real"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
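The distribution sections below enable this directive with a single `sed` against `/etc/nvidia-container-runtime/config.toml`. A sketch of that edit against a throwaway copy of the file, so it can be tried without touching the real config:

```shell
# Create a scratch copy of the relevant config fragment.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
[nvidia-container-cli]
#root = "/run/nvidia/driver"
load-kmods = true
EOF

# Same edit the Quickstart applies to the real file:
# uncomment the root directive.
sed -i 's/^#root/root/' "$cfg"
grep '^root' "$cfg"
```

On a real node, run the `sed` against `/etc/nvidia-container-runtime/config.toml` instead of the temporary file.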
Examples¶
# Run the driver container for Ubuntu 16.04 LTS in interactive mode
docker run -it --name nvidia-driver --privileged --pid=host -v /run/nvidia:/run/nvidia:shared \
nvidia/driver:396.37-ubuntu16.04
# Run the driver container for Ubuntu 16.04 AWS in detached mode
docker run -d --name nvidia-driver --privileged --pid=host -v /run/nvidia:/run/nvidia:shared \
nvidia/driver:396.37-ubuntu16.04-aws --accept-license
# Run the driver container for Ubuntu 16.04 HWE in detached mode with
# auto-restarts and auto-detection of kernel updates (aka DKMS)
docker run -d --name nvidia-driver --privileged --pid=host -v /run/nvidia:/run/nvidia:shared \
--restart=unless-stopped -v /etc/kernel/postinst.d:/run/kernel/postinst.d \
nvidia/driver:396.37-ubuntu16.04-hwe --accept-license
# Run the driver container for CentOS 7 in detached mode and check its logs
docker run -d --name nvidia-driver --privileged --pid=host -v /run/nvidia:/run/nvidia:shared \
nvidia/driver:396.37-centos7 --accept-license
docker logs -f nvidia-driver
# Build a custom driver container image for CentOS 7 with the current kernel
docker build -t nvidia-driver:centos7 --build-arg KERNEL_VERSION=$(uname -r) \
https://gitlab.com/nvidia/driver.git#centos7
# Perform a driver update ahead of time for a given kernel version
docker exec nvidia-driver nvidia-driver update --kernel 4.15.0-23
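In the custom-build example above, the `KERNEL_VERSION` build argument is simply the running kernel's release string; it expands as follows (shown with `echo` for illustration):

```shell
# KERNEL_VERSION pins the driver build to the currently running kernel.
KERNEL_VERSION=$(uname -r)
echo "KERNEL_VERSION=$KERNEL_VERSION"
```

To build for a kernel other than the running one, substitute the exact release string (for example, a value from `ls /lib/modules`).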
Quickstart¶
Ubuntu Distributions¶
curl https://get.docker.com | sudo CHANNEL=stable sh
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list \
| sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-docker2
sudo sed -i 's/^#root/root/' /etc/nvidia-container-runtime/config.toml
sudo tee /etc/modules-load.d/ipmi.conf <<< "ipmi_msghandler"
sudo tee /etc/modprobe.d/blacklist-nouveau.conf <<< "blacklist nouveau"
sudo tee -a /etc/modprobe.d/blacklist-nouveau.conf <<< "options nouveau modeset=0"
# If you are running with an AWS kernel
sudo tee -a /etc/modules-load.d/ipmi.conf <<< "i2c_core"
sudo update-initramfs -u
# Optionally, if the kernel is not up to date
# sudo apt-get dist-upgrade
sudo reboot
sudo docker run -d --privileged --pid=host -v /run/nvidia:/run/nvidia:shared \
--restart=unless-stopped nvidia/driver:418.40.04-ubuntu18.04 --accept-license
sudo docker run --rm --runtime=nvidia nvidia/cuda:9.2-base nvidia-smi
CentOS Distributions¶
curl https://get.docker.com | sudo CHANNEL=stable sh
sudo systemctl enable docker
curl -s -L https://nvidia.github.io/nvidia-docker/centos7/nvidia-docker.repo \
| sudo tee /etc/yum.repos.d/nvidia-docker.repo
sudo yum install -y nvidia-docker2
sudo sed -i 's/^#root/root/' /etc/nvidia-container-runtime/config.toml
sudo tee /etc/modules-load.d/ipmi.conf <<< "ipmi_msghandler"
sudo tee /etc/modprobe.d/blacklist-nouveau.conf <<< "blacklist nouveau"
sudo tee -a /etc/modprobe.d/blacklist-nouveau.conf <<< "options nouveau modeset=0"
# Optionally, if the kernel is not up to date
# sudo yum update
sudo reboot
sudo docker run -d --privileged --pid=host -v /run/nvidia:/run/nvidia:shared \
--restart=unless-stopped nvidia/driver:396.37-centos7 --accept-license
sudo docker run --rm --runtime=nvidia nvidia/cuda:9.2-base nvidia-smi
Kubernetes with dockerd¶
Install nvidia-docker2
and modify /etc/nvidia-container-runtime/config.toml
as mentioned above.
You also need to set the default Docker runtime to nvidia (see https://github.com/nvidia/nvidia-container-runtime#docker-engine-setup).
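Per the nvidia-container-runtime setup instructions, making `nvidia` the default runtime is done in `/etc/docker/daemon.json`. The sketch below writes that JSON to a temporary file for illustration; on a real node it goes to `/etc/docker/daemon.json`, followed by a Docker restart:

```shell
# Write the daemon.json that makes nvidia the default runtime
# (to a temp file here; use /etc/docker/daemon.json on a real node).
daemon_json=$(mktemp)
cat > "$daemon_json" <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
cat "$daemon_json"
# Then: sudo systemctl restart docker
```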
# If running on bare-metal
kubectl create -f https://gitlab.com/nvidia/samples/raw/master/driver/ubuntu16.04/kubernetes/nvidia-driver.yml
# If running on AWS
kubectl create -f https://gitlab.com/nvidia/samples/raw/master/driver/ubuntu16.04/kubernetes/nvidia-driver-aws.yml
You can now deploy the NVIDIA device plugin.
Deleting the daemonset will unload the NVIDIA driver from the machine:
kubectl delete daemonset.apps/nvidia-driver-daemonset
Tags available¶
Check the DockerHub repository for the list of available tags.