Target audience: Kubernetes Operators, ML Engineers, GPU Nerds

Exploring the power of GPUs in the cloud sparked my interest in integrating a local NVIDIA GPU with my Kubernetes homelab cluster.

Adding a GPU to Kubernetes allows me to run Jupyter Notebooks and AI/ML workloads. The biggest benefit to this approach is portability; the same notebooks and models running locally are easily reproduced in the cloud.

This topic was confusing to me, and I had to piece together information from various vendor docs, GitHub issues, and Stack Overflow posts.

I aim to demystify the process, offering a clear path to harnessing GPU acceleration for AI/ML workloads right from your own setup.

Scope

If you’re following along:

  • You have a node running Ubuntu 22.04 LTS
  • You have an NVIDIA GPU connected to the node
  • You have Kubernetes installed and running

Unless otherwise stated, all commands should be run on the above node.

Overview of components

Let’s break each step of the GPU connection path into larger components (pod/workload → Kubernetes → container runtime → software → hardware → GPU).

I’ll cover each component from top to bottom, and then use the reverse order of “NEEDS” to set up and validate my GPU-accelerated Kubernetes homelab.

The diagram below visualizes the GPU connection path within a Kubernetes setup:

Starting with the pod/workload, the container should include the software (like CUDA) needed to utilize the GPU hardware. You can assume the container automatically gets the GPU and its drivers, but you still need to supply the SDK/API “on top.” The NVIDIA container runtime hook provides the container with the GPU device configuration.

How does Kubernetes know which pods need GPU(s)?

For my Kubernetes setup, I declare GPUs in the pod/workload via a combination of spec.runtimeClassName (runtime class docs), spec.containers.resources (resource quota docs), and spec.nodeSelector (nodeSelector docs). For example:

spec:
  runtimeClassName: nvidia   #<--- USE THE NVIDIA CONTAINER RUNTIME
  containers:
  - name: gpu-workload       #<--- YOUR CONTAINER
    resources:
      limits:
        nvidia.com/gpu: 1    #<--- REQUEST 1 GPU (EVEN IF THE NODE HAS SEVERAL)
  nodeSelector:              #<--- RUN ON GPU NODE ${NODE_NAME}
    kubernetes.io/hostname: ${NODE_NAME}

It’s also common to see NoSchedule taints on GPU nodes. This prevents workloads that don’t explicitly need GPUs from running on them (taints and tolerations docs). To tolerate the NoSchedule taint:

spec:
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule

The YAML samples above instruct Kubernetes where and how to run the workload. However, GPUs are considered “extended resources” or “non-Kubernetes-built-in resources” (docs), so there must be something that tells Kubernetes which nodes have GPUs, and how many, available.

How does Kubernetes know which nodes have GPU(s)?

Many NVIDIA GPU features are auto-managed by the NVIDIA GPU Operator, including a device-plugin-daemonset that informs Kubernetes about device capacity. (NVIDIA k8s-device-plugin docs)

The NVIDIA GPU Operator encompasses:

  • The (optional) ability to install NVIDIA drivers on the host
  • The Kubernetes device plugin for GPUs
  • The (optional) ability to configure NVIDIA Container Runtime on the host
  • Automatic node labeling
  • DCGM (Data Center GPU Manager) based monitoring and more

The important part is that the operator automatically labels nodes for selectors and assesses capacity for quotas.
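
Once the GPU Operator is installed (covered later in this walkthrough), you can see both effects directly on a node. A quick sketch, assuming your GPU node’s name is in ${NODE_NAME} (as used later in this post); the exact label names vary by operator version:

# node labels added by the operator's feature discovery (e.g. nvidia.com/gpu.present)
kubectl get node ${NODE_NAME} --show-labels | tr ',' '\n' | grep nvidia

# GPU capacity/allocatable advertised by the device plugin
kubectl describe node ${NODE_NAME} | grep nvidia.com/gpu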

The NVIDIA device plugin is a Daemonset that allows you to automatically:

  • Expose the number of GPUs on each node of your cluster
  • Keep track of the health of your GPUs
  • Run GPU-enabled containers in your Kubernetes cluster

Up to this point, our Kubernetes cluster has scheduled the workload to a GPU-ready node and instructed the container runtime to use the GPU-accelerated nvidia runtimeClass.

How does the nvidia runtimeClass expose GPU(s)?

A package named NVIDIA Container Toolkit (docs) provides most of the configuration and binaries.

On the GPU node, the container runtime (containerd) is configured with a wrapper around runc called nvidia-container-runtime (docs).

The wrapper (nvidia-container-runtime) injects a pre-start hook into the container spec, which adds the host GPUs to the container via mounts, environment variables, etc.

Think of this as injecting the GPU hardware config into a container; you still need to bring the software (like CUDA).

Below is an example configuration for containerd to use the NVIDIA runtimeClass:

# /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
    privileged_without_host_devices = false
    runtime_engine = ""
    runtime_root = ""
    runtime_type = "io.containerd.runc.v2"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
      BinaryName = "/usr/bin/nvidia-container-runtime"

The above config is used any time a container uses the nvidia runtimeClass.
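
For completeness, the Kubernetes half of that mapping is a RuntimeClass object whose handler matches the containerd runtime name. The GPU Operator (covered below) normally creates this for you; a minimal sketch of what it looks like:

# usually created automatically by the GPU Operator -- shown only to illustrate
# how spec.runtimeClassName maps to the containerd runtime configured above
cat <<EOF | kubectl apply -f -
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia   # must match the containerd runtime name
EOF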

Configuring /etc/containerd/config.toml is automated via nvidia-ctk (a poorly named subset utility of the nvidia-container-toolkit), covered later.

The nvidia-container-toolkit and utilities are responsible for configuring the container runtime, but that process assumes we already have a configured GPU on the host.

How does Ubuntu know it has a GPU?

The short answer is drivers. A driver is the essential software needed for your operating system to communicate with the NVIDIA graphics card.

The NVIDIA drivers are installed on Ubuntu using package managers.

There are two parts to the NVIDIA driver: the kernel modules that let the operating system talk to the GPU hardware, and the user-space libraries that let your software talk to the GPU.
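
Once the driver is installed (next sections), you can spot both halves on the host. A quick sketch; libnvidia-ml is just one example of the user-space libraries that ship with the driver:

# kernel half: the nvidia kernel module is loaded
lsmod | grep '^nvidia '

# user-space half: driver libraries (e.g. NVML) are registered with the dynamic linker
ldconfig -p | grep libnvidia-ml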

I included “CUDA stuff” in the picture because it could be installed on the host, but it depends on the exact use case. It is not required for this walkthrough and is discussed later.

How does the motherboard know a GPU is connected?

This is a bit of a trick question. Most, if not all, consumer GPUs are connected via PCIe.

When I thought more about it, PCIe supports GPUs, NVMe, NICs, and many other peripherals. It’s just a way to transmit data.

The motherboard doesn’t need to know it’s a GPU, but it does need to know something is plugged into it via PCIe.

Note

If using a Thunderbolt external GPU (eGPU), it’s still considered PCI. “Thunderbolt combines PCIe and DisplayPort into two serial signals, and additionally provides DC power via a single cable.” (source)
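
If you’re using an eGPU and want to confirm the enclosure is visible before worrying about drivers, something like the following can help; note that boltctl comes from the bolt package and may not be installed on every system:

# show the PCI topology as a tree; an eGPU appears behind a Thunderbolt/PCIe bridge
lspci -tv

# list Thunderbolt devices and their authorization state (requires the bolt package)
boltctl list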

Now that we’re at the bottom of the components, we can follow the reverse order to install & validate a GPU on a local Kubernetes cluster.

Configuring a NVIDIA RTX GPU on Kubernetes

Starting where we left off, let’s check the physical hardware connection.

Validate hardware connection

Use lspci, a utility for displaying information about PCI buses in the system and devices connected to them, to look for a known NVIDIA device.

# list all PCI devices with the text NVIDIA
sudo lspci | grep NVIDIA

All good! ✅ Output:

2f:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1)

NVIDIA GPU driver considerations

Not only are there many competing ways to install the same GPU driver, but how do you know which version to use?

Finding the correct driver version

Use the search menu on NVIDIA’s driver download site to find the latest recommended version to install.

For example, searching for an RTX 3060 returns:

  • Version: 535.154.05
  • Release Date: 2024.1.16
  • Operating System: Linux 64-bit
  • Language: English (US)
  • File Size: 325.86 MB

That means I’m looking for an NVIDIA driver version of 535 or newer.
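
If you’d rather stay in the terminal, the ubuntu-drivers tool (from the ubuntu-drivers-common package) can also suggest a driver directly on the node:

# list detected GPU hardware and the recommended driver package
sudo ubuntu-drivers devices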

(a side note about CUDA versions)

CUDA is additional software that helps applications run on NVIDIA GPUs. Consider it like an API for your host’s GPU.

While CUDA packages aren’t required for this setup, there’s a semi-delicate relationship between the CUDA version used in a container and the driver version on the host. If there’s a mismatch between the two, things may not work as expected!

Tip

After a driver is installed, nvidia-smi can be run to check the supported CUDA version; for example, nvidia-driver-535 reports CUDA Version: 12.2 even though I haven’t installed CUDA.

Most of my problems went away once the CUDA version in the container was aligned with the matching host driver. (CUDA download)

Also, fair warning: CUDA adds a significant amount of size to your container images.

If you are determined to reduce the size of your image, you can selectively rm -rf the parts of the CUDA Toolkit that you don’t need, but be careful not to delete libraries and tools that applications in the container may depend on!
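
A gentler first step is picking the smallest nvidia/cuda image flavor that still satisfies your application. A rough guide (the exact contents are my summary, so double-check the tag descriptions):

# nvidia/cuda image flavors, roughly smallest to largest:
#   nvidia/cuda:<version>-base-ubuntuXX.04      minimal CUDA runtime
#   nvidia/cuda:<version>-runtime-ubuntuXX.04   adds the CUDA runtime/math libraries
#   nvidia/cuda:<version>-devel-ubuntuXX.04     adds headers and the nvcc compiler toolchain

This is why the validation steps later in this post use a -base image for nvidia-smi and only reach for -devel when nvcc is needed.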

Install the NVIDIA GPU driver

There are a few popular ways to install the NVIDIA GPU driver on Ubuntu 22.04 LTS:

  • Official Ubuntu managed NVIDIA drivers via ubuntu-drivers install (docs)
  • Official NVIDIA managed NVIDIA drivers via .run file (download)
  • Unofficial PPA managed NVIDIA drivers via ppa:graphics-drivers/ppa (docs)

For this walkthrough, I use the last option (PPA), but feel free to substitute your preferred method. I chose the PPA because it was the easiest to get working.

Add the PPA repo and install the driver found above.

# add ppa:graphics-driver repo to apt
sudo add-apt-repository ppa:graphics-drivers/ppa --yes

# update apt content list
sudo apt update

# install driver
sudo apt install nvidia-driver-535

Warning

I ran into an issue where Ubuntu’s unattended-upgrades automatically updated some of the GPU driver dependencies and broke my GPU configuration.

I fixed it with sudo apt remove unattended-upgrades, but there are other, less forceful solutions.
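
One of those less forceful options is to keep unattended-upgrades but exclude the NVIDIA packages from it. A minimal sketch, assuming the stock Ubuntu config layout (the drop-in file name here is arbitrary, and the entries are regexes matched against package names):

# keep unattended-upgrades, but blacklist NVIDIA driver packages from it
cat <<'EOF' | sudo tee /etc/apt/apt.conf.d/51unattended-upgrades-nvidia
Unattended-Upgrade::Package-Blacklist {
    "nvidia-";
    "libnvidia-";
};
EOF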

Now that we have the drivers installed, let’s validate they’re working. A quick test would be running nvidia-smi, a utility that provides monitoring and management capabilities for NVIDIA GPUs.

# get the driver version
nvidia-smi --query-gpu=driver_version --format=csv,noheader

Validate the NVIDIA GPU driver

Validate installation by listing all packages (dpkg -l) installed with “nvidia” or “535” in the name.

dpkg -l | grep nvidia
# or
dpkg -l | grep 535
# expected output: non-empty list of packages

All good! ✅

Tip

To prevent unplanned package changes, hold the NVIDIA packages so they aren’t auto-upgraded.

# any package with nvidia in the name should be held
dpkg-query -W --showformat='${Package} ${Status}\n' | \
grep -v deinstall | \
awk '{ print $1 }' | \
grep -E 'nvidia.*-[0-9]+$' | \
xargs -r -L 1 sudo apt-mark hold

Output:

#...
libnvidia-fbc1-535 set on hold.
libnvidia-gl-535 set on hold.
nvidia-compute-utils-535 set on hold.
nvidia-dkms-535 set on hold.

This also means sudo apt-mark unhold [package_name] must be run before upgrading.
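
When you do plan an upgrade, a one-liner can unhold them all:

# list held packages, keep the nvidia ones, and unhold them
apt-mark showhold | grep nvidia | xargs -r sudo apt-mark unhold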

Are the kernel modules installed? Is the driver working?

Modules instruct the kernel how to interact with the device attached to it. Without any NVIDIA modules, the OS doesn’t know how to communicate with the hardware.

Use lsmod, a program which lists the contents of /proc/modules, showing which kernel modules are currently loaded.

# Show the status of driver modules in the Linux Kernel
lsmod | grep nvidia

If the modules are loaded, it might look like this ✅:

nvidia_uvm           1511424  12
nvidia_drm             77824  0
nvidia_modeset       1306624  1 nvidia_drm
nvidia              56692736  200 nvidia_uvm,nvidia_modeset
drm_kms_helper        311296  1 nvidia_drm
drm                   622592  4 drm_kms_helper,nvidia,nvidia_drm

Note

I was testing the above output with an eGPU and the modules weren’t showing up. I thought my understanding was wrong, but it turned out I hadn’t plugged in the cable.

Connecting the eGPU fixed my issue and the modules appeared.

Check the kernel driver version file:

cat /proc/driver/nvidia/version

All good! ✅ Output:

NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.154.05  Thu Dec 28 15:37:48 UTC 2023
GCC version:  gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04) 

Check the device file for a found nvidia device:

# any device files (I/O sys calls)
ls /dev/ | grep 'nvidia[0-9]\+'

All good! ✅ Output:

nvidia0
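
Depending on your setup, you may also see supporting device files (nvidiactl, nvidia-uvm, etc.) that CUDA workloads rely on; a broader look:

# list all NVIDIA-related device files
ls -l /dev/nvidia*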

It appears we have a host with a working GPU setup. Next, let’s configure containerd to support a GPU runtime.

Install NVIDIA Container Toolkit

My homelab is running Kubernetes v1.28.4 with containerd. As mentioned earlier, we need the NVIDIA Container Toolkit (a set of utilities) to configure containerd to leverage NVIDIA GPU(s).

As far as I know, this installs tools on your host but does not configure or change anything by default.

From the “Installing the NVIDIA Container Toolkit” guide.

# add nvidia-container-toolkit repo to apt sources
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# update apt content
sudo apt update

# install container toolkit
sudo apt install -y nvidia-container-toolkit
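
A quick sanity check that the toolkit binaries landed on the host:

# confirm the toolkit CLI and the runtime wrapper are installed
nvidia-ctk --version
which nvidia-container-runtime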

Configure containerd

Now that the tools are installed, we need to update the containerd config with the nvidia runtime class. Luckily, one of the tools, nvidia-ctk, can automate that process.

From the “Installing the NVIDIA Container Toolkit” guide.

# options: --dry-run
sudo nvidia-ctk runtime configure --runtime=containerd

# restart containerd
sudo systemctl restart containerd

Verify that containerd is running with sudo systemctl status containerd

Note

You can customize the NVIDIA runtime configuration by specifying the runtime name (--nvidia-runtime-name), the path to the NVIDIA runtime executable (--nvidia-runtime-path), and the path to the NVIDIA Container Runtime hook executable (--nvidia-runtime-hook-path).

There’s also an option to set the NVIDIA runtime as the default runtime using --nvidia-set-as-default. (source)

If you want to dig more into what nvidia-container-runtime does on the host to expose a GPU, I highly recommend reading their low level example in the docs.

If you’re still not tired of this topic, the blog titled “Enabling GPUs in the Container Runtime Ecosystem” from NVIDIA is excellent.

Validate containerd

Check that the nvidia runtime exists in the config.

sudo cat /etc/containerd/config.toml | grep "containerd.runtimes.nvidia."

All good! ✅ Output:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia] 
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]

Let’s try to run a container directly on the host (skipping Kubernetes). First, we need to install nerdctl, a drop-in replacement for docker that allows us to use the --gpus all argument.

Install nerdctl using a precompiled release.
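
A sketch of one way to do that, using a placeholder version (check the nerdctl releases page on GitHub for the latest):

# NERDCTL_VERSION is a placeholder -- see github.com/containerd/nerdctl/releases
NERDCTL_VERSION=1.7.3
curl -fsSL "https://github.com/containerd/nerdctl/releases/download/v${NERDCTL_VERSION}/nerdctl-${NERDCTL_VERSION}-linux-amd64.tar.gz" \
  | sudo tar -xz -C /usr/local/bin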

Note the version of CUDA I’m selecting; check out the image repository for the latest tag options: hub.docker.com/r/nvidia/cuda/tags

# run `nvidia-smi` with cuda 12.3
sudo nerdctl run -it --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi

# run `nvcc -V` with cuda 12.3 (the "12.3.1-base" image doesn't include nvcc)
sudo nerdctl run -it --rm --gpus all nvidia/cuda:12.3.1-devel-ubuntu20.04 nvcc -V

All good! ✅

Note

If you are on a machine with multiple GPUs, you can substitute --gpus all with something like --gpus '"device=0,1"' to test sharing individual GPUs.

# only use device 0 and 1 out of a possible [0,1,2,3] setup
sudo nerdctl run -it --rm --gpus '"device=0,1"' nvidia/cuda:12.2.2-base-ubuntu22.04 nvidia-smi

At this point, we have a GPU node that works up through the container runtime.

Install the NVIDIA GPU Operator using helm

The last piece of the puzzle: we need to let Kubernetes know that we have nodes with GPUs on ’em.

The NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes and is installed via a Helm chart.

Install helm following the official instructions. If you’re interested in looking at the helm chart and values, here’s the GitHub repo.

Add the helm repo:

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
   && helm repo update

Install the release on your Kubernetes cluster.

By default, the Operator deploys the NVIDIA Container Toolkit and NVIDIA drivers as containers on the system. We’ll set these values to false since we already installed both components.

helm install --wait gpu-operator \
     -n gpu-operator --create-namespace \
      nvidia/gpu-operator \
      --set driver.enabled=false \
      --set toolkit.enabled=false

Ensure all the pods came up healthy:

# ensure nothing on kubernetes is wonky
kubectl get pods -n gpu-operator | grep -i nvidia

All good! ✅ Output:

nvidia-cuda-validator-4hh2v                                  0/1     Completed   0               3d20h
nvidia-dcgm-exporter-86wcv                                   1/1     Running     5 (7d10h ago)   7d20h
nvidia-device-plugin-daemonset-cxfnc                         1/1     Running     0               26h
nvidia-operator-validator-jhz6j                              1/1     Running     0               3d20h

Validate the GPU Operator

kubectl -n gpu-operator logs deployment/gpu-operator | grep GPU

This isn’t a foolproof test, but you should see a log line like "Number of nodes with GPU label","NodeCount": NUMBER_OF_EXPECTED_GPU_NODES, where the count matches your actual number of GPU nodes. If it says 0, there’s probably an issue that requires debugging.

Helpful debugging command: kubectl get events -n gpu-operator --sort-by='.lastTimestamp'

Tip

When in doubt (or when the GPU operator pods are stuck in init / terminating on a single node but the underlying setup is sound): reboot the node.

Putting it all together

Finally, let’s run a Kubernetes workload to test that our integration works end to end.

# EXPORT NODE NAME!
export NODE_NAME=node3

cat <<EOF | kubectl create -f -     
apiVersion: batch/v1
kind: Job
metadata:
  name: test-job-gpu
spec:
  template:
    spec:
      runtimeClassName: nvidia
      containers:
      - name: nvidia-test
        image: nvidia/cuda:12.0.0-base-ubuntu22.04
        command: ["nvidia-smi"]
        resources:
          limits:
            nvidia.com/gpu: 1
      nodeSelector:
        kubernetes.io/hostname: ${NODE_NAME}
      restartPolicy: Never
EOF

Check the output via logs:

kubectl logs job/test-job-gpu

The expected output is the standard nvidia-smi table, printed from inside the container.

Congrats! 🎉🎉🎉 We officially have a local GPU accelerated Kubernetes cluster!

Conclusion

Integrating GPUs into Kubernetes might seem daunting due to the technical layers involved. I hope this guide demystified the process of integrating NVIDIA GPUs with Kubernetes for you.

In summary, exposing a GPU on k8s includes:

  1. Install the NVIDIA GPU driver (apt install nvidia-driver-535)
  2. Configure the container runtime (apt install -y nvidia-container-toolkit and nvidia-ctk runtime configure)
  3. Configure Kubernetes (helm install nvidia/gpu-operator)
  4. Update the workload YAML to include GPU requests

In the future, I might consider using the ubuntu-drivers installer and having the Kubernetes GPU Operator manage the driver and container toolkit.

If you have any questions, insights, or feedback, feel free to share!

Clean up

Want to start over? Install a different driver? Delete everything:

# drain node / remove from cluster
kubectl drain ${NODE_NAME} --ignore-daemonsets --delete-emptydir-data
kubectl delete node ${NODE_NAME}

# remove gpu-operator deployment
helm -n gpu-operator list
helm -n gpu-operator delete HELM_RELEASE_NAME

# delete driver packages
sudo apt remove --purge '^nvidia-.*'
sudo apt remove --purge '^libnvidia-.*'

# clean up the uninstall
sudo apt autoremove

# restart containerd
sudo systemctl restart containerd

Bonus: Lazy GKE A100 exploration

I was curious how my current understanding of local NVIDIA GPUs compared to GPU acceleration in the cloud, so I spun up an A100 node on GKE.

I had to deploy the node pool twice because I made a mistake on my first deployment: I omitted gpu-driver-version=default, so no driver or tools were installed (as expected given the omission), but I could still see the connected PCI device.

There are instructions on manually installing the driver on COS here, but I consider it out of scope.

Here is the command I used to (re)create the node pool:

# create command
gcloud container node-pools create gpu-pool-2 \
  --cluster cluster-2 \
  --region us-central1 \
  --machine-type a2-highgpu-1g \
  --num-nodes 1 \
  --accelerator type=nvidia-tesla-a100,count=1,gpu-driver-version=default

Let’s see what we can find!

# gcloud compute ssh NODE_NAME

# PCI connection?
sudo lspci | grep NVIDIA
00:04.0 3D controller: NVIDIA Corporation GA100 [A100 SXM4 40GB] (rev a1)

# Driver installed?
cat /proc/driver/nvidia/version
#NVRM version: NVIDIA UNIX x86_64 Kernel Module  470.223.02  Sat Oct  7 15:39:11 UTC 2023
#GCC version:  Selected multilib: .;@m64

# tab complete `nvidia-c*`
nvidia-container-runtime       nvidia-container-runtime.cdi   
nvidia-container-runtime-hook  nvidia-ctk

# Where is nvidia-smi?
sudo find / -type f -name "nvidia-smi" 2>/dev/null
# /home/kubernetes/bin/nvidia/bin/nvidia-smi

# Runtime?
sudo cat /etc/containerd/config.toml | grep "containerd.runtimes.nvidia."

# NO!

# But, a quick look around:
# bin k8s container runtime is in the default + device plugin
# it looks like some things mounted via default runc runtime here, but idk
sudo cat /etc/containerd/config.toml  | grep bin
# OUTPUT
#  bin_dir = "/home/kubernetes/bin"

# ls /home/kubernetes/bin/nvidia/bin/
#nvidia-bug-report.sh     nvidia-debugdump  nvidia-ngx-updater   nvidia-sleep.sh   nvidia-xconfig
#nvidia-cuda-mps-control  nvidia-installer  nvidia-persistenced  nvidia-smi
#nvidia-cuda-mps-server   nvidia-modprobe   nvidia-settings      nvidia-uninstall

# check nvidia containers running
crictl ps | grep nvidia-gpu

# OUTPUT
25eec6551f9e5       2f78042af231d       7 hours ago         Running             nvidia-gpu-device-plugin   0                   ca9dd0d8e2822       nvidia-gpu-device-plugin-small-cos-674fk
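
From the Kubernetes side (off the node), the same picture is visible. With gpu-driver-version=default, GKE manages the NVIDIA pieces via daemonsets in kube-system rather than the GPU Operator, so a rough check might look like:

# GKE-managed NVIDIA components run in kube-system
kubectl get pods -n kube-system | grep -i nvidia

# the GPU should be advertised on the new node pool
kubectl describe node -l cloud.google.com/gke-nodepool=gpu-pool-2 | grep nvidia.com/gpu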

Cool! Some things are what I assumed, and for others I have some more digging to do!