A Docker-based rancher/k3s node image on an Ubuntu base with the NVIDIA container toolkit baked in, so a k3d cluster can schedule your host’s NVIDIA GPU(s). GPUs are exposed on `k3d-gpu up` with no `kubectl apply`. Built for Ubuntu 26.04 (default) and 24.04.
- Quick Start (k3d-gpu CLI)
- Features
- Prerequisites
- Environment Variables
- Building & Pushing the Image
- k3d Cluster Setup
- Testing GPU Access
- References
- Contributing
- Release History
- License
The Arch package installs a k3d-gpu launcher that wraps the whole workflow — no
need to remember k3d cluster create flags:
yay -S k3d-gpu # or build from packaging/aur/PKGBUILD
k3d-gpu doctor # preflight: GPU, docker, nvidia runtime, k3d, kubectl
k3d-gpu up # create the cluster (device plugin auto-deploys), verify GPUs>0
k3d-gpu test # run a CUDA pod and print nvidia-smi
k3d-gpu logs # tail the k3s server container logs
k3d-gpu down # delete the cluster

Behaviour is tunable via environment variables:
| Variable | Default | Description |
|---|---|---|
| `K3D_GPU_CLUSTER` | `gpu` | cluster name |
| `K3D_GPU_IMAGE` | `cryptoandcoffee/k3d-gpu:latest` | node image (latest = ubuntu26.04) |
| `K3D_GPU_PLUGIN` | `/usr/share/k3d-gpu/nvidia-device-plugin.yml` | fallback manifest (only used if a custom image lacks the baked one) |
| `K3D_GPU_TEST_IMAGE` | `nvidia/cuda:13.1.2-base-ubuntu24.04` | image used by `k3d-gpu test` |
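The fallback behaviour in the table can be illustrated with ordinary shell parameter expansion. This is only a sketch of how a wrapper script might resolve the documented defaults; `resolve_k3d_gpu_env` is a hypothetical helper, and the real launcher may read these variables differently.

```shell
#!/bin/sh
# Sketch: resolve the launcher settings, falling back to the documented
# defaults. resolve_k3d_gpu_env is hypothetical, not part of k3d-gpu.
resolve_k3d_gpu_env() {
  echo "cluster=${K3D_GPU_CLUSTER:-gpu}"
  echo "image=${K3D_GPU_IMAGE:-cryptoandcoffee/k3d-gpu:latest}"
  echo "plugin=${K3D_GPU_PLUGIN:-/usr/share/k3d-gpu/nvidia-device-plugin.yml}"
  echo "test_image=${K3D_GPU_TEST_IMAGE:-nvidia/cuda:13.1.2-base-ubuntu24.04}"
}

# Print whatever the current environment resolves to.
resolve_k3d_gpu_env

# Override two values for one invocation only; the subshell keeps the
# overrides from leaking into the rest of the session.
(
  K3D_GPU_CLUSTER=dev
  K3D_GPU_IMAGE=cryptoandcoffee/k3d-gpu:ubuntu24.04
  resolve_k3d_gpu_env
)
```

With the launcher itself, the same override would be written as a command prefix, for example `K3D_GPU_CLUSTER=dev k3d-gpu up`.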
The rest of this README documents the underlying image and the manual k3d
commands the launcher runs for you.
- K3s + NVIDIA Container Toolkit on an Ubuntu base: 26.04 (default) and 24.04
- NVIDIA device plugin baked into k3s auto-deploy: `up` exposes GPUs with no `kubectl apply`
- Pre-configured nvidia containerd runtime; `--default-runtime=nvidia` for zero-config GPU pods
- No CUDA toolkit in the node image: driver libs are injected from the host, workloads bring their own CUDA
- Exposes the standard K3s entrypoint (`/bin/k3s agent`); volumes for kubelet, k3s state, CNI, logs
- Tunable via build arguments for the K3s and Ubuntu versions

- Docker (20.10+), configured with NVIDIA GPU support (i.e., `nvidia-docker2` or Docker’s built-in `--gpus`)
- k3d (v5.0.0 or later) to manage local K3s clusters
- A host NVIDIA GPU with an up-to-date driver (the node image needs no CUDA toolkit)
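The prerequisites above can be checked by hand with a short script. The following is a minimal sketch, assuming the tools are expected on `PATH`; `have_tool` is a hypothetical helper, and `k3d-gpu doctor` may check more or check differently.

```shell
#!/bin/sh
# Sketch of a manual preflight. have_tool and the tool list are illustrative,
# not taken from the k3d-gpu launcher.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

missing=0
for tool in docker k3d kubectl nvidia-smi; do
  if have_tool "$tool"; then
    echo "ok:      $tool"
  else
    echo "missing: $tool"
    missing=$((missing + 1))
  fi
done

# If Docker is installed, list its registered runtimes; "nvidia" should be
# among them once the NVIDIA container toolkit is configured on the host.
if have_tool docker; then
  docker info --format '{{json .Runtimes}}' 2>/dev/null \
    || echo "docker is installed but the daemon is not reachable"
fi

echo "missing tools: $missing"
```

The script only reports; it does not verify the Docker or k3d versions, which would need separate checks.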
These are Docker build args (not runtime env):
| Build arg | Default | Description |
|---|---|---|
| `K3S_TAG` | `v1.34.1-k3s1-amd64` | K3s image tag from rancher/k3s (auto-bumped) |
| `UBUNTU_TAG` | `26.04` | Ubuntu base tag (26.04 default; 24.04 also published) |
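To see how the two build args combine, a dry run that only prints the `docker build` command for each published Ubuntu base may help. This is a sketch: `build_cmd` is a hypothetical helper, the `ubuntu<version>` tag scheme is inferred from the tags named in this README, and the `K3S_TAG` default shown is the documented one.

```shell
#!/bin/sh
# Sketch: print (not run) a docker build command per Ubuntu base.
# build_cmd is hypothetical; pass a second argument to pin another K3S_TAG.
build_cmd() {
  ubuntu_tag=$1
  k3s_tag=${2:-v1.34.1-k3s1-amd64}
  printf 'docker build --build-arg K3S_TAG=%s --build-arg UBUNTU_TAG=%s -t cryptoandcoffee/k3d-gpu:ubuntu%s .\n' \
    "$k3s_tag" "$ubuntu_tag" "$ubuntu_tag"
}

for base in 26.04 24.04; do
  build_cmd "$base"
done
```

Once the printed command looks right, it can be executed from the repository root, for example with `build_cmd 24.04 | sh`.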
Build a specific Ubuntu base:
docker build \
--build-arg UBUNTU_TAG="24.04" \
-t cryptoandcoffee/k3d-gpu:ubuntu24.04 .

Clone this repository and build with the included build.sh or manually:
git clone https://github.com/88plug/k3d-gpu.git
cd k3d-gpu
# Using build.sh
./build.sh
# Or manually
docker build --platform linux/amd64 \
-t cryptoandcoffee/k3d-gpu .
# Push to Docker Hub (or your registry)
docker push cryptoandcoffee/k3d-gpu

Create a k3d cluster that uses the GPU-enabled image and passes all host GPUs into each node container:
k3d cluster create gpu-cluster \
--image cryptoandcoffee/k3d-gpu \
--servers 1 --agents 1 \
--gpus all \
--port 6443:6443@loadbalancer \
--k3s-arg "--default-runtime=nvidia@server:*" \
--k3s-arg "--default-runtime=nvidia@agent:*"

Note: The `--gpus all` flag exposes every host GPU to the node containers.

`--default-runtime=nvidia` is required. k3s auto-detects the nvidia containerd runtime but still leaves `runc` as the default, so pods start without the GPU driver libraries. The device plugin then fails with `Failed to initialize NVML: ERROR_LIBRARY_NOT_FOUND`, and the cluster advertises zero GPUs even though `docker exec … nvidia-smi` works on the node. This flag makes nvidia the default runtime on every node. The `k3d-gpu` launcher sets it for you. If you cannot change the default runtime, set `runtimeClassName: nvidia` on each GPU pod instead (the bundled device-plugin manifest already does).
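For the per-pod alternative, a pod that opts into the nvidia runtime might look like the manifest printed below. This is a sketch: the pod name `gpu-smoke` and the helper `gpu_pod_manifest` are hypothetical, and the image defaults to the documented `K3D_GPU_TEST_IMAGE` value.

```shell
#!/bin/sh
# Sketch: emit a one-shot GPU pod that selects the nvidia runtime explicitly.
# gpu_pod_manifest and the pod name are illustrative assumptions.
gpu_pod_manifest() {
  image=${1:-nvidia/cuda:13.1.2-base-ubuntu24.04}
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke
spec:
  restartPolicy: Never
  runtimeClassName: nvidia   # use the nvidia runtime for this pod only
  containers:
    - name: cuda
      image: ${image}
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1  # request one GPU from the device plugin
EOF
}

gpu_pod_manifest
```

Against a running cluster, the output can be applied with `gpu_pod_manifest | kubectl apply -f -` and inspected with `kubectl logs gpu-smoke`.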
For optimal performance, you may need to increase inotify limits on your host system (not in containers):
# Temporarily (until reboot):
sudo sysctl -w fs.inotify.max_user_watches=100000
sudo sysctl -w fs.inotify.max_user_instances=100000
# Permanently (survives reboots):
echo "fs.inotify.max_user_watches=100000" | sudo tee -a /etc/sysctl.conf
echo "fs.inotify.max_user_instances=100000" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

The device plugin is baked into the image at
/var/lib/rancher/k3s/server/manifests/, so k3s auto-deploys it on startup —
nothing to install. The bundled manifest sets runtimeClassName: nvidia, so GPUs
are advertised even without changing the node default runtime.
Only if you run a custom base image that doesn't ship it, apply the upstream manifest yourself:
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/main/deployments/static/nvidia-device-plugin.yml

Verify GPU visibility:
k3d-gpu test # runs a CUDA pod (runtimeClassName: nvidia) and prints nvidia-smi
# or check the scheduler directly — must be > 0:
kubectl get nodes -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}{"\n"}'

`k3d-gpu up` already asserts `nvidia.com/gpu` > 0 and fails loudly if not, so a clean `up` means GPUs are schedulable.
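A similar check can be scripted by totalling the per-node counts that the jsonpath query prints. The following is a sketch, not the launcher's own code: `sum_gpus` is a hypothetical helper, and the sample string stands in for real `kubectl` output.

```shell
#!/bin/sh
# Sketch: total the per-node allocatable GPU counts from the jsonpath query.
# sum_gpus is hypothetical; it is not taken from the k3d-gpu launcher.
sum_gpus() {
  total=0
  # Unquoted on purpose: split on whitespace; nodes without GPUs print nothing.
  for n in $1; do
    total=$((total + n))
  done
  echo "$total"
}

# Assumed sample output for a two-node cluster with one GPU per node.
sample="1 1"
if [ "$(sum_gpus "$sample")" -gt 0 ]; then
  echo "GPUs schedulable: $(sum_gpus "$sample")"
else
  echo "no GPUs advertised" >&2
fi
```

On a live cluster, replace `sample` with the captured output of the `kubectl get nodes -o jsonpath=...` command.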
Note: If `nvidia-smi` reports `Failed to initialize NVML` or a non-zero `result=` while `docker exec … nvidia-smi` on the node works, the test pod's CUDA image is newer than the host driver. Point `K3D_GPU_TEST_IMAGE` at a tag your driver supports; see the CUDA/driver compatibility matrix.
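One rough way to spot such a mismatch before running the test pod is to compare the CUDA version in the image tag with the "CUDA Version" shown in the host's `nvidia-smi` banner. This is a sketch under stated assumptions: `image_cuda_version` and `cuda_fits` are hypothetical helpers, the tag is assumed to start with `major.minor`, and a plain major.minor comparison is a heuristic rather than NVIDIA's full compatibility rules.

```shell
#!/bin/sh
# Sketch: compare an image tag's CUDA version with the driver's maximum.
# Both helpers are illustrative assumptions, not part of k3d-gpu.

# Extract "major.minor" from a tag such as nvidia/cuda:13.1.2-base-ubuntu24.04.
image_cuda_version() {
  echo "$1" | sed -n 's/^.*:\([0-9][0-9]*\)\.\([0-9][0-9]*\).*$/\1.\2/p'
}

# Succeeds when version $1 (image) is not newer than version $2 (driver).
cuda_fits() {
  img_major=${1%%.*}; img_minor=${1#*.}
  drv_major=${2%%.*}; drv_minor=${2#*.}
  [ "$img_major" -lt "$drv_major" ] ||
    { [ "$img_major" -eq "$drv_major" ] && [ "$img_minor" -le "$drv_minor" ]; }
}

image=${K3D_GPU_TEST_IMAGE:-nvidia/cuda:13.1.2-base-ubuntu24.04}
img_cuda=$(image_cuda_version "$image")
echo "image CUDA version: ${img_cuda:-unknown}"

# On a GPU host, the banner contains a "CUDA Version: X.Y" field.
if [ -n "$img_cuda" ] && command -v nvidia-smi >/dev/null 2>&1; then
  drv_cuda=$(nvidia-smi | sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1)
  if [ -n "$drv_cuda" ] && ! cuda_fits "$img_cuda" "$drv_cuda"; then
    echo "image CUDA $img_cuda is newer than driver CUDA $drv_cuda"
  fi
fi
```

If the script reports a newer image, choosing an older `nvidia/cuda` tag for `K3D_GPU_TEST_IMAGE` is the adjustment the note describes.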
Contributions, issues, and feature requests are welcome! Please fork the repository and submit a pull request.
| Date | K3s Tag | Device Plugin |
|---|---|---|
| 2026-06-04 | v1.34.1-k3s1-amd64 | v0.19.2 |
| 2026-06-03 | v1.34.1-k3s1-amd64 | v0.19.2 |
| 2026-06-03 | v1.34.1-k3s1-amd64 | v0.19.2 |
| 2026-06-02 | v1.34.1-k3s1-amd64 | v0.19.2 |
FSL-1.1-ALv2 © 2025 Crypto & Coffee Development Team