diff --git a/confidential-containers/confidential-containers-deploy.rst b/confidential-containers/confidential-containers-deploy.rst
index 872013e1c..5ad4ba6c6 100644
--- a/confidential-containers/confidential-containers-deploy.rst
+++ b/confidential-containers/confidential-containers-deploy.rst
@@ -175,7 +175,7 @@ Kubernetes Cluster
&& ./get_helm.sh
-* Enable the ``KubeletPodResourcesGet`` and ``RuntimeClassInImageCriApi`` Kubelet feature gates on your cluster.
+* Add the ``KubeletPodResourcesGet`` and ``RuntimeClassInImageCriApi`` feature gates to the Kubelet configuration file on each worker node (typically ``/var/lib/kubelet/config.yaml``).
On Kubernetes v1.34 and later, ``KubeletPodResourcesGet`` is already enabled by default and only ``RuntimeClassInImageCriApi`` requires explicit configuration.
On earlier Kubernetes versions, enable both gates.
@@ -184,7 +184,14 @@ Kubernetes Cluster
* ``RuntimeClassInImageCriApi``: Alpha since Kubernetes v1.29 and not enabled by default.
Required to support pod deployments that use multiple snapshotters side-by-side.
- Add both feature gates to your Kubelet configuration (typically ``/var/lib/kubelet/config.yaml``):
+ Add the feature gates to your Kubelet configuration.
+ For example, on the worker node:
+
+ .. code-block:: console
+
+ $ sudo nano /var/lib/kubelet/config.yaml
+
+ Add the following to the file:
.. code-block:: yaml
@@ -211,7 +218,14 @@ Kubernetes Cluster
Set ``runtimeRequestTimeout`` in your `kubelet configuration `_ to ``20m`` to align with the default Kata shim ``image_pull_timeout`` of 1200 seconds.
The kubelet default is 2 minutes, which can be too short for GPU workloads.
- Add or update the ``runtimeRequestTimeout`` field in your kubelet configuration (typically ``/var/lib/kubelet/config.yaml``):
+ Add or update the ``runtimeRequestTimeout`` field in the same Kubelet configuration file (``/var/lib/kubelet/config.yaml``).
+ For example:
+
+ .. code-block:: console
+
+ $ sudo nano /var/lib/kubelet/config.yaml
+
+ Add or update the following in the file:
.. code-block:: yaml
:emphasize-lines: 3
@@ -235,6 +249,10 @@ Kubernetes Cluster
Installation
============
+This section installs Kata Containers and the NVIDIA GPU Operator on your existing Kubernetes cluster.
+The worker nodes listed by ``kubectl get nodes`` are the nodes that were registered with your cluster when it was provisioned.
+This guide does not create new nodes.
+
.. _coco-label-nodes:
Label Nodes
@@ -270,16 +288,10 @@ Label Nodes
$ kubectl label node $NODE_NAME nvidia.com/gpu.workload.config=vm-passthrough
- The GPU Operator uses this label to determine what software components to deploy to a node.
- The ``nvidia.com/gpu.workload.config=vm-passthrough`` label specifies that the node should receive the software components to run Confidential Containers.
-
- A node can only run one container runtime at a time, so a labeled node runs only Confidential Container workloads and cannot run traditional GPU container workloads.
- The labeling approach is useful if you want to run Confidential Containers workloads on some nodes and traditional GPU container workloads on other nodes in your cluster.
- For more details on how the GPU Operator deploys components to your cluster, refer to the :ref:`GPU Operator Cluster Topology Considerations ` section in the architecture overview.
-
- .. tip::
+ .. note::
- Skip this section if you plan to use all nodes in your cluster to run Confidential Containers and instead set ``sandboxWorkloads.defaultWorkload=vm-passthrough`` when installing the GPU Operator.
+ If the label is already present, ``kubectl label`` may print ``not labeled`` even though the label is set correctly.
+ Use the verification step below to confirm the label value.
#. Verify the node label was added:
@@ -295,6 +307,18 @@ Label Nodes
After labeling the node, you can continue to the next steps to install Kata Containers and the NVIDIA GPU Operator.
+The GPU Operator uses this label to determine what software components to deploy to a node.
+The ``nvidia.com/gpu.workload.config=vm-passthrough`` label specifies that the node should receive the software components to run Confidential Containers.
+
+A node can only run one container runtime at a time, so a labeled node runs only Confidential Container workloads and cannot run traditional GPU container workloads.
+The labeling approach is useful if you want to run Confidential Containers workloads on some nodes and traditional GPU container workloads on other nodes in your cluster.
+For more details on how the GPU Operator deploys components to your cluster, refer to the :ref:`GPU Operator Cluster Topology Considerations ` section in the architecture overview.
+
+.. tip::
+
+ Skip this section if you plan to use all nodes in your cluster to run Confidential Containers and instead set ``sandboxWorkloads.defaultWorkload=vm-passthrough`` when installing the GPU Operator.
+
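+The ``--set`` form of this option corresponds to the following fragment of a Helm values file.
+This is a minimal sketch, assuming you pass your own values file to ``helm install`` with ``-f``; any other options from the GPU Operator install command still apply.
+
+.. code-block:: yaml
+
+   # Assumed to live in a values file that you supply with -f.
+   # Equivalent to --set sandboxWorkloads.defaultWorkload=vm-passthrough
+   sandboxWorkloads:
+     defaultWorkload: vm-passthrough
+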
+
.. _coco-install-kata-chart:
Install the Kata Containers Helm Chart
@@ -349,20 +373,31 @@ The minimum required version is 3.29.0.
The GPU Operator will deploy and manage NFD in the next step.
-#. Optional: Verify that the ``kata-deploy`` pod is running:
+#. Verify that the ``kata-deploy`` pod is running:
.. code-block:: console
$ kubectl get pods -n kata-system | grep kata-deploy
- *Example Output:*
+ *Example Output (install in progress):*
+
+ .. code-block:: output
+
+ NAME READY STATUS RESTARTS AGE
+ kata-deploy-b2lzs 0/1 Running 0 2m15s
+
+ *Example Output (install complete):*
.. code-block:: output
NAME READY STATUS RESTARTS AGE
kata-deploy-b2lzs 1/1 Running 0 6m37s
-#. Optional: Verify that the ``kata-qemu-nvidia-gpu``, ``kata-qemu-nvidia-gpu-snp``, and ``kata-qemu-nvidia-gpu-tdx`` runtime classes are available:
+ The ``kata-deploy`` chart deploys a DaemonSet that installs Kata on each node and then keeps running.
+ Wait until the pod ``STATUS`` is ``Running`` and ``READY`` is ``1/1``, which confirms that the Kata installation finished.
+
+#. Verify that the ``kata-qemu-nvidia-gpu``, ``kata-qemu-nvidia-gpu-snp``, and ``kata-qemu-nvidia-gpu-tdx`` runtime classes are available.
+ These runtime classes are required to schedule confidential container workloads.
.. code-block:: console
@@ -377,11 +412,17 @@ The minimum required version is 3.29.0.
kata-qemu-nvidia-gpu-snp kata-qemu-nvidia-gpu-snp 40s
kata-qemu-nvidia-gpu-tdx kata-qemu-nvidia-gpu-tdx 40s
+ If only some runtime classes are listed, the chart might still be deploying.
+ Wait 2–5 minutes and run the command again.
+ Runtime classes that are still missing after 5 minutes indicate a Kata installation problem.
+ Check the ``kata-deploy`` pod logs for more details.
+ Before :ref:`running a sample workload `, all three runtime classes must be present.
+
Several runtimes are installed by the ``kata-deploy`` chart.
The ``kata-qemu-nvidia-gpu`` runtime class is used with Kata Containers, in a non-Confidential Containers scenario.
The ``kata-qemu-nvidia-gpu-snp`` for AMD-based systems or ``kata-qemu-nvidia-gpu-tdx`` for Intel-based systems runtime classes are used to deploy Confidential Containers workloads.
-#. Optional: If you have an issue deploying the ``kata-deploy`` pod or are not seeing the expected runtime classes, get the pod name and view the logs:
+#. Optional: If the ``kata-deploy`` pod is not ``Running`` and ready, or runtime classes are missing, get the pod name and view the logs:
.. code-block:: console
@@ -390,6 +431,18 @@ The minimum required version is 3.29.0.
Replace ```` with the name of the ``kata-deploy`` pod from the first command's output.
+ *Example Output (successful install):*
+
+ .. code-block:: output
+
+ ...
+ Install completed
+ daemonset mode: waiting for SIGTERM
+
+ If logs indicate a problem, recheck the :ref:`Prerequisites `.
+ Search the `Kata Containers issue `_ for similar reports.
+ If the issue persists, file a new issue there with ``kata-deploy`` pod logs and your environment details.
+
.. _coco-install-gpu-operator:
Install the NVIDIA GPU Operator
@@ -452,12 +505,18 @@ Install the NVIDIA GPU Operator and configure it to deploy Confidential Containe
Refer to the :ref:`Common chart customization options ` in :doc:`Installing the NVIDIA GPU Operator ` for more details on the additional general configuration options you can specify when installing the GPU Operator.
-#. Optional: Verify that all GPU Operator pods, especially the Confidential Computing Manager, Kata Device Plugin and VFIO Manager operands, are running:
+#. Verify that all GPU Operator pods, especially the Confidential Computing Manager, Kata Sandbox Device Plugin, and VFIO Manager operands, are running:
.. code-block:: console
$ kubectl get pods -n gpu-operator
+ .. note::
+
+ The first time you run this command, you might see only a subset of pods while operands are still starting.
+ It can take 3–5 minutes after the Helm command completes for all GPU Operator pods to reach the ``Running`` state.
+ Rerun the command until the Confidential Computing Manager, Kata Sandbox Device Plugin, and VFIO Manager pods are ``Running``.
+
*Example Output:*
.. code-block:: output
@@ -483,6 +542,27 @@ Install the NVIDIA GPU Operator and configure it to deploy Confidential Containe
$ kubectl logs -n gpu-operator
Replace ```` with the name of the GPU Operator pod from ``kubectl get pods -n gpu-operator``.
+ Also refer to the :doc:`NVIDIA GPU Operator troubleshooting guide ` for more details on resolving common issues.
+
+#. Verify that the node advertises GPU passthrough capacity:
+
+ .. code-block:: console
+
+ $ kubectl describe node $NODE_NAME | grep -E "nvidia.com/pgpu"
+
+ *Example Output:*
+
+ .. code-block:: output
+
+ nvidia.com/pgpu: 1
+
+ If capacity is ``0``, a startup race may have occurred between the VFIO Manager and Kata Sandbox Device Plugin.
+ Restart the device plugin and wait for the rollout to finish:
+
+ .. code-block:: console
+
+ $ kubectl rollout restart daemonset/nvidia-kata-sandbox-device-plugin-daemonset -n gpu-operator
+ $ kubectl rollout status daemonset/nvidia-kata-sandbox-device-plugin-daemonset -n gpu-operator
#. Optional: If you have host access to the worker node, you can perform the following validation step:
@@ -501,10 +581,6 @@ Install the NVIDIA GPU Operator and configure it to deploy Confidential Containe
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
- .. tip::
-
- If you have an issue deploying the GPU Operator, refer to the :doc:`NVIDIA GPU Operator troubleshooting guide ` for guidance on troubleshooting and resolving issues.
-
With Kata Containers and the GPU Operator installed, you can start using your cluster to run Confidential Containers workloads.
To run a sample workload, refer to the :ref:`Run a Sample Workload ` section.
@@ -519,6 +595,8 @@ For further configuration settings, refer to the following sections:
Run a Sample Workload
=====================
+Before running a workload, confirm that Kata Containers and the GPU Operator are both installed and that the ``kata-qemu-nvidia-gpu-snp`` and ``kata-qemu-nvidia-gpu-tdx`` runtime classes are available on the cluster.
+
A pod manifest for a confidential container GPU workload requires that you specify the ``kata-qemu-nvidia-gpu-snp`` runtime class for AMD-based systems or ``kata-qemu-nvidia-gpu-tdx`` for Intel-based systems.
1. Create a file, such as the following ``cuda-vectoradd-kata.yaml`` sample, specifying the appropriate runtime class for your system:
@@ -618,6 +696,26 @@ A pod manifest for a confidential container GPU workload requires that you speci
NAME READY STATUS RESTARTS AGE
cuda-vectoradd-kata 1/1 Running 0 10s
+ .. note::
+
+ ``kubectl apply`` returns immediately and produces no further output while the pod is starting.
+ If the pod does not reach ``Running``, describe it to check events:
+
+ .. code-block:: console
+
+ $ kubectl describe pod cuda-vectoradd-kata
+
+ If you see ``FailedCreatePodSandBox`` with ``GetPodResources failed`` or ``PodResources API Get method disabled``, the ``KubeletPodResourcesGet`` feature gate is not enabled on the worker node.
+ The event looks similar to the following:
+
+ .. code-block:: output
+
+ Warning FailedCreatePodSandBox ... kubelet Failed to create pod sandbox: ... device cold plug failed: ... GetPodResources failed for pod(cuda-vectoradd-kata) in namespace(default): ... PodResources API Get method disabled
+
+ Refer to the Kubernetes Prerequisites section for details on how to enable the feature gate.
+
+ If the pod stays in ``Pending`` with ``Insufficient nvidia.com/pgpu``, confirm the node is labeled with ``nvidia.com/gpu.workload.config=vm-passthrough`` and that ``nvidia.com/pgpu`` capacity is greater than zero.
+
4. View the logs from the pod after the container starts:
.. code-block:: console
@@ -749,7 +847,6 @@ You can set this option when you install NVIDIA GPU Operator or afterward by mod
When you change the mode, the manager performs the following actions:
* Evicts the other GPU Operator operands from the node.
-
However, the manager does not drain user workloads. You must make sure that no user workloads are running on the node before you change the mode.
* Changes the mode and resets the GPU.
@@ -835,6 +932,14 @@ To verify that a mode change was successful, view the ``nvidia.com/cc.mode``,
$ kubectl get node $NODE_NAME -o json | \
jq '.metadata.labels | with_entries(select(.key | startswith("nvidia.com/cc")))'
+To disable Confidential Computing on a node and verify the transition:
+
+.. code-block:: console
+
+ $ kubectl label node $NODE_NAME nvidia.com/cc.mode=off --overwrite
+
+Wait one to two minutes, then rerun the ``kubectl get node`` command above to recheck the labels.
+
*Example Output (CC mode disabled):*
.. code-block:: json
@@ -864,11 +969,15 @@ To verify that a mode change was successful, view the ``nvidia.com/cc.mode``,
* The ``nvidia.com/cc.ready.state`` label indicates whether the node is ready to run Confidential Container workloads.
It is set to ``true`` when ``cc.mode.state`` is ``on`` or ``ppcie``, and ``false`` when ``cc.mode.state`` is ``off``.
+ When you disable CC mode, expect ``nvidia.com/cc.ready.state`` to become ``false`` once the transition completes.
+ While a transition is in progress, ``nvidia.com/cc.mode.state`` may temporarily differ from ``nvidia.com/cc.mode``.
+
.. note::
It can take one to two minutes for GPU state transitions to complete and the labels to be updated.
A mode change is complete and successful when ``nvidia.com/cc.mode`` and
``nvidia.com/cc.mode.state`` have the same value.
+ If the labels do not converge, check the Confidential Computing Manager pod logs, make sure that no user workloads are running on the node, and reapply the ``nvidia.com/cc.mode`` label.
.. _coco-configuration-multi-gpu-passthrough: