62 changes: 28 additions & 34 deletions confidential-containers/confidential-containers-deploy.rst
@@ -142,21 +142,6 @@ Kubernetes Cluster
* A Kubernetes cluster with cluster administrator privileges.
Refer to the :ref:`Supported Software Components <coco-supported-software-components>` table for supported Kubernetes versions.

* containerd version 2.2.2 installed.

Contributor

Just so I understand: why are we removing this here? Maybe I don't have the full picture right now, and this is listed somewhere else?

Collaborator Author

We list it on the platform support page. We added it here because there was a hard requirement for containerd 2.2.2 to be installed. In this release we now also support 2.3.

Do you still think it is valuable to call this out to folks here as well as in the support matrix?

Contributor

This might be a bit confusing. Yes, we list containerd in the support matrix, but we also list QEMU. QEMU comes from kata-deploy, but containerd doesn't, so users still need to ensure they have containerd running. Maybe we can collapse this into the prior point, "A Kubernetes cluster with cluster administrator privileges", and change it to something like:
"A Kubernetes cluster with cluster administrator privileges using containerd on the nodes", "Refer to the ... table for supported Kubernetes and containerd versions"?

@fidencio thoughts?

I think we should keep the note about containerd. We know from past experience that people tend to ignore a requirement when it isn't stated explicitly.

Refer to the `containerd Getting Started guide <https://containerd.io/docs/2.2/getting-started/>`_ for installation instructions.

To verify the installed version, run the following command:

.. code-block:: console

$ containerd --version

*Example Output:*

.. code-block:: output

containerd containerd.io 2.2.2 ...
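
The version check above could also be scripted so that a node is rejected early when its containerd version is outside the supported set. The following is a minimal sketch, not part of the guide: the function name ``is_supported_containerd`` is hypothetical, and it assumes the supported versions are exactly 2.2.2 and any 2.3.x release, as listed in the support matrix in this PR.

```shell
#!/usr/bin/env bash
# Sketch: report whether a containerd version string is one this guide lists
# as supported (2.2.2 or any 2.3.x). The function name is hypothetical.
is_supported_containerd() {
  local version="${1#v}"   # tolerate a leading "v", as in "v2.3.0"
  case "${version}" in
    2.2.2|2.3.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (assumes the version is the third field of `containerd --version`,
# as in the example output above):
#   installed="$(containerd --version | awk '{print $3}')"
#   is_supported_containerd "${installed}" && echo "supported" || echo "unsupported"
```

If the support matrix changes to a range such as 2.2.2 and later, the ``case`` patterns would need to be adjusted to match.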

* Helm installed.
Use the command below to install Helm or refer to the `Helm documentation <https://helm.sh/docs/intro/install/>`_ for installation instructions.

@@ -294,24 +279,27 @@ Install the Kata Containers Helm Chart
Install Kata Containers using the ``kata-deploy`` Helm chart.
The ``kata-deploy`` chart installs all required components from the Kata Containers project including the Kata Containers runtime binary, runtime configuration, UVM kernel, and images that NVIDIA uses for Confidential Containers and native Kata containers.

The minimum required version is 3.29.0.
The minimum required version is ${kata_version}.

#. Set the chart version and registry path:

.. code-block:: console

$ export VERSION="3.29.0"
$ export VERSION="${kata_version}"
$ export CHART="oci://ghcr.io/kata-containers/kata-deploy-charts/kata-deploy"

#. Create a values file, such as ``kata-nvidia-gpu-values.yaml``, to configure the ``kata-deploy`` chart for NVIDIA Confidential Containers:

#. Install the kata-deploy Helm chart:
.. literalinclude:: ./samples/kata-nvidia-gpu-values.yaml
:language: yaml

#. Install the kata-deploy Helm chart with the values file:

.. code-block:: console

$ helm install kata-deploy "${CHART}" \
--namespace kata-system --create-namespace \
--set nfd.enabled=false \
--wait --timeout 10m \
-f kata-nvidia-gpu-values.yaml \
--version "${VERSION}"

*Example Output:*
@@ -327,31 +315,37 @@ The minimum required version is 3.29.0.

.. note::

The ``--wait`` flag in the install command instructs Helm to wait until the release is deployed before returning.
It can take a 2-3 minutes to return output.
The Helm install command returns as soon as the Kubernetes resources are created.
The ``kata-deploy`` DaemonSet then takes several minutes per node to extract artifacts, restart containerd, and label the node before its pods report ready.
You can use either of the optional verification steps below to confirm readiness before continuing.

There is a `known Helm issue <https://github.com/helm/helm/issues/8660>`_ on single node clusters, that may result in the Helm command finishing before all deployed pods are finished initializing.
If you are deploying to a single node cluster, you may need to wait for an additional few minutes after the Helm command completes for the ``kata-deploy`` pod to be in the Running state.

.. note::
#. Optional: Verify that the ``kata-deploy`` DaemonSet has finished rolling out on every node:

.. code-block:: console

$ kubectl -n kata-system rollout status ds/kata-deploy --timeout=20m

Both ``kata-deploy`` and the GPU Operator deploy Node Feature Discovery (NFD) by default.
The install command includes ``--set nfd.enabled=false`` to prevent ``kata-deploy`` from deploying NFD.
The GPU Operator will deploy and manage NFD in the next step.
*Example Output:*

.. code-block:: output

Waiting for daemon set "kata-deploy" rollout to finish: 0 of 1 updated pods are available...
daemon set "kata-deploy" successfully rolled out

#. Optional: Verify that the ``kata-deploy`` pod is running:

#. Optional: Verify that the ``kata-deploy`` pods are running:

.. code-block:: console

$ kubectl get pods -n kata-system | grep kata-deploy
$ kubectl get pods -n kata-system

*Example Output:*

.. code-block:: output

NAME READY STATUS RESTARTS AGE
kata-deploy-b2lzs 1/1 Running 0 6m37s
NAME READY STATUS RESTARTS AGE
kata-deploy-b2lzs 1/1 Running 0 6m37s

#. Optional: Verify that the ``kata-qemu-nvidia-gpu``, ``kata-qemu-nvidia-gpu-snp``, and ``kata-qemu-nvidia-gpu-tdx`` runtime classes are available:

@@ -415,7 +409,7 @@ Install the NVIDIA GPU Operator and configure it to deploy Confidential Containe
--set sandboxWorkloads.mode=kata \
--set nfd.enabled=true \
--set nfd.nodefeaturerules=true \
--version=v26.3.1
--version=${gpu_operator_version}


*Example Output:*
@@ -701,7 +695,7 @@ The following example installs the GPU Operator with both ``P_GPU_ALIAS`` and ``
--set kataSandboxDevicePlugin.env[0].value="" \
--set kataSandboxDevicePlugin.env[1].name=NVSWITCH_ALIAS \
--set kataSandboxDevicePlugin.env[1].value="" \
--version=v26.3.1
--version=${gpu_operator_version}

After installing the GPU Operator, you can view the GPU or NVSwitch resource types available on a node by running the following command:

34 changes: 34 additions & 0 deletions confidential-containers/release-notes.rst
@@ -26,6 +26,40 @@ This document describes the new features and known issues for the NVIDIA Confide

----

.. _coco-v1.1.0:

1.1.0
=====

This release expands hardware coverage and updates the validated software stack.

New Features
------------

* Added support for the NVIDIA HGX B300 platform with both single-GPU and multi-GPU passthrough.

* Added support for Ubuntu 26.04 as a host operating system.

* Added support for the following software components:

* Kata Containers 3.31.0
* containerd 2.3.x


Docs Changelog
--------------

The :ref:`coco-install-kata-chart` procedure was updated for this release.
Changes include:

* Installs ``kata-deploy`` with a values file instead of inline ``--set`` flags.

* Includes a new sample values file, :file:`samples/kata-nvidia-gpu-values.yaml`, that configures the ``kata-deploy`` Helm chart for the NVIDIA Confidential Containers reference architecture (NVIDIA GPU shims only, NFD disabled, ``nydus`` snapshotter, and per-shim runtime class node selectors).

* Adds a readiness verification step using ``kubectl rollout status ds/kata-deploy``. This step relies on the readiness reporting added in Kata Containers 3.31.0 and lets you confirm that ``kata-deploy`` has finished extracting artifacts and restarting containerd on every node before continuing.
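
The readiness check described in the last item could be wrapped in a small helper for automation. This is only a sketch under stated assumptions: the function name ``wait_for_kata_deploy`` is hypothetical, the ``kata-system`` namespace and ``ds/kata-deploy`` name come from the install procedure, and the runtime class names are the three listed in that procedure.

```shell
#!/usr/bin/env bash
# Sketch: block until the kata-deploy DaemonSet has rolled out, then confirm
# that the three NVIDIA runtime classes exist. The function name is hypothetical.
wait_for_kata_deploy() {
  local timeout="${1:-20m}"   # default matches the timeout used in the procedure
  local rc

  # Returns non-zero if the rollout does not finish within the timeout.
  kubectl -n kata-system rollout status ds/kata-deploy --timeout="${timeout}" || return 1

  # Fail if any expected runtime class is missing.
  for rc in kata-qemu-nvidia-gpu kata-qemu-nvidia-gpu-snp kata-qemu-nvidia-gpu-tdx; do
    kubectl get runtimeclass "${rc}" >/dev/null || return 1
  done
}

# Usage:
#   wait_for_kata_deploy 20m && echo "kata-deploy is ready"
```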

----

.. _coco-v1.0.0:

1.0.0
107 changes: 107 additions & 0 deletions confidential-containers/samples/kata-nvidia-gpu-values.yaml
@@ -0,0 +1,107 @@
# Example values file to enable NVIDIA GPU shims for the NVIDIA
# Confidential Containers Reference Architecture.

# Set to true for verbose kata-deploy and Kata runtime logging.
debug: false

# Disable Node Feature Discovery (NFD) deployment by kata-deploy.
# Both kata-deploy and the GPU Operator deploy NFD by default. This
# reference architecture relies on the NFD instance that the GPU Operator
# deploys and manages, so the kata-deploy NFD is turned off to avoid a
# duplicate, conflicting deployment.
nfd:
enabled: false

# Install the nydus snapshotter on each node alongside containerd.
# The confidential -snp and -tdx shims below use nydus to pull container
# images directly into the confidential VM (guest pull), which keeps image
# contents inside the trusted execution environment (TEE).
snapshotter:
setup: ["nydus"]

# Disable the chart's default hypervisor/TEE shims and opt in only to
# the NVIDIA GPU shims supported by this reference architecture.
shims:
disableAll: true

# Non-confidential NVIDIA GPU passthrough shim used when Confidential
# Computing mode is off on the node. The runtime class is restricted to
# nodes where the GPU Operator's Confidential Computing Manager has
# reported nvidia.com/cc.ready.state=false, so it will not schedule on
# CC-ready nodes. The empty containerd snapshotter falls back to the
# default (overlayfs); guest pull is not used for this non-confidential
# path.
qemu-nvidia-gpu:
enabled: true
supportedArches:
- amd64
allowedHypervisorAnnotations: []
containerd:
snapshotter: ""
runtimeClass:
# This label is automatically added by the GPU Operator.
nodeSelector:
nvidia.com/cc.ready.state: "false"

# Confidential NVIDIA GPU passthrough for AMD SEV-SNP nodes.
# Scheduled where the GPU Operator reports CC mode is on AND NFD
# reports SEV-SNP support. Set agent.httpsProxy / agent.noProxy if
# the guest needs a proxy to reach the registry.
qemu-nvidia-gpu-snp:
enabled: true
supportedArches:
- amd64
allowedHypervisorAnnotations: []
containerd:
snapshotter: "nydus"
forceGuestPull: false
crio:
guestPull: true
agent:
httpsProxy: ""
noProxy: ""
runtimeClass:
# These labels are automatically added by the GPU Operator and NFD
# respectively.
nodeSelector:
nvidia.com/cc.ready.state: "true"
amd.feature.node.kubernetes.io/snp: "true"

# Confidential NVIDIA GPU passthrough for Intel TDX nodes.
# Same selectors and snapshotter behavior as the SNP shim above,
# but pinned to TDX-capable hosts.
qemu-nvidia-gpu-tdx:
enabled: true
supportedArches:
- amd64
allowedHypervisorAnnotations: []
containerd:
snapshotter: "nydus"
forceGuestPull: false
crio:
guestPull: true
agent:
httpsProxy: ""
noProxy: ""
runtimeClass:
# These labels are automatically added by the GPU Operator and NFD
# respectively.
nodeSelector:
nvidia.com/cc.ready.state: "true"
intel.feature.node.kubernetes.io/tdx: "true"

# Default shim when a pod does not request a runtime class. Set to the
# non-confidential shim so pods only run in a confidential VM when
# they explicitly request the -snp or -tdx runtime class.
defaultShim:
amd64: qemu-nvidia-gpu # Can be changed to qemu-nvidia-gpu-snp or qemu-nvidia-gpu-tdx if preferred

# Create one Kubernetes RuntimeClass per enabled shim above
# (kata-qemu-nvidia-gpu, kata-qemu-nvidia-gpu-snp, kata-qemu-nvidia-gpu-tdx).
# createDefault: false suppresses the generic "kata" RuntimeClass since
# you should always reference a specific NVIDIA shim
# by name in pod specs.
runtimeClasses:
enabled: true
createDefault: false
defaultName: "kata"
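
To see which nodes each runtime class in the values file above can schedule on, the node selectors can be turned into ``kubectl`` label selectors. The following sketch is illustrative only: the function name ``shim_node_selector`` is hypothetical, and the label keys and values are copied from the ``nodeSelector`` entries in the sample file.

```shell
#!/usr/bin/env bash
# Sketch: map each shim in the sample values file to the label selector that
# its runtime class uses. The function name is hypothetical.
shim_node_selector() {
  case "$1" in
    qemu-nvidia-gpu)
      echo 'nvidia.com/cc.ready.state=false' ;;
    qemu-nvidia-gpu-snp)
      echo 'nvidia.com/cc.ready.state=true,amd.feature.node.kubernetes.io/snp=true' ;;
    qemu-nvidia-gpu-tdx)
      echo 'nvidia.com/cc.ready.state=true,intel.feature.node.kubernetes.io/tdx=true' ;;
    *)
      echo "unknown shim: $1" >&2
      return 1 ;;
  esac
}

# Usage: list the nodes where the SNP runtime class can schedule.
#   kubectl get nodes -l "$(shim_node_selector qemu-nvidia-gpu-snp)"
```

An empty node list for a shim suggests that the GPU Operator or NFD has not yet applied the expected labels.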
17 changes: 10 additions & 7 deletions confidential-containers/supported-platforms.rst
@@ -52,6 +52,9 @@ NVIDIA GPUs
* - NVIDIA B200
- Single-GPU, Multi-GPU

* - NVIDIA HGX B300
- Single-GPU, Multi-GPU

* - NVIDIA RTX Pro 6000 BSE
- Single-GPU

@@ -75,10 +78,10 @@ CPU Platforms
- Operating System
- Kernel Version
* - AMD Genoa / Milan
- Ubuntu 25.10
- Ubuntu 25.10 or 26.04
- 6.17+
* - Intel Emerald Rapids (ER) / Granite Rapids (GR)
- Ubuntu 25.10
- Ubuntu 25.10 or 26.04
- 6.17+

For additional information on node configuration, refer to the `Confidential Computing Deployment Guide <https://docs.nvidia.com/cc-deployment-guide-tdx-snp.pdf>`_ for information about supported NVIDIA GPUs, such as the NVIDIA Hopper H100.
@@ -88,7 +91,7 @@ The following topics in the deployment guide apply to a cloud-native environment
* Hardware selection and initial hardware configuration, such as BIOS settings.
* Host operating system selection, initial configuration, and validation.

When following the cloud-native sections in the deployment guide linked above, use Ubuntu 25.10 as the host OS with its default kernel version and configuration.
When following the cloud-native sections in the deployment guide linked above, use Ubuntu 25.10 or 26.04 as the host OS with its default kernel version and configuration.
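
The kernel requirement from the CPU Platforms table (6.17+) can be checked on a host before deployment. This is a minimal sketch: the function name ``kernel_at_least`` is hypothetical, and it assumes the kernel release string starts with ``MAJOR.MINOR``, as in ``6.17.0-5-generic``.

```shell
#!/usr/bin/env bash
# Sketch: compare a kernel release string against a required MAJOR.MINOR.
# The function name is hypothetical.
kernel_at_least() {
  # $1 = required "MAJOR.MINOR", $2 = kernel release such as "6.17.0-5-generic"
  local req_major="${1%%.*}"
  local req_minor="${1#*.}"
  local major="${2%%.*}"
  local rest="${2#*.}"
  local minor="${rest%%[!0-9]*}"   # keep only the leading digits of the minor field

  [ "${major}" -gt "${req_major}" ] ||
    { [ "${major}" -eq "${req_major}" ] && [ "${minor}" -ge "${req_minor}" ]; }
}

# Usage:
#   kernel_at_least 6.17 "$(uname -r)" || echo "kernel is older than 6.17" >&2
```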

For additional resources on machine setup:

@@ -114,15 +117,15 @@ Supported Software Components
* - `QEMU <https://www.qemu.org/>`__
- 10.1 \+ Patches
* - `Containerd <https://github.com/containerd/containerd>`__
- 2.2.2
- 2.2.2 or 2.3.x

Should we say 2.2.2+ or 2.3.x?

* - `Kubernetes <https://kubernetes.io/>`__
- 1.32 \+
* - `NVIDIA GPU Operator <https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html>`__ and its components.

Refer to the :ref:`GPU Operator Component Matrix <gpuop:operator-component-matrix>` for the list of components and versions included in each release.
- v26.3.1 and higher
- ${gpu_operator_version} and higher
* - `Kata Containers <https://katacontainers.io/>`__
- 3.29 (installed with ``kata-deploy`` Helm chart)
- ${kata_version} (installed with ``kata-deploy`` Helm chart)
* - `Key Broker Service (KBS) protocol <https://confidentialcontainers.org/docs/attestation/>`__
- 0.4.0
* - `Kata Lifecycle Manager <https://github.com/kata-containers/lifecycle-manager>`__