Init pass at restructuring CoCo TOC #385

Open
a-mccarthy wants to merge 3 commits into NVIDIA:main from a-mccarthy:coco-strucutre

a-mccarthy commented Apr 30, 2026

Collaborator

The deployment guide has grown quite long. This is a draft attempt at splitting the content into a more usable form.

a-mccarthy marked this pull request as draft April 30, 2026 15:27
@github-actions

Documentation preview

https://nvidia.github.io/cloud-native-docs/review/pr-385

a-mccarthy changed the title from "Init pass at restructuring TOC" to "Init pass at restructuring CoCo TOC" Apr 30, 2026
a-mccarthy marked this pull request as ready for review May 12, 2026 15:41

resources:
  limits:
    nvidia.com/GH100_H200_141GB: "1"

Collaborator Author

Confirm this is a valid GPU name on a node.

If you need a timeout of more than 1200 seconds, you must also adjust Kata Agent Policy's ``image_pull_timeout`` value, which controls the agent-side timeout for guest-image pulls.
To do this, add the ``agent.image_pull_timeout`` kernel parameter to your shim configuration, or pass an explicit value in the ``io.katacontainers.config.hypervisor.kernel_params: "..."`` pod annotation.
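
As a hedged illustration of the annotation approach described above, a pod manifest might carry the kernel parameter as follows. Only the annotation key and the ``agent.image_pull_timeout`` parameter name come from the text above; the pod name, runtime class placeholder, image, and the 1800-second value are hypothetical and chosen only for the sketch.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-coco-pod                      # hypothetical pod name
  annotations:
    # Assumed value: 1800 seconds, picked only because it exceeds 1200.
    io.katacontainers.config.hypervisor.kernel_params: "agent.image_pull_timeout=1800"
spec:
  runtimeClassName: <your-kata-runtime-class>  # placeholder, not a real class name
  containers:
    - name: app
      image: registry.example.com/app:latest  # hypothetical image
```

The annotation suits a one-off override for a single workload, while the shim configuration route applies the same parameter to every pod that uses that shim.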

"nvidia.com/GH100_H200_141GB": "1"

Collaborator Author

Confirm this output.

a-mccarthy requested a review from manuelh-dev May 18, 2026 20:23
Comment thread confidential-containers/attestation.rst Outdated
Comment thread confidential-containers/attestation.rst Outdated

*****************************************************
#####################################################
NVIDIA Confidential Containers Reference Architecture

Contributor

I am wondering whether we can make the "Supported Features and Deployment Scenarios" and "Limitations and Restrictions" sections more prominent. They get a bit buried in the already lengthy overview page. Maybe we can relocate these two to a different main page (or even create a separate main page)?

Collaborator Author

I'd be in favor of doing this, but not in this PR. I have to circle back with Hema, because I think we can flesh out a lot of these sections more, and it may be a good idea to create separate pages.

manuelh-dev left a comment

Contributor

LGTM. I left just a few comments; feel free to resolve them if they don't seem immediately helpful.

@a-mccarthy

Collaborator Author

@manuelh-dev I made some more updates to this PR. Do you have time this week to review?

Updates include:

  • A new index home page
  • A new persona page that covers users and their responsibilities
  • A troubleshooting page (which is still a bit of a draft)
  • A quick start, which cuts the commands and details down to just installing Kata and the GPU Operator. Do you think this type of install adds value for users?

mikemckiernan left a comment

Member

Def a good idea to provide a streamlined, common install page. LMK what gibberish I can clarify.

Comment thread confidential-containers/attestation.rst Outdated
Deploy Confidential Containers
******************************
#########################################
Install Guide for Confidential Containers

Member

Def better than what I had and requires differentiation from the quickstart approach. I don't think the title is wrong, but I'm wondering if it can be more of a contrast to quickstart.

  • Detailed Installation
  • Common Installation Options (might be untrue)
  • Traditional Workload Considerations

Collaborator Author

Updated to Detailed Install Guide

Refer to the :doc:`NVIDIA GPU Operator <gpuop:overview>` and `Kata Containers <https://katacontainers.io/docs/>`_ documentation for more information on these software components.
Refer to the `Kubernetes documentation <https://kubernetes.io/docs/home/>`_ for more information on Kubernetes cluster administration.
#. :doc:`Prerequisites <prerequisites>`.
#. :ref:`Label nodes for Confidential Containers components <coco-label-nodes>`

Member

not-sure: I wonder if "Label nodes to install Confidential Containers components" could set expectations for why we're labelling nodes. Or, "Label the nodes to configure for Confidential Containers"?

Collaborator Author

Thanks for the suggestions on this section! I updated the wording here so that it is hopefully less clunky.

Comment thread confidential-containers/confidential-containers-deploy.rst Outdated
Comment thread confidential-containers/confidential-containers-deploy.rst Outdated
Comment on lines +32 to +44
You can set the default confidential computing mode of the NVIDIA GPUs by setting the ``ccManager.defaultMode=<on|off>`` option.
The default value of ``ccManager.defaultMode`` is ``on``.
You can set this option when you install NVIDIA GPU Operator or afterward by modifying the cluster-policy instance of the ClusterPolicy object.

Set a node-level mode by applying the ``nvidia.com/cc.mode=<on|off|ppcie>`` label on the node.
If you set a specific mode on a node, it has higher precedence than the cluster-wide default mode.

When you change the mode, the manager performs the following actions:

* Evicts the other GPU Operator operands from the node.
  However, the manager does not drain user workloads. You must make sure that no user workloads are running on the node before you change the mode.
* Changes the mode and resets the GPU.
* Reschedules the other GPU Operator operands.
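
A minimal sketch of the two settings described above, assuming the cluster-wide default is supplied as Helm values and the per-node override as a node label. The option name, label key, and allowed values come from the text above; the node name is hypothetical, and the two YAML documents are fragments rather than complete manifests.

```yaml
# Helm values fragment for NVIDIA GPU Operator: cluster-wide default mode.
ccManager:
  defaultMode: "on"     # quoted, because some YAML parsers read a bare on/off as a boolean
---
# Node metadata fragment: per-node override, which takes precedence over the default.
metadata:
  name: worker-1        # hypothetical node name
  labels:
    nvidia.com/cc.mode: "ppcie"
```

Because the manager does not drain user workloads, any such label change belongs after the node has been cleared of user pods.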

Member

I wonder if this info could follow the table or if it can be removed if it is redundant with the info in the sections that follow. You likely inherited some verbosity from my content.

Comment thread confidential-containers/configure-workloads.rst
Comment thread confidential-containers/configure-multi-gpu.rst Outdated
Comment thread confidential-containers/index.rst Outdated
Comment thread confidential-containers/index.rst Outdated
Comment thread confidential-containers/attestation.rst Outdated
Complete the **Install** section (through :doc:`Run a Sample Workload <run-sample-workload>` with ``Test PASSED``) before wiring attestation into production workloads.

Attestation is not required for the install sample workload.
Configure attestation when workloads need secrets, encrypted container images, or authenticated registries.

Contributor

@fitzthum do we care about authenticated registries?

Should we formulate this more broadly? Every deployment should need attestation. Is there value in the solution when not conducting attestation?

Comment thread confidential-containers/prerequisites.rst Outdated
Comment thread confidential-containers/index.rst Outdated
Comment thread confidential-containers/attestation.rst Outdated
Attestation
***********

As a :ref:`Security Engineer <coco-persona-security-engineer>`, use this page to configure and verify attestation for confidential workloads.

Contributor

I think we should change the scope here and clearly delimit what this page does and does not cover. We should emphasize that attestation is required but out of scope for this page, and instead state that this page explains how to get to a basic setup of Trustee and kbs-client for evaluation purposes. The workload and other components also need to be configured for attestation, so our goal is not to provide an end-to-end sample.

Collaborator Author

This has been updated. I also added a "Using this documentation" section to the index page that calls out from the start that we only cover NVIDIA-specific information.

Comment thread confidential-containers/attestation.rst
Comment thread confidential-containers/attestation.rst Outdated
Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>

3 participants