Skip to content

feat(agents): add nemo agents package container packaging command#14

Open
arpitsardhana wants to merge 1 commit into
mainfrom
aircore-4-package-plugin/arpsingh
Open

feat(agents): add nemo agents package container packaging command#14
arpitsardhana wants to merge 1 commit into
mainfrom
aircore-4-package-plugin/arpsingh

Conversation

@arpitsardhana
Copy link
Copy Markdown
Contributor

@arpitsardhana arpitsardhana commented May 21, 2026

Introduces a render → validate → build → publish pipeline for packaging NAT agents as OCI container images, driven by flags on a single command.

  • New container/ module: Jinja2 Dockerfile renderer (template.py), python-on-whales builder (builder.py) and publisher (publisher.py), structural config validator (validator.py), OCI metadata extractor (metadata.py), and a sample sandbox policy (openshell_policy.yaml).
  • CLI: registers nemo agents package with --no-build, --publish, --registry, --format docker|whl, --platform, plus reproducibility knobs (--nat-version, --base-image-*, --python-version) and security defaults (--allow-root opt-out, generated .dockerignore).
  • pyproject: declares [container] optional extra (jinja2, python-on-whales) and adds jinja2 to the test extra.
  • Tests: 84 unit tests covering render, build, publish, validate, metadata, and CLI flag-combination handling.
  • Example: minimal react_agent workflow (examples/hello_world.yaml).

Summary by CodeRabbit

  • New Features

    • Added nemo agents package CLI command to containerize agents with render, validate, build, and optional publish workflow
    • Agent configuration validation before containerization
    • Support for multiple registry targets and image publishing
    • Example agent configuration provided
  • Documentation

    • Updated plugin README with agent packaging guide and CLI reference documentation
    • Added example Docker configuration and sample agent

Review Change Stack

@arpitsardhana arpitsardhana self-assigned this May 21, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 21, 2026

Suite Lines Covered Line Rate Branch Rate
Unit Tests 18290/24245 75.4% 61.9%
Integration Tests 11693/23022 50.8% 26.0%

@arpitsardhana arpitsardhana force-pushed the aircore-4-package-plugin/arpsingh branch 2 times, most recently from 1812b43 to bba096d Compare May 21, 2026 23:02
@arpitsardhana arpitsardhana marked this pull request as ready for review May 21, 2026 23:59
@arpitsardhana arpitsardhana requested review from a team as code owners May 21, 2026 23:59
marcusds added a commit that referenced this pull request May 27, 2026
- scripts/git-utils.ts openBrowser: parse URL with `new URL()` and require
  http/https before spawning. Replace the Windows `cmd /c start` shell
  invocation with `rundll32 url.dll,FileProtocolHandler` so no branch goes
  through a shell. Pass `--` separator on darwin/linux so a URL starting
  with `-` cannot be parsed as an option. Closes #3951.
- sdk/orval/generate.ts: delete the unused HTTP-fetch branch from
  `getFile()`. All current `serviceConfigs` reference local YAML paths, so
  the network->file write CodeQL flagged on line 131 (#14) no longer
  exists. Throws a clear error if a remote URL is configured.

Signed-off-by: mschwab <mschwab@nvidia.com>
github-merge-queue Bot pushed a commit that referenced this pull request May 28, 2026
* fix: address open CodeQL alerts in TypeScript code

Close 21 open CodeQL alerts on main:

Security
- LargeFileWorker: remove dead `download` (untrusted-URL fetch) and `upload`
  actions; only `downloadAsFile` (SDK path-based) is used by callers. Closes
  #4 (client-side-request-forgery) and #17 (missing-origin-check).
- orval/generate.ts: use `fs.mkdtempSync` for the OpenAPI spec temp file
  instead of a predictable `os.tmpdir()` path. Closes #5
  (insecure-temporary-file).

Code-quality
- Drop redundant `this.page = page` / `this.request = request` in 11
  e2e-tests classes — TS parameter properties (`public readonly page: Page`,
  `private request: APIRequestContext`) already assign the field. Closes
  #22-#32 (useless-assignment-to-property).
- Drop redundant null/undefined checks after narrowing in
  ReportTraceModal/utils, BenchmarkDetailsPanel, api/intake/utils,
  ActionMenu, useSubmitICLsFile. Closes #33-#37.
- SafeSynthesizerJobReportRoute/util: drop unreachable `else if (score >= 8)`
  branches and the dead `UNAVAILABLE` fallback; add explicit
  `Number.isNaN` guard at the top of each grading helper. Closes #20, #21.
- WorkspaceDashboardRoute: drop inner `MODEL_COMPARE_ENABLED ? a : b`
  ternary that always picked `a` (lives inside an outer
  `MODEL_COMPARE_ENABLED &&` guard); drop now-unused
  `getWorkspaceBaseModelsRoute` import. Closes #19.

Signed-off-by: mschwab <mschwab@nvidia.com>

* fix: refactor remaining CodeQL-flagged build scripts to argv form

Drop shell interpolation in dev/build scripts so user-supplied branch names,
commit hashes, paths, and env values cannot be parsed as shell syntax. Also
plug a TOCTOU and add origin allowlists for the two http-to-file fetches.

- scripts/cherry-pick.ts: route every git call through execFileSync('git',
  [...]). Closes #6-#10 (indirect-cmd-line-injection).
- scripts/git-utils.ts: openBrowser uses execFile + argv array; status/branch
  helpers use execFileSync with argv. Removes the brittle " → \" escape and
  the shell-interpolated browser command. Closes #1
  (incomplete-sanitization) and #11 (indirect-cmd-line-injection).
- sdk/orval/format-generated.ts: prettier runs via execFileSync. Closes #2
  (shell-cmd-injection-from-env) and #13 (indirect-cmd-line-injection).
- sdk/orval/generate.ts: orval runs via execFileSync, with its parameters
  passed in env instead of interpolated into a shell string; remote spec
  fetches are restricted to an allowlist of github/gitlab hosts; the
  existsSync+readFileSync TOCTOU in postProcessZodFiles is collapsed into a
  single try/catch on ENOENT. Closes #3 (file-system-race), #12
  (indirect-cmd-line-injection), and #14 (http-to-file-access).
- studio/scripts/fetch-styles.ts: validate that the fetch URL hostname
  matches the configured Kaizen CDN before fetching. Closes #15
  (http-to-file-access).

Signed-off-by: mschwab <mschwab@nvidia.com>

* fix: close remaining CodeQL alerts re-emitted on PR scan

- scripts/git-utils.ts openBrowser: parse URL with `new URL()` and require
  http/https before spawning. Replace the Windows `cmd /c start` shell
  invocation with `rundll32 url.dll,FileProtocolHandler` so no branch goes
  through a shell. Pass `--` separator on darwin/linux so a URL starting
  with `-` cannot be parsed as an option. Closes #3951.
- sdk/orval/generate.ts: delete the unused HTTP-fetch branch from
  `getFile()`. All current `serviceConfigs` reference local YAML paths, so
  the network->file write CodeQL flagged on line 131 (#14) no longer
  exists. Throws a clear error if a remote URL is configured.

Signed-off-by: mschwab <mschwab@nvidia.com>

* fix: drop -- separator for xdg-open

xdg-open does not honor -- as an option terminator; passing it as an arg
caused openBrowser to fail on Linux. URL is already validated to http(s),
so the separator wasn't load-bearing — just drop it on the Linux branch.

Codex review on PR #75.

Signed-off-by: mschwab <mschwab@nvidia.com>

* fix: address CodeRabbit findings on PR #75

- scripts/git-utils.ts: drop `--` from macOS `open` argv too. `open`'s man
  page does not document `--` as an end-of-options separator. URL is
  already validated to http(s), so the separator wasn't load-bearing.
- sdk/orval/format-generated.ts: on Windows, run prettier through
  `cmd.exe /c` so the `prettier.cmd` shim resolves. `execFileSync` on
  Windows cannot launch .cmd shims directly.
- sdk/orval/generate.ts: same Windows wrap for `pnpm exec orval`.

Signed-off-by: mschwab <mschwab@nvidia.com>

* fix: validate format-generated.ts servicePath argv

The Windows cmd.exe /c wrap added in ec7aa93 re-opened a CodeQL
data-flow finding (#3961, #3962) because generatedPath traces back to
process.argv[2]. Validate the argv against a safe-char regex at entry
so CodeQL sees it as sanitized before it flows into argv or paths.

Signed-off-by: mschwab <mschwab@nvidia.com>

* fix: replace regex with hardcoded Set allowlist for servicePath

CodeQL did not recognize the regex check as a sanitizer; switching to a
hardcoded Set lookup against known serviceConfigs paths so the data flow
is reducible to a finite set of literal values.

Signed-off-by: mschwab <mschwab@nvidia.com>

* fix: use prettier Node API instead of subprocess

Replace the prettier CLI invocation with prettier's programmatic
format/resolveConfig/getFileInfo API. No subprocess means no
cmd.exe wrap, no command-line argument flow, and the CodeQL
indirect-command-line-injection / shell-cmd-injection-from-env
alerts on format-generated.ts can resolve. Also fixes the Windows
.cmd shim resolution problem CR raised, since prettier now runs
in-process.

The servicePath argv is still validated against a hardcoded Set
of known serviceConfigs paths to prevent directory traversal via
path.join.

Signed-off-by: mschwab <mschwab@nvidia.com>

* fix: use readdirSync withFileTypes to avoid statSync TOCTOU

CodeQL flagged the statSync -> readFileSync / writeFileSync pair in
formatWithPrettier as a file-system-race. Getting Dirent entries from
readdirSync(dir, { withFileTypes: true }) lets us check isDirectory /
isFile inline without a separate stat round-trip, closing the alert.

Signed-off-by: mschwab <mschwab@nvidia.com>

* fix: drop remaining statSync usages in format-generated.ts

Codex flagged that getTsFiles and splitZodTagFilesIn still used the
readdir-string + statSync pattern, leaving two more file-system-race
sinks even after formatWithPrettier was converted. Switch both to
readdirSync(dir, { withFileTypes: true }) and use Dirent.isFile() /
isDirectory() inline. Removes the last statSync from this script.

Signed-off-by: mschwab <mschwab@nvidia.com>

---------

Signed-off-by: mschwab <mschwab@nvidia.com>
aray12 pushed a commit that referenced this pull request May 28, 2026
* fix: address open CodeQL alerts in TypeScript code

Close 21 open CodeQL alerts on main:

Security
- LargeFileWorker: remove dead `download` (untrusted-URL fetch) and `upload`
  actions; only `downloadAsFile` (SDK path-based) is used by callers. Closes
  #4 (client-side-request-forgery) and #17 (missing-origin-check).
- orval/generate.ts: use `fs.mkdtempSync` for the OpenAPI spec temp file
  instead of a predictable `os.tmpdir()` path. Closes #5
  (insecure-temporary-file).

Code-quality
- Drop redundant `this.page = page` / `this.request = request` in 11
  e2e-tests classes — TS parameter properties (`public readonly page: Page`,
  `private request: APIRequestContext`) already assign the field. Closes
  #22-#32 (useless-assignment-to-property).
- Drop redundant null/undefined checks after narrowing in
  ReportTraceModal/utils, BenchmarkDetailsPanel, api/intake/utils,
  ActionMenu, useSubmitICLsFile. Closes #33-#37.
- SafeSynthesizerJobReportRoute/util: drop unreachable `else if (score >= 8)`
  branches and the dead `UNAVAILABLE` fallback; add explicit
  `Number.isNaN` guard at the top of each grading helper. Closes #20, #21.
- WorkspaceDashboardRoute: drop inner `MODEL_COMPARE_ENABLED ? a : b`
  ternary that always picked `a` (lives inside an outer
  `MODEL_COMPARE_ENABLED &&` guard); drop now-unused
  `getWorkspaceBaseModelsRoute` import. Closes #19.

Signed-off-by: mschwab <mschwab@nvidia.com>

* fix: refactor remaining CodeQL-flagged build scripts to argv form

Drop shell interpolation in dev/build scripts so user-supplied branch names,
commit hashes, paths, and env values cannot be parsed as shell syntax. Also
plug a TOCTOU and add origin allowlists for the two http-to-file fetches.

- scripts/cherry-pick.ts: route every git call through execFileSync('git',
  [...]). Closes #6-#10 (indirect-cmd-line-injection).
- scripts/git-utils.ts: openBrowser uses execFile + argv array; status/branch
  helpers use execFileSync with argv. Removes the brittle " → \" escape and
  the shell-interpolated browser command. Closes #1
  (incomplete-sanitization) and #11 (indirect-cmd-line-injection).
- sdk/orval/format-generated.ts: prettier runs via execFileSync. Closes #2
  (shell-cmd-injection-from-env) and #13 (indirect-cmd-line-injection).
- sdk/orval/generate.ts: orval runs via execFileSync, with its parameters
  passed in env instead of interpolated into a shell string; remote spec
  fetches are restricted to an allowlist of github/gitlab hosts; the
  existsSync+readFileSync TOCTOU in postProcessZodFiles is collapsed into a
  single try/catch on ENOENT. Closes #3 (file-system-race), #12
  (indirect-cmd-line-injection), and #14 (http-to-file-access).
- studio/scripts/fetch-styles.ts: validate that the fetch URL hostname
  matches the configured Kaizen CDN before fetching. Closes #15
  (http-to-file-access).

Signed-off-by: mschwab <mschwab@nvidia.com>

* fix: close remaining CodeQL alerts re-emitted on PR scan

- scripts/git-utils.ts openBrowser: parse URL with `new URL()` and require
  http/https before spawning. Replace the Windows `cmd /c start` shell
  invocation with `rundll32 url.dll,FileProtocolHandler` so no branch goes
  through a shell. Pass `--` separator on darwin/linux so a URL starting
  with `-` cannot be parsed as an option. Closes #3951.
- sdk/orval/generate.ts: delete the unused HTTP-fetch branch from
  `getFile()`. All current `serviceConfigs` reference local YAML paths, so
  the network->file write CodeQL flagged on line 131 (#14) no longer
  exists. Throws a clear error if a remote URL is configured.

Signed-off-by: mschwab <mschwab@nvidia.com>

* fix: drop -- separator for xdg-open

xdg-open does not honor -- as an option terminator; passing it as an arg
caused openBrowser to fail on Linux. URL is already validated to http(s),
so the separator wasn't load-bearing — just drop it on the Linux branch.

Codex review on PR #75.

Signed-off-by: mschwab <mschwab@nvidia.com>

* fix: address CodeRabbit findings on PR #75

- scripts/git-utils.ts: drop `--` from macOS `open` argv too. `open`'s man
  page does not document `--` as an end-of-options separator. URL is
  already validated to http(s), so the separator wasn't load-bearing.
- sdk/orval/format-generated.ts: on Windows, run prettier through
  `cmd.exe /c` so the `prettier.cmd` shim resolves. `execFileSync` on
  Windows cannot launch .cmd shims directly.
- sdk/orval/generate.ts: same Windows wrap for `pnpm exec orval`.

Signed-off-by: mschwab <mschwab@nvidia.com>

* fix: validate format-generated.ts servicePath argv

The Windows cmd.exe /c wrap added in ec7aa93 re-opened a CodeQL
data-flow finding (#3961, #3962) because generatedPath traces back to
process.argv[2]. Validate the argv against a safe-char regex at entry
so CodeQL sees it as sanitized before it flows into argv or paths.

Signed-off-by: mschwab <mschwab@nvidia.com>

* fix: replace regex with hardcoded Set allowlist for servicePath

CodeQL did not recognize the regex check as a sanitizer; switching to a
hardcoded Set lookup against known serviceConfigs paths so the data flow
is reducible to a finite set of literal values.

Signed-off-by: mschwab <mschwab@nvidia.com>

* fix: use prettier Node API instead of subprocess

Replace the prettier CLI invocation with prettier's programmatic
format/resolveConfig/getFileInfo API. No subprocess means no
cmd.exe wrap, no command-line argument flow, and the CodeQL
indirect-command-line-injection / shell-cmd-injection-from-env
alerts on format-generated.ts can resolve. Also fixes the Windows
.cmd shim resolution problem CR raised, since prettier now runs
in-process.

The servicePath argv is still validated against a hardcoded Set
of known serviceConfigs paths to prevent directory traversal via
path.join.

Signed-off-by: mschwab <mschwab@nvidia.com>

* fix: use readdirSync withFileTypes to avoid statSync TOCTOU

CodeQL flagged the statSync -> readFileSync / writeFileSync pair in
formatWithPrettier as a file-system-race. Getting Dirent entries from
readdirSync(dir, { withFileTypes: true }) lets us check isDirectory /
isFile inline without a separate stat round-trip, closing the alert.

Signed-off-by: mschwab <mschwab@nvidia.com>

* fix: drop remaining statSync usages in format-generated.ts

Codex flagged that getTsFiles and splitZodTagFilesIn still used the
readdir-string + statSync pattern, leaving two more file-system-race
sinks even after formatWithPrettier was converted. Switch both to
readdirSync(dir, { withFileTypes: true }) and use Dirent.isFile() /
isDirectory() inline. Removes the last statSync from this script.

Signed-off-by: mschwab <mschwab@nvidia.com>

---------

Signed-off-by: mschwab <mschwab@nvidia.com>
Signed-off-by: Alex Ray <alray@nvidia.com>
@arpitsardhana arpitsardhana force-pushed the aircore-4-package-plugin/arpsingh branch from bba096d to a13d1e5 Compare May 28, 2026 22:13
else:
template_source = DOCKERFILE_TEMPLATE

template = jinja2.Template(template_source)
)
if result.returncode == 0 and result.stdout.strip():
return result.stdout.strip()
except (FileNotFoundError, subprocess.TimeoutExpired):
data = yaml.safe_load(config_text)
if isinstance(data, dict) and "workflow" in data:
return "nemo_agent_toolkit"
except Exception:
wf = data.get("workflow", {})
if isinstance(wf, dict) and wf.get("_type"):
return f"{wf['_type']} agent"
except Exception:
)
if result.returncode == 0 and result.stdout.strip():
return result.stdout.strip()
except (FileNotFoundError, subprocess.TimeoutExpired):
)
if result.returncode == 0 and result.stdout.strip():
return _strip_credentials(result.stdout.strip())
except (FileNotFoundError, subprocess.TimeoutExpired):
@arpitsardhana arpitsardhana force-pushed the aircore-4-package-plugin/arpsingh branch from a13d1e5 to da18709 Compare May 28, 2026 22:18
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

📝 Walkthrough

Walkthrough

This PR adds agent container packaging as a first-class nemo agents package CLI command. It includes metadata extraction, YAML validation, Dockerfile templating, Docker build/publish orchestration, complete documentation, examples, and a 1300-line test suite.

Changes

NeMo Agents Container Packaging

Layer / File(s) Summary
Agent metadata extraction and YAML validation
src/nemo_agents_plugin/container/metadata.py, src/nemo_agents_plugin/container/validator.py, tests/unit/test_container.py
Public extract_agent_metadata() resolves OCI labels from agent config + optional pyproject (name, version, author, framework, agent_id, revision, source with credential stripping). Public validate_agent_config() enforces YAML structure, workflow type allowlist, and tool/LLM references.
Dockerfile and .dockerignore templating
src/nemo_agents_plugin/container/template.py, tests/unit/test_container.py
RenderParams dataclass holds all template variables. resolve_value() picks explicit arguments, environment variables, or defaults. render_dockerfile() supports project vs config-only modes, external Jinja2 templates, and metadata integration. render_dockerignore() writes built-in security patterns.
Docker image build orchestration
src/nemo_agents_plugin/container/builder.py, tests/unit/test_container.py
docker_build() wraps python-on-whales image build. build_agent_image() orchestrates validation, context selection, Dockerfile rendering or passthrough, build invocation, default tag computation via extract_agent_metadata(), and artifact cleanup.
Docker registry publishing
src/nemo_agents_plugin/container/publisher.py, tests/unit/test_container.py
docker_push() tags local image and pushes to registry with configurable remote tag computation and failure handling.
CLI package command
src/nemo_agents_plugin/cli.py, tests/unit/test_container.py
nemo agents package command wires render → validate → build → (optional) publish pipeline with fail-fast flag validation, render-only --no-build mode, NAT version unpinned warning, and external template support.
Documentation, examples, dependencies
README.md, examples/Dockerfile, examples/hello_world.yaml, examples/.dockerignore, pyproject.toml
README documents packaging workflow and comprehensive CLI flag reference. Example Dockerfile hardened with non-root user and uv-managed Python. hello_world.yaml demonstrates tool, LLM, and workflow configuration. Adds jinja2>=3.1 and python-on-whales>=0.60 to container extra; extends test extra.
Test suite
tests/unit/test_container.py
1300+ lines: render modes and overrides, parameter resolution, metadata extraction with git integration and credential stripping, YAML validation (success and negative cases), build orchestration with mocked docker, docker push tag computation, end-to-end pipeline sequencing, and CLI command flag validation including --no-build, publish behavior, and format enforcement.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 29.66% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main addition: a new nemo agents package command for container packaging.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch aircore-4-package-plugin/arpsingh

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

🧹 Nitpick comments (1)
plugins/nemo-agents/README.md (1)

231-492: 🏗️ Heavy lift

Split this section by Diataxis and add “Next Steps”.

This section mixes HOW-TO walkthroughs, REFERENCE tables, and EXPLANATION in one page. Move reference tables to a dedicated reference doc, keep this section task-oriented, and add a Next Steps cross-link section at the end. As per coding guidelines, "Each documentation page should fit ONE Diataxis quadrant; do not mix tutorials with reference tables or how-tos with architecture explanations; use cross-links instead" and "Include 'Next Steps' section at the end with cross-links to related documentation content".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/README.md` around lines 231 - 492, This section mixes
HOW-TO walkthroughs, reference tables, and explanatory material; split it by
Diataxis so this page becomes a task-oriented "How-to: package agents" (keep the
progressive pipeline walkthroughs, render/build/publish examples, full example,
and brief rendering modes explanation needed for the task) and move all
reference material (the entire "Flag reference" tables, "Image tagging
convention", "OCI image labels", "Security defaults", and the detailed
"Rendering modes" table) into a separate reference doc (e.g., "Packaging
reference"); then add a short "Next Steps" section at the end of this how-to
that links to the new packaging reference, agent config validation docs, and any
template/pyproject docs for further reading.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugins/nemo-agents/agent_container_commands_3964e748.plan.md`:
- Line 212: The Markdown file has a broken relative link target using
"nmp/plugins/..." instead of the correct repository path
"plugins/nemo-agents/..."; update the link in
plugins/nemo-agents/agent_container_commands_3964e748.plan.md to point to the
correct path (replace "nmp/plugins/nemo-agents/..." with
"plugins/nemo-agents/...") so references to the AgentsCLI help panel and package
subcommand resolve correctly.
- Line 251: A few fenced code blocks lack a language specifier and required
surrounding blank lines; add an appropriate language token after the opening ```
for the fence currently written as ``` at Line 251, and ensure there is a blank
line before each opening fence and a blank line after each closing fence for the
code blocks spanning the regions around Lines 444–449 and 452–463 so they
satisfy markdownlint (i.e., replace ``` with ```<language> and insert empty
lines immediately before and after those fenced blocks).

In `@plugins/nemo-agents/nat_agent_build_requirements.md`:
- Line 318: Update the installation sample to match the implemented extras set:
change the pip install requirement that currently uses
"nvidia-nat[all]==${NAT_VERSION}" to use "nvidia-nat[most]==${NAT_VERSION}" so
it aligns with the template and build flow; locate the string
"nvidia-nat[all]==${NAT_VERSION}" in nat_agent_build_requirements.md and replace
the extras token "[all]" with "[most]".
- Around line 12-17: Update this spec to reflect the actual single-command
workflow: replace all references to the deprecated lifecycle commands 'nemo
agents render', 'nemo agents build', and 'nemo agents publish' with the shipped
'nemo agents package' command and adjust descriptions to explain that 'nemo
agents package' validates, renders (if needed), builds, and optionally
publishes; remove or consolidate duplicated sections that describe the old
multi-command flow (notably the blocks around the shown table and the other
occurrences you flagged) so the document only documents 'nemo agents package' as
the primary lifecycle command.
- Line 107: Several fenced code blocks in nat_agent_build_requirements.md are
missing language tags (violating MD040); edit each triple-backtick block and add
the appropriate language identifier (e.g., bash, text, json) after the opening
``` so the markdown linter recognizes the language—update every occurrence of
bare ``` in the file (the blocks called out in the review) to use a specific
language tag.

In `@plugins/nemo-agents/README.md`:
- Around line 356-357: The README table lists incorrect default values for
`--nat-version` and `--output`; update the table to reflect actual runtime
behavior implemented in _warn_if_nat_version_unpinned and _package_render_only:
state that `--nat-version` falls back to an internal default when $NAT_VERSION
is not set and emits a warning (rather than simply being `$NAT_VERSION`), and
that `--output` defaults to the pyproject parent directory in project mode (not
always `<config-dir>/Dockerfile`), while documenting the render-only/--no-build
behavior as implemented in _package_render_only.
- Around line 267-270: Add missing fenced-code block language specifiers (e.g.,
```text or ```bash) and ensure each fenced block has a blank line before and
after to satisfy MD040/MD031; specifically update the blocks containing the
lines "Dockerfile written to examples/Dockerfile" and ".dockerignore written to
examples/.dockerignore" and the other blocks referenced at ranges 282-286,
299-307, and 403-405 in the README so every ``` fence includes a language token
and is separated by surrounding blank lines.

In `@plugins/nemo-agents/src/nemo_agents_plugin/cli.py`:
- Around line 275-279: The CLI's platform Option (variable platform) is declared
but not actually used, allowing the publish path to report success without any
push; update the command handler that performs the build/publish (the function
using the platform variable around the publish logic) to either pass platform
into the buildx/publish routine or short-circuit with an explicit error when
platform is provided but buildx wiring isn't implemented; ensure the guard is
applied in both places noted (the platform declaration usage and the duplicate
spots around lines 400-404) so the CLI returns a non-zero error if platform is
requested but no push/publish occurs.
- Around line 438-444: The CLI currently validates --format and --agent-whl but
always proceeds down the Docker build path; modify the build command handler
around the validation (where the variables format and agent_whl are checked) to
branch on format == "whl": when true, invoke the wheel build flow using
agent_whl (e.g., call or create a function like build_whl_package(agent_whl) or
reuse existing wheel packaging helpers) and skip any Docker-only steps; ensure
the whl branch uses agent_whl as the source path, exits with errors on
missing/invalid paths, and only runs the Docker build logic when format ==
"docker".

In `@plugins/nemo-agents/src/nemo_agents_plugin/container/builder.py`:
- Around line 155-167: The current flow can remove a user's existing
.dockerignore because render_dockerignore always returns a path that is unlinked
in the finally block; change the logic so you only unlink the .dockerignore if
we created it. Before calling render_dockerignore (in the block guarded by
generate_ignore) record whether a .dockerignore already existed (e.g. check
Path(context_dir / ".dockerignore").exists()) or change render_dockerignore to
return a (path, created_flag); then only call
ignore_file.unlink(missing_ok=True) in the finally when the created_flag is True
(or when the pre-check showed the file did not previously exist). Keep
tmp_dockerfile cleanup unchanged.

In `@plugins/nemo-agents/src/nemo_agents_plugin/container/template.py`:
- Around line 266-272: The current code in template.py silently falls back to a
non-build-context path when agent_config is outside the project root (in the
has_pyproject branch); change the except ValueError handler to fail fast by
raising a clear error (e.g., raise ValueError or RuntimeError) that includes
agent_config and pyproject paths rather than assigning Path(agent_config.name)
to relative_config, so that config_file_path is not set to a bogus
"/workspace/..." value; update the try/except around
agent_config.resolve().relative_to(pyproject.resolve().parent) accordingly
(references: has_pyproject, pyproject, agent_config, relative_config,
config_file_path).

In `@plugins/nemo-agents/src/nemo_agents_plugin/container/validator.py`:
- Around line 70-74: The validator currently lets configs with a workflow but no
"_type" pass; update the check in validator.py to treat a missing or empty
workflow["_type"] as an error: read wf_type = workflow.get("_type", "") and if
not wf_type append an error like "Missing required workflow._type. Expected one
of: ..." (use _KNOWN_WORKFLOW_TYPES to list allowed values), otherwise keep the
existing unknown-type check that appends to errors when wf_type not in
_KNOWN_WORKFLOW_TYPES.
- Around line 47-50: The current try/except in validate_agent_config silently
treats missing PyYAML as valid; change the ImportError handling in
validate_agent_config to return a ValidationResult with valid=False and an
explanatory error (e.g., "PyYAML not installed: <error>") instead of
ValidationResult(valid=True,...), so callers know YAML parsing/validation
couldn't run; keep using the ValidationResult class and include the ImportError
message in the errors list to aid debugging.

In `@plugins/nemo-agents/tests/unit/test_container.py`:
- Line 1109: The test method's parameter type hint is written as a string
("tuple[Path, Path]"); remove the quotes so the annotation is a concrete type
tuple[Path, Path] (e.g., change the signature that contains package_cli,
project_dir: "tuple[Path, Path]" to use package_cli, project_dir: tuple[Path,
Path]) to comply with concrete type-hint guidelines and ensure tuple and Path
are used directly.

---

Nitpick comments:
In `@plugins/nemo-agents/README.md`:
- Around line 231-492: This section mixes HOW-TO walkthroughs, reference tables,
and explanatory material; split it by Diataxis so this page becomes a
task-oriented "How-to: package agents" (keep the progressive pipeline
walkthroughs, render/build/publish examples, full example, and brief rendering
modes explanation needed for the task) and move all reference material (the
entire "Flag reference" tables, "Image tagging convention", "OCI image labels",
"Security defaults", and the detailed "Rendering modes" table) into a separate
reference doc (e.g., "Packaging reference"); then add a short "Next Steps"
section at the end of this how-to that links to the new packaging reference,
agent config validation docs, and any template/pyproject docs for further
reading.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 00451055-1213-4707-8e75-278347f6f255

📥 Commits

Reviewing files that changed from the base of the PR and between 397e25e and a13d1e5.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • plugins/nemo-agents/README.md
  • plugins/nemo-agents/agent_container_commands_3964e748.plan.md
  • plugins/nemo-agents/examples/.dockerignore
  • plugins/nemo-agents/examples/Dockerfile
  • plugins/nemo-agents/examples/hello_world.yaml
  • plugins/nemo-agents/nat_agent_build_requirements.md
  • plugins/nemo-agents/pyproject.toml
  • plugins/nemo-agents/src/nemo_agents_plugin/cli.py
  • plugins/nemo-agents/src/nemo_agents_plugin/container/builder.py
  • plugins/nemo-agents/src/nemo_agents_plugin/container/metadata.py
  • plugins/nemo-agents/src/nemo_agents_plugin/container/openshell_policy.yaml
  • plugins/nemo-agents/src/nemo_agents_plugin/container/publisher.py
  • plugins/nemo-agents/src/nemo_agents_plugin/container/template.py
  • plugins/nemo-agents/src/nemo_agents_plugin/container/validator.py
  • plugins/nemo-agents/tests/unit/test_container.py


## Context

The existing `AgentsCLI` in [`plugins/nemo-agents/src/nemo_agents_plugin/cli.py`](nmp/plugins/nemo-agents/src/nemo_agents_plugin/cli.py) registers local commands (`invoke`, `run`, `evaluate`) and platform commands (`create`, `deploy`, etc.). We add a new **"Packaging (no platform required)"** help panel with a single `package` subcommand that encapsulates the render → validate → build → publish pipeline.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix broken source-code link target.

The relative link points to nmp/plugins/..., which is inconsistent with repository paths used elsewhere (plugins/nemo-agents/...) and will likely 404 in rendered docs.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/agent_container_commands_3964e748.plan.md` at line 212,
The Markdown file has a broken relative link target using "nmp/plugins/..."
instead of the correct repository path "plugins/nemo-agents/..."; update the
link in plugins/nemo-agents/agent_container_commands_3964e748.plan.md to point
to the correct path (replace "nmp/plugins/nemo-agents/..." with
"plugins/nemo-agents/...") so references to the AgentsCLI help panel and package
subcommand resolve correctly.


## 1. The `package` Subcommand

```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix markdownlint fenced-block issues.

Add missing language on the fence at Line 251 and add required blank lines around fenced blocks near Lines 444 and 452.

Also applies to: 444-449, 452-463

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 251-251: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/agent_container_commands_3964e748.plan.md` at line 251, A
few fenced code blocks lack a language specifier and required surrounding blank
lines; add an appropriate language token after the opening ``` for the fence
currently written as ``` at Line 251, and ensure there is a blank line before
each opening fence and a blank line after each closing fence for the code blocks
spanning the regions around Lines 444–449 and 452–463 so they satisfy
markdownlint (i.e., replace ``` with ```<language> and insert empty lines
immediately before and after those fenced blocks).

Comment on lines +12 to +17
| Command | Purpose |
|---------|---------|
| `nemo agents render` | Render a Dockerfile, `.dockerignore`, and optionally OpenShell policy from agent config |
| `nemo agents build` | Validate agent content and build a Docker image (auto-renders if no Dockerfile provided) |
| `nemo agents publish` | Tag and push the built image to a remote container registry |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

This spec documents removed commands instead of the shipped package command.

The file still defines nemo agents render/build/publish as primary lifecycle commands, which conflicts with the implemented single-command workflow (nemo agents package). This will mislead users and downstream docs/tests.

Also applies to: 146-218, 346-362, 368-373

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/nat_agent_build_requirements.md` around lines 12 - 17,
Update this spec to reflect the actual single-command workflow: replace all
references to the deprecated lifecycle commands 'nemo agents render', 'nemo
agents build', and 'nemo agents publish' with the shipped 'nemo agents package'
command and adjust descriptions to explain that 'nemo agents package' validates,
renders (if needed), builds, and optionally publishes; remove or consolidate
duplicated sections that describe the old multi-command flow (notably the blocks
around the shown table and the other occurrences you flagged) so the document
only documents 'nemo agents package' as the primary lifecycle command.


## Example Agent Directory Structure

```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix missing language tags on fenced blocks.

These fenced code blocks violate MD040 and should declare languages (text, bash, etc.).

Also applies to: 152-152, 187-187, 208-208, 268-268

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 107-107: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/nat_agent_build_requirements.md` at line 107, Several
fenced code blocks in nat_agent_build_requirements.md are missing language tags
(violating MD040); edit each triple-backtick block and add the appropriate
language identifier (e.g., bash, text, json) after the opening ``` so the
markdown linter recognizes the language—update every occurrence of bare ``` in
the file (the blocks called out in the review) to use a specific language tag.

uv venv --python ${PYTHON_VERSION} /workspace/.venv && \
. /workspace/.venv/bin/activate && \
test -n "${NAT_VERSION}" || { echo "NAT_VERSION build-arg is required" >&2; exit 1; } && \
uv pip install --prerelease=allow "nvidia-nat[all]==${NAT_VERSION}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

nvidia-nat[all] is inconsistent with implementation and likely invalid for users.

The template and build flow use nvidia-nat[most]; this sample uses [all]. Align this requirement/sample to [most] to avoid install failures and contract drift.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/nat_agent_build_requirements.md` at line 318, Update the
installation sample to match the implemented extras set: change the pip install
requirement that currently uses "nvidia-nat[all]==${NAT_VERSION}" to use
"nvidia-nat[most]==${NAT_VERSION}" so it aligns with the template and build
flow; locate the string "nvidia-nat[all]==${NAT_VERSION}" in
nat_agent_build_requirements.md and replace the extras token "[all]" with
"[most]".

Comment on lines +155 to +167
if generate_ignore:
ignore_file = render_dockerignore(context_dir)

return docker_build(
context_dir=context_dir,
dockerfile=tmp_dockerfile,
tag=tag,
build_args=build_args,
)
finally:
tmp_dockerfile.unlink(missing_ok=True)
if ignore_file is not None:
ignore_file.unlink(missing_ok=True)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Generated .dockerignore handling can delete an existing user file.

If the file already exists, it is overwritten and then removed in finally.

Suggested fix
     tmp_dockerfile = context_dir / "Dockerfile.generated"
     ignore_file: Path | None = None
+    had_ignore = (context_dir / ".dockerignore").exists()
     try:
         tmp_dockerfile.write_text(content, encoding="utf-8")
 
-        if generate_ignore:
+        if generate_ignore and not had_ignore:
             ignore_file = render_dockerignore(context_dir)
@@
     finally:
         tmp_dockerfile.unlink(missing_ok=True)
         if ignore_file is not None:
             ignore_file.unlink(missing_ok=True)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/src/nemo_agents_plugin/container/builder.py` around lines
155 - 167, The current flow can remove a user's existing .dockerignore because
render_dockerignore always returns a path that is unlinked in the finally block;
change the logic so you only unlink the .dockerignore if we created it. Before
calling render_dockerignore (in the block guarded by generate_ignore) record
whether a .dockerignore already existed (e.g. check Path(context_dir /
".dockerignore").exists()) or change render_dockerignore to return a (path,
created_flag); then only call ignore_file.unlink(missing_ok=True) in the finally
when the created_flag is True (or when the pre-check showed the file did not
previously exist). Keep tmp_dockerfile cleanup unchanged.

Comment on lines +266 to +272
if has_pyproject:
assert pyproject is not None
try:
relative_config = agent_config.resolve().relative_to(pyproject.resolve().parent)
except ValueError:
relative_config = Path(agent_config.name)
config_file_path = f"/workspace/{relative_config.as_posix()}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when agent_config is outside the project root.

Current fallback silently points to a file that will not be in the Docker build context.

Suggested fix
     if has_pyproject:
         assert pyproject is not None
         try:
             relative_config = agent_config.resolve().relative_to(pyproject.resolve().parent)
         except ValueError:
-            relative_config = Path(agent_config.name)
+            raise ValueError(
+                f"agent_config '{agent_config}' must be inside project root '{pyproject.resolve().parent}' "
+                "when --pyproject is provided."
+            )
         config_file_path = f"/workspace/{relative_config.as_posix()}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/src/nemo_agents_plugin/container/template.py` around
lines 266 - 272, The current code in template.py silently falls back to a
non-build-context path when agent_config is outside the project root (in the
has_pyproject branch); change the except ValueError handler to fail fast by
raising a clear error (e.g., raise ValueError or RuntimeError) that includes
agent_config and pyproject paths rather than assigning Path(agent_config.name)
to relative_config, so that config_file_path is not set to a bogus
"/workspace/..." value; update the try/except around
agent_config.resolve().relative_to(pyproject.resolve().parent) accordingly
(references: has_pyproject, pyproject, agent_config, relative_config,
config_file_path).

Comment on lines +47 to +50
try:
import yaml
except ImportError:
return ValidationResult(valid=True, errors=[])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Inspect the validator file around the referenced lines
sed -n '1,140p' plugins/nemo-agents/src/nemo_agents_plugin/container/validator.py | nl -ba | sed -n '1,140p'
echo "----"
sed -n '140,260p' plugins/nemo-agents/src/nemo_agents_plugin/container/validator.py | nl -ba | sed -n '140,260p'

# 2) Search for other yaml usage / related validation paths
rg -n "import yaml|yaml\." plugins/nemo-agents/src/nemo_agents_plugin/container/validator.py plugins/nemo-agents/src/nemo_agents_plugin -S || true

# 3) Check packaging metadata for the PyYAML extra named in the comment
# Look for pyproject/requirements and "nemo-agents-plugin[container]" / "container" extra / PyYAML mention
fd -t f "pyproject.toml" -a . | head -n 20
fd -t f "requirements*.txt" -a . | head -n 20
fd -t f "setup.cfg" -a . | head -n 20
fd -t f "setup.py" -a . | head -n 20

# Grep pyproject/metadata for PyYAML and container extra
rg -n "PyYAML|pyyaml|nemo-agents-plugin\[container\]|extra.*container|\[container\]" -S pyproject.toml setup.cfg setup.py requirements*.txt **/pyproject.toml **/setup.cfg **/setup.py 2>/dev/null || true

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 112


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Inspect the validator file around the referenced lines (47-50)
sed -n '35,80p' plugins/nemo-agents/src/nemo_agents_plugin/container/validator.py

echo "----"
# Show any nearby context where yaml is imported/used
sed -n '1,120p' plugins/nemo-agents/src/nemo_agents_plugin/container/validator.py

# 2) Search for yaml usage in the container package
rg -n "import yaml|yaml\." plugins/nemo-agents/src/nemo_agents_plugin/container -S || true

# 3) Check packaging metadata for PyYAML and the container extra
# (best-effort: scan common packaging files)
for f in pyproject.toml setup.cfg setup.py requirements*.txt; do
  if [ -f "$f" ]; then echo "---- $f"; fi
done

rg -n "PyYAML|pyyaml|nemo-agents-plugin\[container\]|\[container\]" -S pyproject.toml setup.cfg setup.py requirements*.txt 2>/dev/null || true
rg -n "PyYAML|pyyaml" -S . 2>/dev/null | head -n 50 || true

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 50381


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect pyproject around the pyyaml line mentioned in prior output
sed -n '1,120p' pyproject.toml

echo "----"
# Narrow search for extras and container
rg -n "extras|\\[tool\\.uv\\]|\\[project\\]|\\[project\\.optional-dependencies\\]|\\[container\\]|nemo-agents-plugin\\[container\\]" pyproject.toml

echo "----"
# Find any pyyaml references near extras/container
rg -n "pyyaml|PyYAML|pyyaml" pyproject.toml

echo "----"
# Inspect the container plugin's metadata.py ImportError handling
sed -n '80,200p' plugins/nemo-agents/src/nemo_agents_plugin/container/metadata.py

echo "----"
# Check if validator/metadata share the same ImportError logic
rg -n "except ImportError:" plugins/nemo-agents/src/nemo_agents_plugin/container -S

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 7671


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate nemo-agents subproject packaging metadata
fd -t f "pyproject.toml" plugins/nemo-agents -a . | head -n 20 || true
fd -t f "setup.cfg" plugins/nemo-agents -a . | head -n 20 || true
fd -t f "setup.py" plugins/nemo-agents -a . | head -n 20 || true

# Inspect pyproject if present
if [ -f "plugins/nemo-agents/pyproject.toml" ]; then
  sed -n '1,200p' plugins/nemo-agents/pyproject.toml
fi

# Search for pyyaml and container extras in nemo-agents plugin packaging files
rg -n "pyyaml|PyYAML|\[container\]|nemo-agents-plugin\\[container\\]|optional-dependencies|dependency-groups" \
  plugins/nemo-agents -S || true

# Also check if validator.py is used from build scripts expecting container extra deps
rg -n "validate_agent_config|validator\\.py|container\\.validator" plugins/nemo-agents -S || true

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 8772


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search unit tests for yaml/pyyaml/ImportError expectations
rg -n "validate_agent_config|pyyaml|PyYAML|ImportError|yaml" plugins/nemo-agents/tests/unit/test_container.py

# Print the relevant sections around where validate_agent_config is tested (keep output small)
# We'll grab around the first few occurrences and around the later blocks.
for lineno in 520 535 540 555 575 590 610 630 650 960 975 980 1000; do
  echo "---- around $lineno"
  sed -n "$((lineno-15)),$((lineno+25))p" plugins/nemo-agents/tests/unit/test_container.py || true
done

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 24950


Fail validation when PyYAML is missing (don’t return success).

validate_agent_config() returns ValidationResult(valid=True, errors=[]) on ImportError for yaml, bypassing all YAML/structural checks (plugins/nemo-agents/src/nemo_agents_plugin/container/validator.py:47-50). Even though nemo-agents-plugin declares pyyaml as a dependency, this makes broken environments look like valid configs.

Suggested fix
     try:
         import yaml
     except ImportError:
-        return ValidationResult(valid=True, errors=[])
+        return ValidationResult(
+            valid=False,
+            errors=["PyYAML is required for agent config validation. Install nemo-agents-plugin[container]."],
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/src/nemo_agents_plugin/container/validator.py` around
lines 47 - 50, The current try/except in validate_agent_config silently treats
missing PyYAML as valid; change the ImportError handling in
validate_agent_config to return a ValidationResult with valid=False and an
explanatory error (e.g., "PyYAML not installed: <error>") instead of
ValidationResult(valid=True,...), so callers know YAML parsing/validation
couldn't run; keep using the ValidationResult class and include the ImportError
message in the errors list to aid debugging.

Comment on lines +70 to +74
wf_type = workflow.get("_type", "")
if wf_type and wf_type not in _KNOWN_WORKFLOW_TYPES:
errors.append(
f"Unknown workflow type '{wf_type}'. Expected one of: {', '.join(sorted(_KNOWN_WORKFLOW_TYPES))}."
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Require workflow._type; don’t allow it to be omitted.

Configs with workflow but no _type currently pass validation.

Suggested fix
     wf_type = workflow.get("_type", "")
-    if wf_type and wf_type not in _KNOWN_WORKFLOW_TYPES:
+    if not isinstance(wf_type, str) or not wf_type:
+        errors.append("Missing required key: workflow._type.")
+    elif wf_type not in _KNOWN_WORKFLOW_TYPES:
         errors.append(
             f"Unknown workflow type '{wf_type}'. Expected one of: {', '.join(sorted(_KNOWN_WORKFLOW_TYPES))}."
         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
wf_type = workflow.get("_type", "")
if wf_type and wf_type not in _KNOWN_WORKFLOW_TYPES:
errors.append(
f"Unknown workflow type '{wf_type}'. Expected one of: {', '.join(sorted(_KNOWN_WORKFLOW_TYPES))}."
)
wf_type = workflow.get("_type", "")
if not isinstance(wf_type, str) or not wf_type:
errors.append("Missing required key: workflow._type.")
elif wf_type not in _KNOWN_WORKFLOW_TYPES:
errors.append(
f"Unknown workflow type '{wf_type}'. Expected one of: {', '.join(sorted(_KNOWN_WORKFLOW_TYPES))}."
)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/src/nemo_agents_plugin/container/validator.py` around
lines 70 - 74, The validator currently lets configs with a workflow but no
"_type" pass; update the check in validator.py to treat a missing or empty
workflow["_type"] as an error: read wf_type = workflow.get("_type", "") and if
not wf_type append an error like "Missing required workflow._type. Expected one
of: ..." (use _KNOWN_WORKFLOW_TYPES to list allowed values), otherwise keep the
existing unknown-type check that appends to errors when wf_type not in
_KNOWN_WORKFLOW_TYPES.

mock_push.assert_not_called()

def test_no_build_project_mode_writes_dockerfile_next_to_pyproject(
self, package_cli, project_dir: "tuple[Path, Path]"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove quotes from type hint.

String-based type hint violates guideline. tuple and Path are imported, use concrete hint.

🔧 Fix
-    self, package_cli, project_dir: "tuple[Path, Path]"
+    self, package_cli, project_dir: tuple[Path, Path]

As per coding guidelines: Always prefer concrete type hints over string based ones.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
self, package_cli, project_dir: "tuple[Path, Path]"
self, package_cli, project_dir: tuple[Path, Path]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/tests/unit/test_container.py` at line 1109, The test
method's parameter type hint is written as a string ("tuple[Path, Path]");
remove the quotes so the annotation is a concrete type tuple[Path, Path] (e.g.,
change the signature that contains package_cli, project_dir: "tuple[Path, Path]"
to use package_cli, project_dir: tuple[Path, Path]) to comply with concrete
type-hint guidelines and ensure tuple and Path are used directly.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

Introduces a render → validate → build → publish pipeline for packaging
NAT agents as OCI container images, driven by flags on a single command.

- New `container/` module: Jinja2 Dockerfile renderer (template.py),
  python-on-whales builder (builder.py) and publisher (publisher.py),
  structural config validator (validator.py), OCI metadata extractor
  (metadata.py), and a sample sandbox policy (openshell_policy.yaml).
- CLI: registers `nemo agents package` with `--no-build`, `--publish`,
  `--registry`, `--format docker|whl`, `--platform`, plus reproducibility
  knobs (`--nat-version`, `--base-image-*`, `--python-version`) and
  security defaults (`--allow-root` opt-out, generated `.dockerignore`).
- pyproject: declares `[container]` optional extra (jinja2, python-on-whales)
  and adds jinja2 to the test extra.
- Tests: 84 unit tests covering render, build, publish, validate, metadata,
  and CLI flag-combination handling.
- Example: minimal react_agent workflow (`examples/hello_world.yaml`).

Signed-off-by: Arpit Singh (SW-CLOUD) <arpsingh@nvidia.com>
@arpitsardhana arpitsardhana force-pushed the aircore-4-package-plugin/arpsingh branch from da18709 to a49559b Compare May 28, 2026 23:51
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
plugins/nemo-agents/src/nemo_agents_plugin/cli.py (2)

374-390: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

--format whl is accepted but still runs the Docker build path.

After validation, execution always calls build_agent_image(...) and does not branch to a wheel flow, so --format whl behavior is incorrect.

Also applies to: 438-444

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/src/nemo_agents_plugin/cli.py` around lines 374 - 390,
The CLI always calls build_agent_image(...) regardless of the --format value;
change the post-validation branch to check the format flag (e.g., format or
build_format variable) and route to the correct flow: if format == "whl" invoke
the wheel packaging flow (call the existing wheel packaging function or add a
new package_agent_wheel/build_agent_whl function that accepts the same relevant
args such as agent, pyproject, tag/agent_version, python_version, template_path,
generate_ignore, etc.), otherwise call build_agent_image(...). Apply the same
fix to the other occurrence that mirrors this call so --format whl actually
triggers the wheel flow.

275-279: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

--platform is still a no-op and can report false publish success.

platform is never wired into the build call, and the multi-platform branch exits with a success message without performing an actual push in this path.

Also applies to: 400-404

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/src/nemo_agents_plugin/cli.py` around lines 275 - 279,
The --platform Option defined as platform: Optional[list[str]] (typer Option) is
never forwarded into the image build/publish flow and the multi-platform branch
returns success without performing the push; update the publish/build flow to
pass the platform list into the build invocation (e.g., call build_image/build
or whatever function constructs the container with a platforms/platform
argument) and ensure the multi-platform branch actually performs the
push/publish (invoke the same push logic or docker buildx/push call) before
emitting success; adjust any conditional that prints success to run only after
push completes and handle empty/None platform by using the default linux/amd64 +
linux/arm64 behavior.
🧹 Nitpick comments (2)
plugins/nemo-agents/README.md (2)

255-397: ⚡ Quick win

Add Python SDK alternatives alongside CLI examples via tab sets.

Packaging examples are CLI-only. Add parallel Python SDK examples in tabs for the same tasks (render-only, build, build+publish).

As per coding guidelines, "Provide both Python SDK and CLI examples in tab-sets for consistency and to support multiple user workflows."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/README.md` around lines 255 - 397, Update the "package —
render, build, and publish in one command" section to present CLI and Python SDK
examples side-by-side using tab-sets: keep the existing CLI code blocks for the
three scenarios (render-only, build, build+publish) and add corresponding Python
SDK snippets that demonstrate the SDK call(s) to render/write Dockerfile, build
an image with a tag, and build+publish to a registry (match the same flags:
--no-build, --tag, --publish/--registry equivalents); mirror the "**With an
existing Dockerfile**" and "**Project mode**" examples with SDK alternatives as
well, and ensure the "Full example — inspect, build, publish" sequence also has
parallel CLI and Python tabs so users can choose either workflow.

231-490: 🏗️ Heavy lift

Split packaging docs into one Diataxis quadrant per page.

This section mixes HOW-TO walkthroughs with REFERENCE tables in one page segment. Move flag tables/spec details to a separate reference page and keep this section task-focused, cross-linking between pages.

As per coding guidelines, "Each documentation page should fit ONE Diataxis quadrant; do not mix tutorials with reference tables or how-tos with architecture explanations; use cross-links instead."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/README.md` around lines 231 - 490, This page mixes
HOW-TO/tutorial content with reference/spec tables (Diataxis rule); split the
current "Packaging command — containerize agents as Docker images" into two
pages: a task-focused HOW-TO page that keeps the progressive pipeline,
render/build/publish examples, full example, and project mode usage (sections
containing step-by-step commands and walkthroughs), and a separate reference
page that contains the Flag reference, Build options, Hardening overrides, OCI
labels, Image tagging convention, Agent config validation, Security defaults,
and Rendering modes tables; update cross-links from the HOW-TO to the new
reference page and adjust the documentation index/TOC accordingly, ensuring
headings like "Packaging command", "Flag reference", "Image tagging convention",
"OCI image labels", "Agent config validation", and "Rendering modes" are moved
intact to the reference file so callers (readers) can find exact flag names and
label keys.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugins/nemo-agents/tests/unit/test_container.py`:
- Around line 507-513: Replace realistic token prefixes in the test string
literals with non-signature placeholders to avoid secret scanning: change keys
like "https://user:glpat-xxx@gitlab-master.nvidia.com/org/repo.git",
"https://glpat-AAAA@gitlab-master.nvidia.com/org/repo.git", and
"https://oauth2:ghp_xxx@github.com/org/repo.git" to use neutral placeholders
such as "https://USER_X:TOKEN_X@gitlab-master.nvidia.com/org/repo.git",
"https://USER_X:TOKEN_X@gitlab-master.nvidia.com/org/repo.git", and
"https://USER_X:TOKEN_X@github.com/org/repo.git" (and make equivalent changes
for the similar entries referenced at the subsequent 519-523 region) so the
tests keep coverage but no realistic token prefixes remain.

---

Duplicate comments:
In `@plugins/nemo-agents/src/nemo_agents_plugin/cli.py`:
- Around line 374-390: The CLI always calls build_agent_image(...) regardless of
the --format value; change the post-validation branch to check the format flag
(e.g., format or build_format variable) and route to the correct flow: if format
== "whl" invoke the wheel packaging flow (call the existing wheel packaging
function or add a new package_agent_wheel/build_agent_whl function that accepts
the same relevant args such as agent, pyproject, tag/agent_version,
python_version, template_path, generate_ignore, etc.), otherwise call
build_agent_image(...). Apply the same fix to the other occurrence that mirrors
this call so --format whl actually triggers the wheel flow.
- Around line 275-279: The --platform Option defined as platform:
Optional[list[str]] (typer Option) is never forwarded into the image
build/publish flow and the multi-platform branch returns success without
performing the push; update the publish/build flow to pass the platform list
into the build invocation (e.g., call build_image/build or whatever function
constructs the container with a platforms/platform argument) and ensure the
multi-platform branch actually performs the push/publish (invoke the same push
logic or docker buildx/push call) before emitting success; adjust any
conditional that prints success to run only after push completes and handle
empty/None platform by using the default linux/amd64 + linux/arm64 behavior.

---

Nitpick comments:
In `@plugins/nemo-agents/README.md`:
- Around line 255-397: Update the "package — render, build, and publish in one
command" section to present CLI and Python SDK examples side-by-side using
tab-sets: keep the existing CLI code blocks for the three scenarios
(render-only, build, build+publish) and add corresponding Python SDK snippets
that demonstrate the SDK call(s) to render/write Dockerfile, build an image with
a tag, and build+publish to a registry (match the same flags: --no-build, --tag,
--publish/--registry equivalents); mirror the "**With an existing Dockerfile**"
and "**Project mode**" examples with SDK alternatives as well, and ensure the
"Full example — inspect, build, publish" sequence also has parallel CLI and
Python tabs so users can choose either workflow.
- Around line 231-490: This page mixes HOW-TO/tutorial content with
reference/spec tables (Diataxis rule); split the current "Packaging command —
containerize agents as Docker images" into two pages: a task-focused HOW-TO page
that keeps the progressive pipeline, render/build/publish examples, full
example, and project mode usage (sections containing step-by-step commands and
walkthroughs), and a separate reference page that contains the Flag reference,
Build options, Hardening overrides, OCI labels, Image tagging convention, Agent
config validation, Security defaults, and Rendering modes tables; update
cross-links from the HOW-TO to the new reference page and adjust the
documentation index/TOC accordingly, ensuring headings like "Packaging command",
"Flag reference", "Image tagging convention", "OCI image labels", "Agent config
validation", and "Rendering modes" are moved intact to the reference file so
callers (readers) can find exact flag names and label keys.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 83ab4e0e-e2af-4d88-88be-30943d05a5ad

📥 Commits

Reviewing files that changed from the base of the PR and between da18709 and a49559b.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • plugins/nemo-agents/README.md
  • plugins/nemo-agents/examples/.dockerignore
  • plugins/nemo-agents/examples/Dockerfile
  • plugins/nemo-agents/examples/hello_world.yaml
  • plugins/nemo-agents/pyproject.toml
  • plugins/nemo-agents/src/nemo_agents_plugin/cli.py
  • plugins/nemo-agents/src/nemo_agents_plugin/container/builder.py
  • plugins/nemo-agents/src/nemo_agents_plugin/container/metadata.py
  • plugins/nemo-agents/src/nemo_agents_plugin/container/publisher.py
  • plugins/nemo-agents/src/nemo_agents_plugin/container/template.py
  • plugins/nemo-agents/src/nemo_agents_plugin/container/validator.py
  • plugins/nemo-agents/tests/unit/test_container.py
✅ Files skipped from review due to trivial changes (2)
  • plugins/nemo-agents/examples/hello_world.yaml
  • plugins/nemo-agents/examples/.dockerignore
🚧 Files skipped from review as they are similar to previous changes (6)
  • plugins/nemo-agents/pyproject.toml
  • plugins/nemo-agents/examples/Dockerfile
  • plugins/nemo-agents/src/nemo_agents_plugin/container/template.py
  • plugins/nemo-agents/src/nemo_agents_plugin/container/validator.py
  • plugins/nemo-agents/src/nemo_agents_plugin/container/builder.py
  • plugins/nemo-agents/src/nemo_agents_plugin/container/publisher.py

Comment on lines +507 to +513
"https://user:glpat-xxx@gitlab-master.nvidia.com/org/repo.git": "https://gitlab-master.nvidia.com/org/repo.git",
"https://glpat-AAAA@gitlab-master.nvidia.com/org/repo.git": "https://gitlab-master.nvidia.com/org/repo.git",
"https://oauth2:ghp_xxx@github.com/org/repo.git": "https://github.com/org/repo.git",
"https://github.com/org/repo.git": "https://github.com/org/repo.git",
"git@gitlab-master.nvidia.com:aire/microservices/nmp.git": "git@gitlab-master.nvidia.com:aire/microservices/nmp.git",
"ssh://git@gitlab-master.nvidia.com:12051/aire/microservices/nmp.git": "ssh://git@gitlab-master.nvidia.com:12051/aire/microservices/nmp.git",
"": "",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Test token literals are tripping secret scanning.

Use non-signature placeholders (e.g., TOKEN_X, USER_X) instead of realistic token prefixes (glpat-, ghp_) to keep coverage without CI/security noise.

Also applies to: 519-523

🧰 Tools
🪛 GitHub Check: Secrets Scan

[warning] 509-509:
Found unverified URI result 🐷🔑


[warning] 507-507:
Found unverified URI result 🐷🔑

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/tests/unit/test_container.py` around lines 507 - 513,
Replace realistic token prefixes in the test string literals with non-signature
placeholders to avoid secret scanning: change keys like
"https://user:glpat-xxx@gitlab-master.nvidia.com/org/repo.git",
"https://glpat-AAAA@gitlab-master.nvidia.com/org/repo.git", and
"https://oauth2:ghp_xxx@github.com/org/repo.git" to use neutral placeholders
such as "https://USER_X:TOKEN_X@gitlab-master.nvidia.com/org/repo.git",
"https://USER_X:TOKEN_X@gitlab-master.nvidia.com/org/repo.git", and
"https://USER_X:TOKEN_X@github.com/org/repo.git" (and make equivalent changes
for the similar entries referenced at the subsequent 519-523 region) so the
tests keep coverage but no realistic token prefixes remain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants