fix(security): address high+ CodeQL alerts across plugins, services, SDK by marcusds · Pull Request #91 · NVIDIA-NeMo/nemo-platform

marcusds · 2026-05-28T18:13:17Z

Summary

Addresses 28 of 66 open high+ CodeQL findings. Remaining 38 py/clear-text-logging-sensitive-data alerts are confirmed false positives (they log secret references like workspace/name or already-masked values, never secret material) and are left open for triage in the GitHub Security tab.

Alerts addressed

Severity	Rule	#	Problem	How we addressed it
Critical	`py/partial-ssrf`	1	`nemo-agents` proxy concatenated `endpoint + "/" + trailing_uri`, where `trailing_uri` is a user-controlled FastAPI path param. A value like `//evil.com/x` or `http://evil.com/x` could shift the request to a different host (SSRF).	Switched the join to `urljoin(endpoint + "/", trailing_uri)`, then parsed the result and rejected with HTTP 400 if `scheme` or `netloc` no longer matches the trusted deployment endpoint. Added a parametrized unit test (`test_cross_origin_trailing_uri_rejected`) that drives `_proxy` directly with hostile inputs (`starlette` normalizes the URL before route handlers see it, so the test calls the proxy function rather than the route).
High	`py/tarslip`	4	Anonymizer / evaluator / data-designer SDKs unpacked job artifact tarballs with `tar.extractall(path=...)`. The anonymizer and evaluator validated members in a separate loop before extracting, but CodeQL can't trace per-member validation to a bulk `extractall`; the data-designer SDK had no validation at all.	Refactored all sites to extract member-by-member: validate each member's resolved path stays under `output_path`, reject `issym()`/`islnk()` and non-file/non-dir types, then call `tar.extract(member, path=output_path)`. Added a `_safe_extract_tar` helper to the data-designer SDK (sync + async paths) since it had no validation.
High	`py/jinja2/autoescape-false`	3	`services/core/models` schema validator, inference-gateway auth-header renderer, and the SDK CLI generator all instantiated `jinja2.Environment()` without an `autoescape` argument, which defaults to `False`. CodeQL flags this even when output is not HTML.	Imported `select_autoescape` and passed `autoescape=select_autoescape()` at each `Environment(...)` call. For all three sites the runtime value is still `False` (no HTML filenames involved), so rendering of HTTP header values / generated Python source is unchanged, but the call signals deliberate consideration to scanners.
High	`py/weak-sensitive-data-hashing`	2	`endpoint_identity()` in the evaluator SDK resilience module passed an API token (`auth_identity`) through `hashlib.sha256(...).hexdigest()[:16]` to build a stable dict key for scheduler state. CodeQL flagged this as password hashing (variable-name heuristic) regardless of `usedforsecurity=False`, because its "passwords need slow hashing" sub-rule does not respect that flag.	This is bookkeeping, not credential storage — there is no comparison-against-stored-hash. Swapped the SHA-256 truncation for `zlib.crc32(...)` formatted as 8 hex chars. CRC32 is not cryptographic, so the password-hashing rule cannot reach it; identity behaviour (stable, unique per token) is preserved. Mirrored the change to the SDK-vendored copy.
High	`py/incomplete-url-substring-sanitization`	18	Substring `"nvcr.io" in image_lower` / `"github.com" in url` checks could be spoofed by paths or repo names containing the literal host (e.g. `https://github.com.evil.example.com`, an image tag with `nvcr.io` in its repo path). 9 production sites and 9 test-assertion sites.	Production code: replaced `"nvcr.io" in registry` with `registry == "nvcr.io"` once the registry host was already extracted; added an `_image_registry_host_from_ref()` helper to the quickstart prompts so we compare an extracted host equality; replaced the agents' forge detection (`"github.com" in url`) with a new `_git_remote_host()` helper that handles both HTTPS (`urlparse`) and SSH (`user@host:path`) git URL forms. Test assertions: switched to `urlparse(call_args.kwargs["url"]).hostname == "expected.host"`. Mirrored every quickstart change in the SDK-vendored copy.
High	`py/clear-text-logging-sensitive-data`	38	CodeQL flagged 38 logging sites where variable names matching `api_key`, `secret`, `auth` appear in the format string.	Not fixed in this PR. All 38 sites were audited: every one logs only an identifier (`workspace/name`, provider name, env-var name) or a pre-masked value — none log secret material. Recommend bulk-dismissing in the Security tab with reason "false positive: only secret reference logged, not value".

Test plan

Summary by CodeRabbit

New Features
- Added helpers for extracting registry and git remote hosts and a shared safe tar extraction utility; auth fingerprint generation now uses a different digest.
Bug Fixes
- Registry detection and credential selection use exact host matching; routing and URL checks now compare parsed hostnames.
Security Enhancements
- Hardened proxy target validation, safer archive extraction, worker-origin checks, and preserved raw header rendering.
Tests
- Added/updated tests for strict proxy, host-matching, and registry detection.

Fixes 28 of 66 open high+ CodeQL findings: - py/partial-ssrf (critical): nemo-agents gateway proxy validates that the joined target URL stays on the deployment's origin before forwarding. - py/tarslip (high, 4): anonymizer, evaluator, and data-designer SDKs now extract tar members one at a time after path/link/type validation. - py/jinja2/autoescape-false (high, 3): models schemas, inference-gateway proxy, and CLI generator pass autoescape=select_autoescape() so scanners see deliberate intent for non-HTML templates. - py/weak-sensitive-data-hashing (high, 2): evaluator SDK resilience hash passes usedforsecurity=False (identity key, not credential storage). - py/incomplete-url-substring-sanitization (high, 18): container/quickstart registry checks compare hostnames exactly; agents improvement loop and inference-gateway test assertions use urlparse(...).hostname; agents gateway test exercises the new SSRF guard with parametrized hostile URIs. The 38 py/clear-text-logging-sensitive-data alerts are false positives - they log secret references (workspace/name) or already-masked values, never secret material - and are left open for triage in the GitHub UI. Signed-off-by: mschwab <mschwab@nvidia.com>

github-actions · 2026-05-28T18:26:11Z

Suite	Lines Covered	Line Rate	Branch Rate
Unit Tests	18293/24245	75.4%	61.9%
Integration Tests	11693/23022	50.8%	26.0%

CodeQL still flagged the sha256 hash even with usedforsecurity=False — the 'password hashing must be slow' sub-rule fires on the auth_identity name heuristic regardless. The hash here is a scheduler bookkeeping identifier, not credential storage, so switch to a non-cryptographic crc32 fingerprint that the password-hashing rule cannot reach. Signed-off-by: mschwab <mschwab@nvidia.com>

coderabbitai · 2026-05-28T18:48:12Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Structured host/URL parsing replaces substring checks; per-member tar extraction with path/link validation added; Jinja2 environments set autoescape=False; worker message origin validated; endpoint auth fingerprinting uses blake2b (8-byte hex).

Changes

Security Hardening and Validation Improvements

Layer / File(s)	Summary
Registry host parsing and NGC detection `packages/nemo_platform_ext/.../_registry.py`, `.../prompts.py`, `.../config.py`, `.../container.py`	Adds `image_registry_host` and `_image_registry_host_from_ref`; auth detection and NVCR/NGC checks now compare normalized registry host (strip port, lowercase) for exact `nvcr.io` instead of substring matching.
Gateway proxy target construction & redirect guard `plugins/nemo-agents/src/.../gateway.py`, `plugins/nemo-agents/tests/unit/test_gateway.py`	Builds upstream target with `urljoin`/`urlparse`, validates scheme/netloc against configured endpoint (400 on mismatch), and sets `httpx.AsyncClient(follow_redirects=False)`. New test verifies malicious trailing_uri variants are rejected.
Git remote host normalization and forge detection `packages/nemo_platform_plugin/src/.../git_url.py`, `plugins/nemo-agents/src/.../improvement/loop.py`, `packages/nemo_platform_plugin/tests/test_git_url.py`	Adds `git_remote_host` to extract host from both `://` and `user@host:` remotes and maps normalized hosts to forge identifiers (`github`, `gitlab`).
Test assertions switched to structured URL parsing `services/core/.../tests/`, `services/guardrails/tests/`, `tests/agentic-use/*`	Replaces substring-based URL checks with `urlparse(...).hostname`/`path` assertions; adds `urlparse` imports where needed.
Secure per-member tar extraction `packages/nemo_platform_plugin/.../jobs/archive.py`, `plugins/nemo-data-designer/.../job_resources.py`, `plugins/nemo-anonymizer/.../job_resources.py`, `plugins/nemo-evaluator/.../job_resources.py`	Adds `safe_extract_tar` that validates member target paths, disallows symlinks/hardlinks/special members, and extracts members individually; replaces `tar.extractall` usages with delegated safe extraction.
Jinja2 environments: autoescape explicitly disabled `services/core/inference-gateway/.../proxy.py`, `services/core/models/.../schemas.py`, `tools/nemo-platform-sdk-tools/.../generator.py`	Jinja2 Environment instances used for auth header rendering/validation and CLI generation set `autoescape=False` to preserve raw header and code characters.
Web worker message origin guard `web/packages/studio/src/workers/LargeFileWorker.ts`	Adds early origin check in `self.onmessage` to ignore messages with `e.origin` that does not match `self.location.origin`.
Endpoint auth fingerprinting using blake2b `packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resilience/classifier.py`	`endpoint_identity` now computes `auth_fingerprint` via `hashlib.blake2b(..., digest_size=8).hexdigest()` when `auth_identity` is present; endpoint key format unchanged.

Suggested reviewers

dmariali

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 68.57% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	Title clearly identifies the main change: addressing CodeQL security alerts. It accurately describes the primary focus of this multi-component security fix PR.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix-codeql-issues/mschwab

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

plugins/nemo-anonymizer/src/nemo_anonymizer_plugin/sdk/job_resources.py (1)

47-58: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix _safe_extract_tar to avoid partial extraction on failure

_safe_extract_tar() currently calls tar.extract(...) inside the validation loop, so if a later tar member is rejected, earlier valid members may already have been written to output_path, leaving partial artifacts. Use a second pass: validate all members first, then extract.

Extend plugins/nemo-anonymizer/tests/unit/test_sdk_job_resources.py with an assertion that output_path remains empty/no files are written when _safe_extract_tar(...) raises.

Suggested fix

 def _safe_extract_tar(tar: tarfile.TarFile, output_path: Path) -> None:
     output_path.mkdir(parents=True, exist_ok=True)
     base_path = output_path.resolve()
-    for member in tar.getmembers():
+    members = tar.getmembers()
+    for member in members:
         target_path = (output_path / member.name).resolve()
         if target_path != base_path and base_path not in target_path.parents:
             raise AnonymizerJobError(f"Refusing to extract unsafe tar member: {member.name}")
         if member.issym() or member.islnk():
             raise AnonymizerJobError(f"Refusing to extract tar link member: {member.name}")
         if not (member.isfile() or member.isdir()):
             raise AnonymizerJobError(f"Refusing to extract special tar member: {member.name}")
+    for member in members:
         tar.extract(member, path=output_path)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-anonymizer/src/nemo_anonymizer_plugin/sdk/job_resources.py`
around lines 47 - 58, The _safe_extract_tar function currently extracts members
as it validates, causing partial extraction if a later member fails; change
_safe_extract_tar to first iterate tar.getmembers() and validate every member
(checking path safety against output_path.resolve(), rejecting symlinks/links,
and allowing only file/dir types using member.isfile()/member.isdir()) without
calling tar.extract, and only after all members pass validation perform a second
loop that calls tar.extract for each validated member; raise AnonymizerJobError
as before on any validation failure. Also update
plugins/nemo-anonymizer/tests/unit/test_sdk_job_resources.py to add a test that
constructs a tar with an unsafe member, calls _safe_extract_tar, expects
AnonymizerJobError, and asserts output_path remains empty (no files written).

🧹 Nitpick comments (2)

packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/prompts.py (1)
51-59: ⚡ Quick win

Consolidate registry-host parsing to a single helper.

This duplicates _registry_host_from_image (Line 310+). Keep one implementation to avoid divergence in auth decisions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/prompts.py`
around lines 51 - 59, The function _image_registry_host_from_ref duplicates
logic already implemented in _registry_host_from_image; remove the duplicate and
consolidate calls to use the single helper (_registry_host_from_image) to avoid
divergence in auth decisions, updating any call sites that reference
_image_registry_host_from_ref to call _registry_host_from_image instead and
ensure the preserved helper retains the same behavior (handling empty strings,
"/" splits, and host detection via "." or ":").
plugins/nemo-agents/src/nemo_agents_plugin/api/v2/gateway.py (1)
160-164: ⚡ Quick win

Don’t block this today; add path-prefix validation as defense-in-depth

The path-escape risk from urljoin() would require AgentDeployment.endpoint to include a non-root path prefix. In this repo, AgentDeployment.endpoint is set by InMemoryRunnerBackend as http://127.0.0.1:{port} (no path), and the gateway unit tests use the same shape—so the scheme/netloc check is sufficient for current backends.

Add the path-prefix guard you proposed (and a unit test that calls _proxy() with an endpoint like http://localhost:9001/base/) if/when any backend can set an endpoint with a path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-agents/src/nemo_agents_plugin/api/v2/gateway.py` around lines
160 - 164, The current check only compares scheme/netloc; add a path-prefix
validation to ensure the computed target URL (target_parsed.path) does not
escape the endpoint's base path when AgentDeployment.endpoint may include a
non-root path; in the gateway module modify the function that builds/validates
proxy targets (the _proxy() code block that computes endpoint_parsed,
target_url, target_parsed) to verify that target_parsed.path starts with
endpoint_parsed.path (after normalizing trailing slashes) and raise
HTTPException(400) if it does not; also add a unit test that calls _proxy() (or
the public API that invokes it) with an endpoint containing a path prefix like
"http://localhost:9001/base/" and a trailing_uri that attempts to escape (e.g.,
"../other") to assert the request is rejected.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@plugins/nemo-data-designer/src/nemo_data_designer_plugin/sdk/job_resources.py`:
- Around line 61-72: The current _safe_extract_tar validates and extracts
members in one loop, so a later unsafe member can trigger an exception after
earlier members were already written; change _safe_extract_tar to first iterate
tar.getmembers() and validate every member (using the same checks on
member.name, member.issym()/islnk(), member.isfile()/isdir() and path
containment against output_path.resolve()) and only if all members pass, perform
a second loop to extract each member; also add a unit test in
plugins/nemo-data-designer/tests that creates a tar with a safe member first and
an unsafe member later, calls _safe_extract_tar and asserts that it raises and
that no files or directories from the safe member exist in the output directory.

In `@services/core/inference-gateway/src/nmp/core/inference_gateway/api/proxy.py`:
- Line 18: render_auth_header currently creates a Jinja environment with
autoescaping enabled (JinjaEnvironment(autoescape=select_autoescape())) which
HTML-escapes credentials and corrupts Authorization/API-key headers; change the
environment creation to disable autoescaping (e.g., use
Environment(autoescape=False) or equivalent) so .from_string(...).render(...)
returns raw secrets, and make the identical change in
_validate_auth_header_format in services/core/models (use the same no-autoescape
Environment there) so runtime validation and rendering behave consistently.

In `@tests/agentic-use/evaluator-llm-judge-cli-easy/tests/test_outputs.py`:
- Around line 111-114: The current validation assigns valid_url using an OR so
any URL with "inference/providers" in parsed_model_url.path is accepted
regardless of host; change the predicate to require a trusted host when the
provider-proxy path is used. Update the valid_url check (the variable using
parsed_model_url.path and parsed_model_url.hostname) to use a trusted-host
whitelist and require parsed_model_url.hostname be in that whitelist whenever
parsed_model_url.path contains "inference/providers" (i.e., use an AND for the
provider path case or always check hostname against the whitelist).

In `@tests/agentic-use/evaluator-llm-judge-cli/tests/test_outputs.py`:
- Around line 111-114: The current boolean 'valid_url' allows any host if
parsed_model_url.path contains "inference/providers"; update the check so the
"inference/providers" path is only accepted when the hostname is a trusted host
(e.g., parsed_model_url.hostname == "inference-api.nvidia.com" or part of your
trusted-host set). Concretely, modify the expression using parsed_model_url.path
and parsed_model_url.hostname so that the path clause is ANDed with a
trusted-host check rather than ORed independently with it.

---

Outside diff comments:
In `@plugins/nemo-anonymizer/src/nemo_anonymizer_plugin/sdk/job_resources.py`:
- Around line 47-58: The _safe_extract_tar function currently extracts members
as it validates, causing partial extraction if a later member fails; change
_safe_extract_tar to first iterate tar.getmembers() and validate every member
(checking path safety against output_path.resolve(), rejecting symlinks/links,
and allowing only file/dir types using member.isfile()/member.isdir()) without
calling tar.extract, and only after all members pass validation perform a second
loop that calls tar.extract for each validated member; raise AnonymizerJobError
as before on any validation failure. Also update
plugins/nemo-anonymizer/tests/unit/test_sdk_job_resources.py to add a test that
constructs a tar with an unsafe member, calls _safe_extract_tar, expects
AnonymizerJobError, and asserts output_path remains empty (no files written).

---

Nitpick comments:
In `@packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/prompts.py`:
- Around line 51-59: The function _image_registry_host_from_ref duplicates logic
already implemented in _registry_host_from_image; remove the duplicate and
consolidate calls to use the single helper (_registry_host_from_image) to avoid
divergence in auth decisions, updating any call sites that reference
_image_registry_host_from_ref to call _registry_host_from_image instead and
ensure the preserved helper retains the same behavior (handling empty strings,
"/" splits, and host detection via "." or ":").

In `@plugins/nemo-agents/src/nemo_agents_plugin/api/v2/gateway.py`:
- Around line 160-164: The current check only compares scheme/netloc; add a
path-prefix validation to ensure the computed target URL (target_parsed.path)
does not escape the endpoint's base path when AgentDeployment.endpoint may
include a non-root path; in the gateway module modify the function that
builds/validates proxy targets (the _proxy() code block that computes
endpoint_parsed, target_url, target_parsed) to verify that target_parsed.path
starts with endpoint_parsed.path (after normalizing trailing slashes) and raise
HTTPException(400) if it does not; also add a unit test that calls _proxy() (or
the public API that invokes it) with an endpoint containing a path prefix like
"http://localhost:9001/base/" and a trailing_uri that attempts to escape (e.g.,
"../other") to assert the request is rejected.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 66a80991-9764-4af1-a864-66e8b97292b9

📥 Commits

Reviewing files that changed from the base of the PR and between d3ad417 and e11a57a.

⛔ Files ignored due to path filters (4)

sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/resilience/classifier.py is excluded by !sdk/**
sdk/python/nemo-platform/src/nemo_platform/quickstart/config.py is excluded by !sdk/**
sdk/python/nemo-platform/src/nemo_platform/quickstart/container.py is excluded by !sdk/**
sdk/python/nemo-platform/src/nemo_platform/quickstart/prompts.py is excluded by !sdk/**

📒 Files selected for processing (19)

packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resilience/classifier.py
packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/config.py
packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/container.py
packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/prompts.py
plugins/nemo-agents/src/nemo_agents_plugin/api/v2/gateway.py
plugins/nemo-agents/src/nemo_agents_plugin/improvement/loop.py
plugins/nemo-agents/tests/unit/test_gateway.py
plugins/nemo-anonymizer/src/nemo_anonymizer_plugin/sdk/job_resources.py
plugins/nemo-data-designer/src/nemo_data_designer_plugin/sdk/job_resources.py
plugins/nemo-evaluator/src/nemo_evaluator/sdk/job_resources.py
services/core/inference-gateway/src/nmp/core/inference_gateway/api/proxy.py
services/core/inference-gateway/tests/unit/test_model_router.py
services/core/inference-gateway/tests/unit/test_openai_router.py
services/core/models/src/nmp/core/models/schemas.py
services/guardrails/tests/services/test_auth_token_handling.py
tests/agentic-use/evaluator-llm-judge-cli-easy/tests/test_outputs.py
tests/agentic-use/evaluator-llm-judge-cli/tests/test_outputs.py
tools/nemo-platform-sdk-tools/src/nemo_platform_sdk_tools/sdk/cli_generator/generator.py
web/packages/studio/src/workers/LargeFileWorker.ts

Multi-agent review (codex + roundtable) flagged real bugs in the prior security-hardening pass. Applied here: - jinja: select_autoescape() activates HTML escaping for from_string() (default_for_string=True), so API keys containing &/<>" got HTML-encoded and broke upstream auth. Reverted to Environment(autoescape=False) with comments and lint suppressions explaining non-HTML output. - endpoint_identity: zlib.crc32 had only 32 bits of fingerprint space, which risks scheduler-state crosstalk between tenants under birthday-collision pressure (~50% at 65k endpoints). Replaced with blake2b(digest_size=8) — 64 bits, matches the prior sha256[:16] resistance, and is not flagged by CodeQL's password-hashing rule. - _detect_forge: startswith("gitlab.") was too strict and silently dropped self-hosted instances like gitlab-master.nvidia.com. Broadened to also match 'gitlab' in the first label of the host. - Registry host comparisons: image refs with an explicit port (e.g. nvcr.io:443/...) failed the exact match and skipped NGC credential flow. Strip :port before comparing in is_ngc_registry, in the container pull paths, and in prompt_for_registry_credentials. - Gateway proxy: explicit follow_redirects=False on httpx.AsyncClient so the SSRF guard cannot be bypassed by a 302 from a compromised agent if the httpx default ever changes. - Tar extraction (anonymizer / evaluator / data-designer): restored validate-all-then-extract two-pass so a rejected archive never leaves partial files behind. Per-member tar.extract() satisfies CodeQL. - SSRF unit test: rewrote through FastAPI TestClient using URL-encoded trailing path (%2F%2Fevil.example.com/x, http:%2F%2Fevil.example.com/x) so it exercises the route end-to-end instead of pinning the private _proxy() signature. Both vectors return 400 with the expected detail. Mirrored the source-tree changes into the SDK-vendored copies under sdk/python/nemo-platform/ to keep them in sync until the next make vendor. Signed-off-by: mschwab <mschwab@nvidia.com>

coderabbitai · 2026-05-28T18:59:30Z

Actionable comments posted: 0

Three sets of near-identical helpers were introduced (or already existed) across the platform during the high+ CodeQL alert pass. Consolidate them: - safe_extract_tar (nemo_platform_plugin.jobs.archive) Replaces the three independent _safe_extract_tar implementations in the anonymizer / evaluator / data-designer plugin SDKs with a single helper in nemo_platform_plugin (which all plugins already depend on). The helper accepts an error_cls factory so callers keep their plugin-specific error types (AnonymizerJobError / ValueError / DataDesignerJobError) without duplicating the validation+extraction loop. - image_registry_host (nemo_platform_ext.quickstart._registry) Consolidates four copies of registry-host parsing inside the quickstart module: _image_registry_host (container.py), _image_registry_host_from_ref and _registry_host_from_image (both prompts.py), and the get_registry_host method on QuickstartConfig. Everything now delegates to the shared helper; the back-compat names remain as thin wrappers for callers and tests that import them. - git_remote_host (nemo_platform_plugin.git_url) Promotes the inline _git_remote_host helper from nemo-agents' improvement/loop.py to nemo_platform_plugin so future PR/MR or CI/CD code in other plugins can reuse it instead of forking a third copy. Mirrored the source-tree changes into the SDK-vendored copies under sdk/python/nemo-platform/ to keep them in sync until the next make vendor. No behaviour change; existing tests pass. Signed-off-by: mschwab <mschwab@nvidia.com>

CodeRabbit caught that the predicate previously allowed any hostname when the path contained 'inference/providers'. Require the hostname to match the local IGW (NMP_BASE_URL) for the provider-proxy branch; keep the NVIDIA hosted inference API as the alternate trusted host. Signed-off-by: mschwab <mschwab@nvidia.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/git_url.py`:
- Around line 19-20: git_remote_host() currently only handles SCP-style remotes
when an "@" is present, so URLs like "github.com:org/repo.git" are returning ""
— update the parsing to also accept SCP-style host:repo when there is no user@
by detecting a hostname followed by ":" (but exclude Windows absolute paths like
"C:\..." and normal filesystem paths that start with "/"); modify the logic in
git_remote_host() to split on ":" and take the left side when the left side
looks like a host (e.g., contains a dot or matches hostname patterns) and add
unit tests asserting git_remote_host("github.com:org/repo.git")== "github.com",
git_remote_host("gitlab.com:group/repo.git")== "gitlab.com", and
git_remote_host(r"C:\repo")== "" to prevent regressions.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 13d19625-bae0-46a8-960c-af688b72f49b

📥 Commits

Reviewing files that changed from the base of the PR and between c41446b and 3e48cef.

⛔ Files ignored due to path filters (4)

sdk/python/nemo-platform/src/nemo_platform/quickstart/_registry.py is excluded by !sdk/**
sdk/python/nemo-platform/src/nemo_platform/quickstart/config.py is excluded by !sdk/**
sdk/python/nemo-platform/src/nemo_platform/quickstart/container.py is excluded by !sdk/**
sdk/python/nemo-platform/src/nemo_platform/quickstart/prompts.py is excluded by !sdk/**

📒 Files selected for processing (10)

packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/_registry.py
packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/config.py
packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/container.py
packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/prompts.py
packages/nemo_platform_plugin/src/nemo_platform_plugin/git_url.py
packages/nemo_platform_plugin/src/nemo_platform_plugin/jobs/archive.py
plugins/nemo-agents/src/nemo_agents_plugin/improvement/loop.py
plugins/nemo-anonymizer/src/nemo_anonymizer_plugin/sdk/job_resources.py
plugins/nemo-data-designer/src/nemo_data_designer_plugin/sdk/job_resources.py
plugins/nemo-evaluator/src/nemo_evaluator/sdk/job_resources.py

✅ Files skipped from review due to trivial changes (1)

packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/_registry.py

🚧 Files skipped from review as they are similar to previous changes (1)

packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/container.py

coderabbitai · 2026-05-28T19:28:41Z

Actionable comments posted: 0

CodeRabbit caught that git_remote_host() returned "" for SCP-style remotes like 'github.com:org/repo.git' because the parser required an '@' character. Broaden the alt-form parser to accept bare 'host:path' form too, while rejecting Windows drive letters (C:\\repo) by requiring a dot in the hostname. Added 14 parametrized unit tests. Signed-off-by: mschwab <mschwab@nvidia.com>

coderabbitai · 2026-05-28T19:34:01Z

Actionable comments posted: 0

Aligns import ordering in the SDK-vendored copies under sdk/python/nemo-platform/quickstart/ with what `make vendor` produces. Catches up after the registry-host helper consolidation. Signed-off-by: mschwab <mschwab@nvidia.com>

github-advanced-security AI found potential problems May 28, 2026

View reviewed changes

Comment thread packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resilience/classifier.py Fixed

Comment thread sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/resilience/classifier.py Fixed

marcusds marked this pull request as ready for review May 28, 2026 18:35

marcusds requested review from a team as code owners May 28, 2026 18:35

mckornfield approved these changes May 28, 2026

View reviewed changes

Comment thread plugins/nemo-data-designer/src/nemo_data_designer_plugin/sdk/job_resources.py

Comment thread services/core/models/src/nmp/core/models/schemas.py Outdated

coderabbitai Bot reviewed May 28, 2026

View reviewed changes

github-advanced-security AI found potential problems May 28, 2026

View reviewed changes

marcusds added 2 commits May 28, 2026 12:19

coderabbitai Bot reviewed May 28, 2026

View reviewed changes

Comment thread packages/nemo_platform_plugin/src/nemo_platform_plugin/git_url.py Outdated

chore: re-vendor quickstart SDK after _registry refactor

10fbf1a

Aligns import ordering in the SDK-vendored copies under sdk/python/nemo-platform/quickstart/ with what `make vendor` produces. Catches up after the registry-host helper consolidation. Signed-off-by: mschwab <mschwab@nvidia.com>

mckornfield approved these changes May 28, 2026

View reviewed changes

marcusds added this pull request to the merge queue May 28, 2026

Merged via the queue into main with commit 397e25e May 28, 2026
30 checks passed

Conversation

marcusds commented May 28, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Alerts addressed

Test plan

Summary by CodeRabbit

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot commented May 28, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot commented May 28, 2026

Uh oh!

coderabbitai Bot commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

marcusds commented May 28, 2026 •

edited by coderabbitai Bot

Loading

github-actions Bot commented May 28, 2026 •

edited

Loading

coderabbitai Bot commented May 28, 2026 •

edited

Loading