Skip to content

fix(codeql): batch-4 medium-sev + warnings + log-injection defense#102

Merged
marcusds merged 12 commits into
mainfrom
fix-codeql-issues-batch-4/mschwab
May 29, 2026
Merged

fix(codeql): batch-4 medium-sev + warnings + log-injection defense#102
marcusds merged 12 commits into
mainfrom
fix-codeql-issues-batch-4/mschwab

Conversation

@marcusds
Copy link
Copy Markdown
Contributor

@marcusds marcusds commented May 28, 2026

Summary

Phase 4 of the CodeQL cleanup. Addresses 199 alerts in total — 22 medium-sev + warning fixes in-tree, 177 log-injection alerts closed via a central log-sanitization processor.

Alerts addressed

Severity Rule # How we addressed it
Medium py/stack-trace-exposure 1 + 3 dismissed nemo_platform_plugin.jobs.api_factory — drop str(e) from streaming JSONL error response (keep type + line number for diagnostics). 3 vendored switchyard alerts dismissed (snapshot, edits get overwritten).
Medium py/bind-socket-all-network-interfaces 3 Quickstart port-availability probes and docker-reconciler port check bind 127.0.0.1 instead of 0.0.0.0/empty-host — same semantics for the local dev/test use case, no listening surface.
Medium js/http-to-file-access 1 dismissed Studio build script with explicit CDN host allowlist + regex-validated version.
Medium js/missing-origin-check 1 dismissed Auto-generated MSW worker (storybook only).
Medium py/log-injection 177 Added _sanitize_log_strings processor in nmp.common.observability.structured_logging — replaces CR/LF/NEL/LS/PS in string fields before rendering. JSON renderer already escapes these; this guards the plain-text renderer too. 177 alerts dismissed with a back-reference to this fix. 8 unit tests added.
Warning py/redundant-comparison 2 Drop dead mid != base_id after earlier base-branch handles it; drop dead MoE branch in heuristics reached only on dense path.
Warning py/comparison-of-identical-expressions 2 Bind to a local before comparing so cache-identity assertions remain visible.
Warning py/file-not-closed 5 + 2 dismissed Convert open().read() to with-statement (copyright_fixer.py ×4, test_config.py ×1). 2 FPs dismissed (intentional process-long lock fd + pytest.raises test that never returns an fd).
Warning py/exit-from-finally 1 Move return result out of finally so KeyboardInterrupt/SystemExit aren't swallowed in the customizer training runner.
Warning py/implicit-string-concatenation-in-list 2 Wrap multi-line shell-command literals in parens.
Warning py/regex/duplicate-in-character-class 1 Simplify pytest-marker regex in mcp-dev-tools — the duplicate n/a chars were already covered by [a-z].
Warning py/unnecessary-pass 2 Remove dead pass after logger.debug in two except blocks.
Warning py/unnecessary-delete 2 Replace del unused_var with # noqa: ARG on the parameter list in test helpers.
Warning py/use-of-exit-or-quit 2 Replace exit() with raise SystemExit() in two scripts.
Warning js/trivial-conditional 1 Drop redundant chipContainer && check after the earlier guard.
Warning py/multiple-definition 1 Drop unused initial end = len(lines) in nemo_nb sugar (loop's else branch returns early). 14 sites still open for a follow-up.

Notable: log-injection central defense

177 py/log-injection sites all funnel through the structlog handler chain in nmp.common.observability.structured_logging. Rather than edit 177 individual log statements (each of which would be f"… {workspace}/{name} …"-style fields constrained by NAME_PATTERN and therefore unlikely to contain newlines in practice), the new _sanitize_log_strings processor strips newline-class characters from any string value in the event dict before the renderer runs.

The JSON renderer already escapes control characters, so this is primarily defense-in-depth for the plain-text console renderer. Each alert was dismissed with a back-reference to this commit.

Test plan

  • uv run pytest packages/nmp_common/tests/observability/test_structured_logging.py — 8 pass (new tests)
  • uv run pytest plugins/nemo-auditor/tests/test_sdk_resources.py — 2 pass (cache-identity tests still verify what they intended)
  • ruff check on all touched files — clean
  • Re-run CodeQL after merge; confirm the 22 in-tree alerts close

Stacking

Branch is off main. PRs #91/#98/#100 ahead; no conflicts expected.

Summary by CodeRabbit

  • New Features

    • Added log-string sanitization to prevent log-injection artifacts in plain-text logs.
  • Bug Fixes

    • Improved port availability checks with explicit wildcard binding.
    • Standardized error payloads to avoid leaking exception text.
    • Process termination now preserves original exit codes via raised SystemExit.
  • Tests

    • Added tests for log sanitization and tightened caching assertions.
  • Refactor

    • Use context managers for file/socket handling and simplified guard/conditional logic.
    • Emit debug logs on parsing failures.

Review Change Stack

marcusds added 2 commits May 28, 2026 16:41
Addresses 22 of 60+ open warning/medium-sev CodeQL alerts:

Medium severity (security):
- py/stack-trace-exposure: drop exception message from streaming JSONL
  error response in nemo_platform_plugin.jobs.api_factory (keep type +
  line number for diagnostics, drop str(e) which can leak internals).
- py/stack-trace-exposure x3: dismissed in vendored switchyard snapshot.
- py/bind-socket-all-network-interfaces x3: quickstart port-availability
  probes and docker reconciler port check bind 127.0.0.1 instead of
  0.0.0.0/empty-host — same intent for the local dev/test use case, no
  security risk surface.
- js/http-to-file-access: dismissed — Studio build script with explicit
  CDN host allowlist + regex-validated version.
- js/missing-origin-check: dismissed — auto-generated MSW worker.

Warnings (code-quality):
- py/redundant-comparison x2: drop dead 'mid != base_id' check after
  earlier base-branch handles it (provider_reconciler); drop dead MoE
  branch in heuristics.py reached only on dense path.
- py/comparison-of-identical-expressions x2: bind to a local before
  comparing so cache-identity assertions remain visible to analyzers.
- py/file-not-closed x4: convert open().read() to with-statement in
  copyright_fixer.py.
- py/file-not-closed x1: convert yaml.safe_load(open(...)) to
  with-statement in test_config.py.
- py/file-not-closed x2: dismissed (intentional process-long lock fd +
  pytest.raises in test that never returns an fd).
- py/exit-from-finally: move 'return result' out of finally in the
  customizer training runner so KeyboardInterrupt / SystemExit aren't
  swallowed.
- py/implicit-string-concatenation-in-list x2: wrap multi-line shell
  command literal in parens so the implicit concat is explicit.
- py/regex/duplicate-in-character-class: simplify pytest-marker regex
  in mcp-dev-tools (the 'andornt' chars were already covered by [a-z]).
- py/unnecessary-pass x2: remove dead 'pass' statements after a
  logger.debug call in two except blocks.
- py/unnecessary-delete x2: replace 'del unused_var' with ARG noqa on
  the parameter list in test helpers.
- py/use-of-exit-or-quit x2: replace exit() with raise SystemExit() in
  two scripts.
- js/trivial-conditional: drop redundant 'chipContainer &&' check after
  the earlier guard in docs api-filter.js.
- py/multiple-definition: drop unused initial 'end = len(lines)' in
  nemo_nb sugar (loop's else branch returns early).

Signed-off-by: mschwab <mschwab@nvidia.com>
…efense)

Adds _sanitize_log_strings processor that replaces CR/LF/NEL/LS/PS in string fields. Central defense for the 177 py/log-injection CodeQL alerts (each dismissed with reference to this fix). 8 unit tests.

Signed-off-by: mschwab <mschwab@nvidia.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 28, 2026

Documentation preview is ready

Preview: https://nvidia-nemo.github.io/nemo-platform/pr-preview/pr-102/pr-102/

Built from 125fed4 in workflow run.

This preview is deployed from this PR branch, updates when docs changes are pushed, and will be removed when the PR closes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 29, 2026

Suite Lines Covered Line Rate Branch Rate
Unit Tests 18417/24388 75.5% 61.9%
Integration Tests 11776/23165 50.8% 26.0%

- Move _sanitize_log_strings after format_exc_info in the structlog
  processor chain so exception-message log-forgery vectors are also
  sanitized (Marta P0; format_exc_info expands exc_info into a
  multi-line 'exception' field that the old position bypassed).
- Replace raw U+2028/U+2029 bytes in _LOG_NEWLINE_CHARS with explicit
  \u2028\u2029 escapes so the source is editor-stable.
- Revert 127.0.0.1 port probes to 0.0.0.0 with S104/B104 suppressions;
  Docker publishes 0.0.0.0 by default and a loopback probe misses
  cross-interface conflicts that would later break the container.
- Guard self._write_result in the customizer training runner's finally
  block so a write failure doesn't replace the TrainingResult with an
  IOError.
- Add three observability tests: sanitizer covers the 'event' key,
  sanitizer runs on multi-line 'exception' fields after format_exc_info,
  and initialize_logging() actually wires _sanitize_log_strings into
  the processor chain (regression canary for the 177 dismissed alerts).

Signed-off-by: mschwab <mschwab@nvidia.com>
@marcusds marcusds marked this pull request as ready for review May 29, 2026 00:17
@marcusds marcusds requested review from a team as code owners May 29, 2026 00:17
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b39a08a2-2de5-4f9c-90a9-2dc0a1e0509f

📥 Commits

Reviewing files that changed from the base of the PR and between 1ba7ac4 and 125fed4.

📒 Files selected for processing (2)
  • packages/nemo_evaluator_sdk/tests/execution/test_benchmark_execution.py
  • packages/nemo_platform_plugin/src/nemo_platform_plugin/jobs/api_factory.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/nemo_evaluator_sdk/tests/execution/test_benchmark_execution.py

📝 Walkthrough

Walkthrough

PR applies targeted refactoring and hardening across 23 files: adds log-string sanitization to prevent log-forgery, standardizes process termination to use SystemExit, improves error logging, simplifies conditionals, ensures context-managed I/O, tightens tests, and updates port-binding to explicit wildcard addresses.

Changes

Observability, Error Handling, and Code Quality

Layer / File(s) Summary
Log sanitization for security
packages/nmp_common/src/nmp/common/observability/structured_logging.py, packages/nmp_common/tests/observability/test_structured_logging.py
_sanitize_log_strings processor removes newline-like characters from event fields to prevent log-forgery in plain-text renderers; wired into initialize_logging() and covered by parameterized tests.
Exit code handling standardization
docs/evaluator/test_doc_examples.py, services/intake/scripts/seed-entries.py
Replace exit() calls with raising SystemExit() (preserving exit codes and chaining); minor import formatting change.
Error handling and logging improvements
packages/nemo_nb/nemo_nb/sphinx.py, services/core/files/src/nmp/core/files/app/file_lock.py
Log frontmatter parse failures and lock-conflict conditions instead of silent pass statements.
Control flow and condition simplifications
docs/javascripts/api-filter.js, packages/nemo_nb/nemo_nb/sugar.py, services/core/models/src/nmp/core/models/controllers/provider_reconciler.py, services/core/models/src/nmp/core/models/parallelism/hueristics.py
Remove redundant null checks, simplify trailing-empty-line trimming, and remove duplicate guards.
Context-managed file I/O patterns
packages/nemo_platform_ext/tests/config/test_config.py, script/copyright_fixer.py
Standardize file reads and YAML loading to use with open(...) context managers.
Test assertions and validation tightening
plugins/nemo-auditor/tests/test_sdk_resources.py, packages/nemo_platform_plugin/src/nemo_platform_plugin/jobs/api_factory.py, tools/mcp-dev-tools/nmp_dev_mcp.py
Verify cached properties return same objects across accesses, make JSONL validation error message generic, and relax pytest markers validation regex.
Code quality and linting annotations
packages/nemo_evaluator_sdk/tests/execution/test_benchmark_execution.py, packages/nemo_platform_plugin/tests/test_commands.py
Suppress unused-parameter warnings with # noqa or _ prefixes instead of deleting variables.
Test string and command formatting
services/core/jobs/tests/controllers/test_subprocess_backend.py, services/customizer/tests/tasks/training/test_errors.py
Reformat long shell commands and expected error-message strings into parenthesized multi-line literals.
Port availability checking improvements
services/core/models/src/nmp/core/models/controllers/backends/docker/creation_reconciler.py, packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/preflight.py, packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/validators.py
Use explicit 0.0.0.0 wildcard bind address, context-manage sockets, enable SO_REUSEADDR, and add noqa/nosec annotations.
Training runner return control flow
services/customizer/src/nmp/customizer/tasks/training/runner.py
Reposition return result to execute after the finally block that writes coordinator-only results.

Suggested reviewers

  • mckornfield
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed Title accurately summarizes the main objective: CodeQL fixes addressing medium-severity issues, warnings, and log-injection defense across multiple files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix-codeql-issues-batch-4/mschwab

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/preflight.py`:
- Around line 197-200: The probe socket is not closed if sock.bind raises;
change the socket creation to use a context manager so it is closed on both
success and failure—replace "sock = socket.socket(...); try: sock.bind(...);
sock.close()" with "with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as
sock: sock.bind(("0.0.0.0", self.config.host_port))" (keep the same socket
flags/comments) so socket.socket, sock.bind and sock.close are handled
automatically.

In `@packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/validators.py`:
- Around line 217-220: The socket created for port probing (socket.socket(...)
assigned to sock) can leak a file descriptor if bind(...) raises OSError before
sock.close() is reached; change the probe in quickstart/validators.py to use a
context manager or ensure sock is closed in a finally block (e.g., with
socket.socket(...) as sock:) around the bind/close logic so the socket is always
closed even on exceptions.

In `@packages/nmp_common/tests/observability/test_structured_logging.py`:
- Around line 61-77: The test test_initialize_logging_wires_sanitizer_into_chain
mutates global logging state by calling initialize_logging() but only clears
handlers in the finally block; update the test to capture the root logger's
pre-test state (e.g., prior_handlers = root.handlers.copy() and prior_level =
root.level) before calling initialize_logging(), then in the finally block
restore that state by setting root.handlers = prior_handlers (or clearing and
extending) and root.setLevel(prior_level) so the original handlers and level are
fully restored after the test.

In `@services/customizer/src/nmp/customizer/tasks/training/runner.py`:
- Around line 152-156: The current try/except around self._write_result(result)
swallows write failures and still returns the TrainingResult, masking a hard
failure; change this so write errors are not treated as success — either remove
the try/except so the exception propagates, or log the error and re-raise
immediately after logging (do not return result). Update the block around
self._write_result and the return of result to ensure any exception from
_write_result surfaces to callers (or mark result as failed and then raise) so
downstream consumers are not misled.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 11de54ad-9d4d-4d5f-9887-bef2eaa0ccac

📥 Commits

Reviewing files that changed from the base of the PR and between d92d4e9 and 11c7395.

📒 Files selected for processing (23)
  • docs/evaluator/test_doc_examples.py
  • docs/javascripts/api-filter.js
  • packages/nemo_evaluator_sdk/tests/execution/test_benchmark_execution.py
  • packages/nemo_nb/nemo_nb/sphinx.py
  • packages/nemo_nb/nemo_nb/sugar.py
  • packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/preflight.py
  • packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/validators.py
  • packages/nemo_platform_ext/tests/config/test_config.py
  • packages/nemo_platform_plugin/src/nemo_platform_plugin/jobs/api_factory.py
  • packages/nemo_platform_plugin/tests/test_commands.py
  • packages/nmp_common/src/nmp/common/observability/structured_logging.py
  • packages/nmp_common/tests/observability/test_structured_logging.py
  • plugins/nemo-auditor/tests/test_sdk_resources.py
  • script/copyright_fixer.py
  • services/core/files/src/nmp/core/files/app/file_lock.py
  • services/core/jobs/tests/controllers/test_subprocess_backend.py
  • services/core/models/src/nmp/core/models/controllers/backends/docker/creation_reconciler.py
  • services/core/models/src/nmp/core/models/controllers/provider_reconciler.py
  • services/core/models/src/nmp/core/models/parallelism/hueristics.py
  • services/customizer/src/nmp/customizer/tasks/training/runner.py
  • services/customizer/tests/tasks/training/test_errors.py
  • services/intake/scripts/seed-entries.py
  • tools/mcp-dev-tools/nmp_dev_mcp.py
💤 Files with no reviewable changes (4)
  • packages/nemo_nb/nemo_nb/sphinx.py
  • packages/nemo_nb/nemo_nb/sugar.py
  • services/core/files/src/nmp/core/files/app/file_lock.py
  • packages/nemo_platform_plugin/src/nemo_platform_plugin/jobs/api_factory.py

Comment thread packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/preflight.py Outdated
Comment thread packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/validators.py Outdated
Comment thread packages/nmp_common/tests/observability/test_structured_logging.py Outdated
Comment thread services/customizer/src/nmp/customizer/tasks/training/runner.py Outdated
marcusds added 2 commits May 28, 2026 17:25
Signed-off-by: mschwab <mschwab@nvidia.com>
- preflight/validators: wrap socket creation in 'with' so OSError on
  bind doesn't leak the file descriptor.
- test_structured_logging: chain-presence test now saves/restores root
  handlers and level instead of dropping them on tear-down.
- customizer runner: when _write_result fails inside finally, mark the
  TrainingResult as failed with the write error rather than returning
  the unrecorded result as success.

Signed-off-by: mschwab <mschwab@nvidia.com>
Comment thread packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/preflight.py Dismissed
Comment thread packages/nemo_platform_ext/src/nemo_platform_ext/quickstart/validators.py Dismissed
marcusds and others added 5 commits May 28, 2026 20:39
Signed-off-by: mschwab <mschwab@nvidia.com>
- runner.py: preserve original training error_message when _write_result
  also fails (append persist error instead of replace). Guard the
  finally mark-failed with is_coordinator so workers don't construct a
  failure result.
- test_runner.py: add test_run_marks_result_failed_when_write_raises
  and test_run_appends_persist_error_to_existing_training_error.
- test_structured_logging.py: use \u2028 escape literal (was raw byte),
  add \u2029 case, find ProcessorFormatter by isinstance rather than
  handlers[0].
- preflight.py, validators.py: setsockopt(SO_REUSEADDR) before bind to
  avoid Linux TIME_WAIT false positives on the port probe (matches
  the existing creation_reconciler pattern).

Signed-off-by: mschwab <mschwab@nvidia.com>
Signed-off-by: mschwab <mschwab@nvidia.com>
…m-finally fix

The py/exit-from-finally CodeQL alert only required moving 'return result'
out of the finally block. The write-failure try/except + result-mutation
added in review was scope creep beyond the alert; revert to the minimal
dedent. _write_result failures now propagate as before.

Signed-off-by: mschwab <mschwab@nvidia.com>
Comment thread tools/mcp-dev-tools/nmp_dev_mcp.py
marcusds added 2 commits May 29, 2026 10:29
- test_benchmark_execution: drop the # noqa: ARG002 by renaming the
  positional-only stub arg to _increment (matches the dummy-variable
  convention already used for _kwargs in this file).
- api_factory: add a static 'detail' to the streamed validation error so
  clients get an actionable reason without echoing the raw exception
  text (full message + traceback still logged server-side).

Signed-off-by: mschwab <mschwab@nvidia.com>
…t compat

The streamed JSONL validation error previously returned error.message;
keep that key (renamed from the temporary detail) so existing clients
parsing error.message keep working.

Signed-off-by: mschwab <mschwab@nvidia.com>
@marcusds marcusds enabled auto-merge May 29, 2026 17:35
@marcusds marcusds added this pull request to the merge queue May 29, 2026
Merged via the queue into main with commit 60e15a2 May 29, 2026
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants