fix: harden streamed TAR and NeMo inspection by mldangelo-oai · Pull Request #1665 · promptfoo/modelaudit

mldangelo-oai · 2026-06-11T04:12:29Z

Summary

Harden TAR/NeMo archive inspection for large raw and compressed TAR-backed model artifacts without materializing the whole archive. Compressed TARs are traversed through bounded stream-mode (`r|`) readers; raw TARs retain the seekable path. TAR findings, NeMo Hydra coverage, HDF5 owner/supplemental precedence, link safety, incomplete/no-cache outcomes, and finding exit precedence are preserved.

Final exact-head repairs

Strict gzip/bzip2/xz concatenated-member TAR reading validates compressed stream endings and bounds physical zero padding; gzip uses zlib gzip-member validation so CRC/ISIZE/FHCRC failures and late zlib errors fail closed.
Proven non-HDF5 compressed TARs whose bounded NeMo metadata probe exhausts hand off to TarScanner, preserving earlier/reachable root-config findings; HDF5 overlap remains owner-first with TAR supplemental analysis.
Bare .gz/.bz2/.xz TARs route by structure before wrapper fallback, while file-type validation accepts only a structurally proven TAR with the matching physical codec, avoiding wrong-codec S901 false positives.
Declared compressed wrappers with matching structurally valid codec headers retain compressed ownership across SafeTensors header collisions, so gzip+SafeTensors polyglots fail closed through compressed supplemental scanning.
Raw TAR dispatch first proves a valid raw TAR header before considering wrapper magic, and raw TAR EOF rejects bounded nonzero tails while allowing valid sparse zero padding.
Removed replay-only NeMo assignments; typed the modified compressed-TAR fixture under tmp_path; documented generic TAR handoff compatibility.
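
To make the strict gzip-member repair concrete, here is a minimal sketch of concatenated-member decoding with `zlib` in gzip mode. It is not the PR's implementation: the function name, the `max_zero_padding` parameter, and the choice to operate on an in-memory `bytes` object are assumptions made for brevity. It relies only on documented `zlib.decompressobj` behavior (`wbits=31`, `eof`, `unused_data`).

```python
import gzip
import zlib


def decompress_gzip_members(blob: bytes, max_output: int, max_zero_padding: int = 0) -> bytes:
    """Decode concatenated gzip members, failing closed on corruption or overflow.

    Hypothetical helper for illustration; a real scanner would stream instead of
    holding the whole input in memory.
    """
    out = bytearray()
    members = 0
    pending = blob
    while pending:
        if not any(pending):
            # Only zero bytes remain: accept them as padding after at least one
            # member, and only up to the configured bound.
            if members == 0 or len(pending) > max_zero_padding:
                raise ValueError("invalid or excessive zero padding")
            break
        # wbits=31 selects gzip framing, so zlib validates the member header and
        # the CRC32/ISIZE trailer and raises zlib.error on any mismatch.
        member = zlib.decompressobj(wbits=31)
        # Request one byte more than the remaining allowance to detect overflow.
        out += member.decompress(pending, max_output - len(out) + 1)
        if len(out) > max_output:
            raise ValueError("decompressed size exceeded limit")
        if not member.eof:
            raise ValueError("truncated gzip member")
        members += 1
        pending = member.unused_data
    return bytes(out)
```

A corrupted trailer surfaces as `zlib.error` rather than as silently accepted data, which is the fail-closed property the repair describes.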

Scope note

The replacement remains additive and does not rewrite PR history or main. It is not the requested ~1200 net production-line compact rewrite: current exact diff versus origin/main is production Python +2296/-464 (net +1832). The current behavior is validated, but further simplification is still warranted before merge if the compact-shape requirement is mandatory.

Validation

Exact local head: cfd24355.

uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 (before final tiny bare-wrapper validation repair: 19,676 passed, 1,401 skipped)
Exact final focused TAR scanner: 184 passed, 1 opt-in artifact test skipped.
Exact final TAR/NeMo/compressed/filetype affected surface before final tiny bare-wrapper validation repair: 1,928 passed, 12 optional-dependency/opt-in skips.
Final gzip+SafeTensors routing repair: exact failing test 1 passed; adjacent core routing 5 passed; adjacent file-detection 7 passed; changed-scope Ruff format/check and mypy clean.

Exact-head GitHub Actions and required human approval are pending; do not merge until both are green.

mldangelo-oai · 2026-06-11T04:14:02Z

@codex review

github-actions · 2026-06-11T04:14:43Z

Workflow run and artifacts

Performance Benchmarks

Compared 13 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 13 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 4.415s -> 4.368s (-1.1%).

| Workload | Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64]` | `nested_base64` | 98 B | 1 | 291.1us | 299.3us | +2.8% | stable |
| `rejected-basic-auth-candidates` | `tests/benchmarks/test_scan_benchmarks.py::test_rejected_basic_auth_candidates_scan_linearly` | `-` | 371.1 KiB | 1 | 2.577s | 2.512s | -2.5% | stable |
| `padded-multi-stream-upload` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload` | `multi_stream_padded` | 4.1 KiB | 1 | 349.2us | 342.5us | -1.9% | stable |
| `suspicious-pickle-intake` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake` | `suspicious-intake` | 183.8 KiB | 4 | 148.42ms | 150.88ms | +1.7% | stable |
| `clean-training-checkpoint` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint` | `safe_large` | 278.2 KiB | 1 | 111.63ms | 113.39ms | +1.6% | stable |
| `warm-cache-rescan` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan` | `release-candidate` | 547.3 KiB | 32 | 152.76ms | 154.78ms | +1.3% | stable |
| `duplicate-heavy-registry` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot` | `registry-snapshot` | 915.2 KiB | 13 | 571.78ms | 578.75ms | +1.2% | stable |
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex]` | `nested_hex` | 130 B | 1 | 305.9us | 303.2us | -0.9% | stable |
| `mixed-model-repository` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository` | `release-candidate` | 547.3 KiB | 32 | 632.24ms | 636.92ms | +0.7% | stable |
| `single-checkpoint-preflight` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load` | `single_checkpoint.pkl` | 183.0 KiB | 1 | 103.25ms | 103.72ms | +0.5% | stable |
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw]` | `nested_raw` | 78 B | 1 | 272.6us | 273.4us | +0.3% | stable |
| `chunked-upload-stream` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream` | `chunked_stream` | 278.2 KiB | 1 | 116.16ms | 116.37ms | +0.2% | stable |
| `direct-malicious-upload` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload` | `malicious_reduce` | 52 B | 1 | 221.3us | 220.9us | -0.2% | stable |

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a2cd64e965

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T04:31:04Z

@codex address these exact-head independent-review blockers before merge:

High: core TAR routing traverses and decompresses the stream before the new scanner guard. A 32 KiB gzip TAR with a 32 MiB member returned about 64 MiB (67,129,344 bytes) from gzip reads despite a 64 KiB cap; the pinned 1.25 GB artifact was decompressed four full times (5.38 GB).
High: PAX/GNU extension headers are materialized inside tar.next() before limits are checked. An 8 KiB gzip TAR with an 8 MiB PAX pathname caused about 64 MiB (67,177,758 bytes) of peak allocation under a 64 KiB limit.
High: max_total_size is only diagnosed after all TAR members are extracted and dispatched. Ten 32 KiB members were all scanned under a 64 KiB total budget; nested archives reset the budget.
Medium: unsafe/rejected links are still published with scan_status="link_validated".

Please carry one shared bounded work/decompression budget through routing, preflight, extraction, and nested dispatch; bound raw extension headers before tarfile materializes them; and add instrumented end-to-end regressions. Independent report: /Users/mdangelo/modelaudit-pr-swarm/outputs/pr-1665-independent-review-20260611.md.

mldangelo-oai · 2026-06-11T04:34:37Z

@codex review the current head after the TAR link inventory review fix. Please verify unsafe symlink/hardlink members are recorded as rejected while safe links remain link_validated, and that sparse compatibility handling stays unchanged.

chatgpt-codex-connector · 2026-06-11T04:37:06Z

Codex Review: Didn't find any major issues. Swish!

mldangelo-oai · 2026-06-11T04:56:10Z

PR #1665 independent re-review

Findings

1. High — compressed-TAR limits still run after an unbounded routing traversal

The current head still routes TAR content before TarScanner preflight. core.scan_file() calls format detection at modelaudit/core.py:3620-3649; compressed/TAR detection calls _detect_tar_route() at modelaudit/utils/file/detection.py:6076-6108; and _detect_tar_route() opens and iterates the entire archive at modelaudit/utils/file/detection.py:3796-3824. The configured decompression limit is not consulted until TarScanner._preflight_tar_archive() calls tar.next() and then checks tar.offset at modelaudit/scanners/tar_scanner.py:405-456.

Exact-head reproduction at 4c0a5be212f19ceb530d2f6980db069abf18af06:

Input: 32,720-byte gzip TAR declaring one 32 MiB regular member.
Configuration: compressed_max_decompressed_bytes=65,536, permissive ratio, caching disabled.
Result: the scanner eventually returned the expected limit failure, but instrumented gzip._GzipReader.read() returned 67,129,344 decompressed bytes before termination—more than 1,000 times the configured cap and approximately two complete logical traversals.

The pushed delta from a2cd64e9655b68a996960aa6290fd75ca1586109 to the current head changes only link inventory handling, tests, and an explanatory comment; it does not alter routing or preflight. The direct-scanner bomb test at tests/scanners/test_tar_scanner.py:1452-1474 verifies only the terminal result. The core-routing test at tests/scanners/test_tar_scanner.py:2265-2285 likewise does not measure work performed before the failure.

Required fix: make TAR routing bounded/header-only or transfer TAR ownership before _detect_tar_route() traverses the stream. Enforce one work/decompression budget across routing, preflight, extraction, and nested dispatch. Add a core.scan_file() regression that instruments the decompressor and asserts bytes returned never exceed a small bounded allowance around the configured cap.
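
As an illustration of the instrumented regression requested above, the following sketch meters every byte a decompressor hands to `tarfile` and stops one byte past the cap. `BoundedReader`, `BudgetExceeded`, `make_gzip_tar`, and `count_members` are hypothetical names, not modelaudit APIs; the sketch only assumes that `tarfile` in `r|` stream mode pulls data through the wrapped object's `read()`.

```python
import gzip
import io
import tarfile


class BudgetExceeded(Exception):
    """Raised when a reader hands out more bytes than its cap allows."""


class BoundedReader:
    """Counts bytes returned by a wrapped reader and stops just past the cap."""

    def __init__(self, raw, limit: int) -> None:
        self._raw = raw
        self.limit = limit
        self.returned = 0

    def read(self, size: int = -1) -> bytes:
        # Allow at most one byte past the cap: that byte proves overflow.
        allowance = self.limit - self.returned + 1
        if size is None or size < 0 or size > allowance:
            size = allowance
        chunk = self._raw.read(size)
        self.returned += len(chunk)
        if self.returned > self.limit:
            raise BudgetExceeded(f"returned {self.returned} bytes, cap is {self.limit}")
        return chunk


def make_gzip_tar(member_size: int) -> bytes:
    """Build an in-memory gzip TAR holding one zero-filled regular member."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tar:
        info = tarfile.TarInfo("weights.bin")
        info.size = member_size
        tar.addfile(info, io.BytesIO(bytes(member_size)))
    return gzip.compress(raw.getvalue())


def count_members(reader: BoundedReader) -> int:
    """Walk a TAR in stream mode; every decompressed byte passes through reader."""
    count = 0
    with tarfile.open(fileobj=reader, mode="r|") as tar:
        for _ in tar:
            count += 1
    return count
```

A regression built this way can assert on `reader.returned` after the failure, so it measures the work done before termination instead of only the terminal result.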

2. High — PAX/GNU extension metadata is still materialized before any size check

Both TAR passes call tar.next() before inspecting the consumed offset (modelaudit/scanners/tar_scanner.py:412-456 and modelaudit/scanners/tar_scanner.py:628-648). Python tarfile consumes hidden PAX extended headers and GNU longname/longlink records inside tar.next(), including allocating and parsing their bodies, before returning a visible TarInfo. The scanner therefore cannot enforce its decompression or entry-size limits until after metadata allocation has occurred.

Exact-head reproduction:

Input: 8,318-byte gzip TAR with an 8 MiB PAX pathname.
Configuration: compressed_max_decompressed_bytes=65,536, permissive ratio.
Result: the scanner failed with Decompressed size exceeded limit (8396800 > 65536), but only after tracemalloc recorded 67,177,758 bytes of peak allocation.

The sparse regression at tests/scanners/test_tar_scanner.py:1922-1947 synthesizes GNU.sparse.size through normal tarfile writing and validates the post-parse classification. It does not bound allocation or cover adversarial PAX path/linkpath/global headers, malformed PAX lengths, GNU longname/longlink bodies, or real GNU/PAX sparse encodings.

Required fix: inspect and cumulatively bound raw extension-header bodies before delegating them to tarfile, then carry the same metadata/work budget across all passes and nested archives. Add adversarial PAX/GNU tests that assert both fail-closed outcomes and bounded decompressor bytes/peak allocation.
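
One way to picture "bound raw extension-header bodies before tarfile materializes them" is to read the fixed 512-byte header directly and check the declared size of PAX (`x`, `g`) and GNU (`L`, `K`) records before any body is allocated. The sketch below is a simplified assumption, not the PR's code: it skips checksum validation, treats every declared size as a body to skip, and uses hypothetical names throughout.

```python
import io
import tarfile

BLOCK = 512
# Typeflags whose bodies tarfile parses internally: PAX extended/global
# headers and GNU longname/longlink records.
EXTENSION_TYPES = {b"x", b"g", b"L", b"K"}


def parse_size(field: bytes) -> int:
    """Decode the 12-byte size field (octal text, or GNU base-256)."""
    if field[0] == 0x80:
        return int.from_bytes(field[1:], "big")
    if field[0] & 0x80:
        raise ValueError("negative or invalid size field")
    text = field.split(b"\0", 1)[0].strip()
    return int(text, 8) if text else 0


def extension_bytes_within(stream, limit: int) -> int:
    """Sum declared extension-body sizes, failing before any body is read."""
    total = 0
    while True:
        header = stream.read(BLOCK)
        if len(header) < BLOCK or not any(header):
            return total  # end-of-archive marker or end of input
        size = parse_size(header[124:136])
        if header[156:157] in EXTENSION_TYPES:
            total += size
            if total > limit:
                raise ValueError(f"extension metadata {total} exceeds limit {limit}")
        # Skip the body in bounded chunks, rounded up to whole 512-byte blocks.
        remaining = -(-size // BLOCK) * BLOCK
        while remaining:
            chunk = stream.read(min(remaining, 65536))
            if not chunk:
                raise ValueError("truncated member body")
            remaining -= len(chunk)
```

Because the check runs on the declared size, an oversized PAX pathname is rejected after reading one 512-byte header rather than after the parser has allocated the full record.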

3. High — `max_total_size` remains an after-the-fact diagnostic and resets across nested archives

TAR extraction enforces only a per-member size at modelaudit/scanners/tar_scanner.py:293-333. Every regular member is extracted and dispatched at modelaudit/scanners/tar_scanner.py:749-818; nested scans receive fresh copies of the unchanged configuration at modelaudit/scanners/tar_scanner.py:768-804. The aggregate limit is checked only after scan_file() returns at modelaudit/core.py:3178-3198.

Exact-head reproductions:

Flat archive: ten 32 KiB members under max_total_size=65,536 produced bytes_scanned=327,680; only afterward did core report Total scan size limit exceeded: 327680 bytes (max: 65536).
Nested archive: two nested TARs containing one 40 KiB member each under the same total limit produced bytes_scanned=81,920; both children were scanned before the aggregate failure.

This leaves max_total_size unable to bound extraction, scanner dispatch, or nested decompression work. The only nearby aggregate regression, tests/test_core.py:2182-2206, mocks scan_file() for a directory and proves only that core stops between top-level targets. No test exercises cumulative member or nested-archive work.

Required fix: pass a shared remaining-byte/work budget object through the complete archive tree; reject declared members that cannot fit, decrement actual streamed bytes, and stop before dispatching work that crosses the budget. Preserve earlier security findings and mark the stopped archive inconclusive. Add flat and multi-level nested end-to-end tests with read/dispatch counters.
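
The shared-budget idea can be sketched as a single mutable object passed down the archive tree, so nested archives draw from the same allowance instead of a fresh copy of the configuration. Everything here (`WorkBudget`, `BudgetExhausted`, `scan_tar`, and the `.tar` suffix test for nesting) is a hypothetical illustration; it counts only leaf-member bytes, which is one possible accounting choice.

```python
import io
import tarfile
from dataclasses import dataclass


class BudgetExhausted(Exception):
    """Raised when work would cross the shared byte budget."""


@dataclass
class WorkBudget:
    remaining: int

    def reserve(self, declared: int) -> None:
        # Reject a member whose declared size cannot fit, before reading it.
        if declared > self.remaining:
            raise BudgetExhausted(f"{declared} bytes declared, {self.remaining} left")

    def consume(self, actual: int) -> None:
        # Decrement by the bytes actually streamed.
        self.remaining -= actual
        if self.remaining < 0:
            raise BudgetExhausted("streamed bytes crossed the budget")


def make_tar(members: dict[str, bytes]) -> bytes:
    """Build an in-memory TAR from a name -> content mapping (test fixture)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def scan_tar(fileobj, budget: WorkBudget, dispatched: list[str]) -> None:
    """Walk a TAR, recursing into nested TARs with the same budget object."""
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        for member in tar:
            if not member.isfile():
                continue
            body = tar.extractfile(member)
            if member.name.endswith(".tar"):
                scan_tar(body, budget, dispatched)  # shared, not reset
                continue
            budget.reserve(member.size)
            budget.consume(len(body.read()))
            dispatched.append(member.name)  # stand-in for scanner dispatch
```

With the flat and nested inputs from the reproductions above, this shape stops before dispatching the member that would cross the 65,536-byte budget; a real scanner would additionally preserve earlier findings and mark the archive inconclusive.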

Prior-blocker reconciliation

Fixed — rejected-link inventory state

The current commit 4c0a5be212f19ceb530d2f6980db069abf18af06 moves inventory publication after link validation and selects rejected when validation fails (modelaudit/scanners/tar_scanner.py:667-715). Exact-head runtime coverage verified:

safe parent-relative symlink: link_validated
safe root-relative hardlink: link_validated
escaping symlink: rejected
escaping hardlink: rejected
empty-target symlink: rejected
critical absolute hardlink: rejected

The focused suite also passes the new assertions at tests/scanners/test_tar_scanner.py:976-1023 and tests/scanners/test_tar_scanner.py:1082-1115. The prior Codex thread is now outdated but remains unresolved in GitHub; its underlying behavior is fixed. The current code-quality comment thread is resolved.

Test gap: the committed assertions cover rejected escaping symlinks, empty symlinks/hardlinks, and safe symlinks, but do not directly assert inventory status for safe hardlinks, escaping hardlinks, or critical absolute targets. The independent matrix passed; adding those cases would prevent the fixed state from regressing.
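
A table-driven check over all six cases could close this gap. The classifier below is a hypothetical stand-in for the scanner's link validation, written only to show the shape of such a matrix: it assumes symlink targets resolve relative to the link's directory, hardlink targets relative to the archive root, and that empty or absolute targets are always rejected. The status strings are the ones quoted in this review.

```python
import posixpath


def link_status(name: str, target: str, is_hardlink: bool) -> str:
    """Classify a TAR link member as 'link_validated' or 'rejected' (sketch)."""
    if not target or target.startswith("/"):
        return "rejected"  # empty and absolute targets never validate
    # Hardlink targets are archive-root relative; symlink targets are
    # relative to the directory that contains the link.
    base = "" if is_hardlink else posixpath.dirname(name)
    resolved = posixpath.normpath(posixpath.join(base, target))
    if resolved == ".." or resolved.startswith("../"):
        return "rejected"  # resolves outside the archive root
    return "link_validated"


# (member name, link target, is_hardlink, expected status)
LINK_MATRIX = [
    ("a/b/link", "../c.bin", False, "link_validated"),  # safe parent-relative symlink
    ("a/link", "a/file", True, "link_validated"),       # safe root-relative hardlink
    ("link", "../../etc/passwd", False, "rejected"),    # escaping symlink
    ("a/link", "../x", True, "rejected"),               # escaping hardlink
    ("link", "", False, "rejected"),                    # empty-target symlink
    ("link", "/etc/passwd", True, "rejected"),          # absolute hardlink
]
```

Iterating `LINK_MATRIX` in a parametrized test would assert inventory status for the safe-hardlink, escaping-hardlink, and absolute-target cases that the committed assertions do not yet cover.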

Exact review snapshot

PR: fix: harden streamed TAR and NeMo inspection #1665
Current reviewed head: 4c0a5be212f19ceb530d2f6980db069abf18af06
Previous reviewed head, explicitly not reused as current: a2cd64e9655b68a996960aa6290fd75ca1586109
Current base/main SHA: 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69
Head commits: a2cd64e9655b68a996960aa6290fd75ca1586109, 4c0a5be212f19ceb530d2f6980db069abf18af06
Changed files: CHANGELOG.md, modelaudit/scanners/tar_scanner.py, tests/scanners/test_tar_scanner.py
Snapshot refreshed from GitHub at 2026-06-11T04:54:08Z; the head remained unchanged across the review.
AGENTS.md scope: repository root only for all changed files; no scoped descendant AGENTS.md applies.

Independent validation

All commands used an extracted source archive of the exact head under /private/tmp; the existing local checkout was not modified.

Focused exact-head suite:

PYTHONDONTWRITEBYTECODE=1 \
PYTHONPATH=/private/tmp/modelaudit-pr1665-4c0 \
PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 \
/Users/mdangelo/code/modelaudit/.venv/bin/python -m pytest \
  tests/scanners/test_tar_scanner.py -q -p no:cacheprovider

140 passed, 1 skipped in 32.93s

The skipped test is the opt-in 1.25 GB pinned Hugging Face artifact test at tests/scanners/test_tar_scanner.py:1506.

Additional exact-head checks:

ruff check modelaudit/scanners/tar_scanner.py tests/scanners/test_tar_scanner.py
All checks passed!

ruff format --check modelaudit/scanners/tar_scanner.py tests/scanners/test_tar_scanner.py
2 files already formatted

Custom end-to-end probes:

| Probe | Configured bound | Observed work/result |
| --- | --- | --- |
| Core-routed gzip TAR | 65,536 decompressed bytes | 67,129,344 bytes returned by gzip before failure |
| 8 MiB PAX pathname | 65,536 decompressed bytes | 67,177,758-byte peak allocation before failure |
| Ten 32 KiB members | 65,536 total bytes | 327,680 bytes scanned before aggregate failure |
| Two nested 40 KiB members | 65,536 total bytes | 81,920 bytes scanned before aggregate failure |
| Link status matrix | safe/rejected semantics | all six expected statuses and S406 failures matched |

No additional independent correctness or policy finding survived validation beyond the three unchanged resource-bound blockers above.

Live CI and review state

At the final snapshot:

GitHub reported mergeable: MERGEABLE, mergeStateStatus: BLOCKED, and reviewDecision: REVIEW_REQUIRED.
No completed check was failing.
CodeQL, lint/format, type check, dependency audit, package build, vendored-proto checks, Docker checks, documentation formatting, title validation, and benchmarks had passed.
Quick Feedback Python 3.12, Windows Python 3.11, Python 3.10, and Python 3.13 were still in progress.
Review threads: one resolved current-head code-quality thread; one unresolved but outdated link-inventory thread whose behavior is now fixed.
PR CI still has no end-to-end assertion for pre-routing decompression work, metadata allocation, or cumulative nested TAR budgets.

Merge disposition

REQUEST CHANGES — NOT MERGE READY.

The rejected-link blocker is fixed at the current exact head. The three high-severity resource-bound blockers are unchanged and independently reproduced: compressed TAR routing performs substantial work before the advertised cap, PAX/GNU metadata allocates before limits are checked, and max_total_size does not stop flat or nested archive work. Green focused tests and currently green completed CI jobs do not exercise those boundaries.

Before merge, implement shared end-to-end routing/decompression/metadata/total-work budgets, add instrumented core-level regressions, rerun the focused suite and adversarial probes, and complete exact-head CI.

chatgpt-codex-connector · 2026-06-11T05:12:42Z

Summary

Added a shared TAR work budget and bounded read wrapper so TAR traversal, compressed decompression, PAX/GNU header reads, extraction, and nested dispatch consume from the same capped budget instead of repeatedly materializing the stream. modelaudit/scanners/tar_scanner.pyL59-L100 modelaudit/scanners/tar_scanner.pyL428-L459
Changed TAR scanning to stream through the bounded reader, enforce entry-count and total extraction limits during traversal, propagate the shared budget to nested TAR members, and fail incomplete wrapper analysis when decompressed size/ratio checks do not pass. modelaudit/scanners/tar_scanner.pyL735-L777 modelaudit/scanners/tar_scanner.pyL893-L905 modelaudit/scanners/tar_scanner.pyL995-L1045
Avoided expensive core TAR routing validation by trusting direct TAR header routing and by routing compressed TAR suffixes to TarScanner without walking the compressed stream in normal routing. modelaudit/core.pyL1363-L1366 modelaudit/utils/file/detection.pyL6078-L6086
Preserved reduced-limit NeMo route test behavior while preventing normal compressed TAR route detection from traversing the archive. modelaudit/utils/file/detection.pyL3802-L3808
Fixed unsafe TAR symlink/hardlink inventory so rejected links are recorded as scan_status="rejected" instead of link_validated. modelaudit/scanners/tar_scanner.pyL795-L825 modelaudit/scanners/tar_scanner.pyL837-L842
Added instrumented end-to-end regressions for bounded gzip reads during core routing, oversized PAX header materialization, total budget short-circuiting before all nested dispatches, and rejected link inventory. tests/scanners/test_tar_scanner.pyL2267-L2295 tests/scanners/test_tar_scanner.pyL2297-L2325 tests/scanners/test_tar_scanner.pyL2327-L2359 tests/scanners/test_tar_scanner.pyL2361-L2381

Testing

✅ PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 uv run pytest tests/scanners/test_tar_scanner.py tests/test_core.py::test_scan_file_size_limit_precedes_zip_preflight tests/scanners/test_compressed_scanner.py -q
✅ PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 uv run pytest tests/test_core.py -q
✅ PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 uv run pytest tests/utils/file/test_filetype.py -q
✅ uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
✅ uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
✅ uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
✅ git diff --check
⚠️ PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 (warning: manually stopped after unrelated failures appeared before completion; targeted affected suites above passed)

Committed changes on the current branch with commit 80b5740 and created the PR record via make_pr.

mldangelo-oai · 2026-06-11T05:39:08Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a0311885d1

mldangelo-oai · 2026-06-11T07:09:23Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e0fb2da481

mldangelo-oai · 2026-06-11T07:21:06Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ad0802b00

mldangelo-oai · 2026-06-11T07:44:30Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0a9465d57c

mldangelo-oai · 2026-06-11T07:58:19Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6f2ad9c6b2

mldangelo-oai · 2026-06-11T08:31:46Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4300acb118

mldangelo-oai · 2026-06-17T04:27:41Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2e6aafc229

mldangelo-oai · 2026-06-17T04:38:36Z

@codex review

chatgpt-codex-connector · 2026-06-17T04:45:51Z

Codex Review: Didn't find any major issues. Swish!

Reviewed commit: 4763b15dfe

mldangelo-oai · 2026-06-17T04:50:26Z

@codex review

chatgpt-codex-connector · 2026-06-17T05:00:10Z

Codex Review: Didn't find any major issues. More of your lovely PRs please.

Reviewed commit: 67b7f48ca7

…t38-large-tar-streaming-20260610 # Conflicts: # modelaudit/core.py # modelaudit/utils/file/detection.py

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 36563ef2b7

chatgpt-codex-connector · 2026-06-17T19:25:13Z

+                            if not allow_incomplete_generic_tar_route and find_hdf5_signature_offset(path) is not None:
+                                return NEMO_ROUTING_INCONCLUSIVE_FORMAT


Route proven compressed TARs despite HDF5 bytes

When a renamed gzip/bzip2/xz TAR has a first regular member larger than the 64 KiB NeMo routing skip budget, this branch returns nemo_routing_inconclusive solely because find_hdf5_signature_offset() finds a legal-offset HDF5 superblock anywhere in the physical wrapper. A gzip FEXTRA field can place a plausible HDF5 superblock at offset 512 while the stream is still a valid TAR; with a generic .gz/.bin suffix, core then treats the file as an HDF5/NeMo overlap and never invokes TarScanner, so later malicious TAR members that the streaming scanner would inspect are reduced to an operational routing failure. Distinguish top-level HDF5/userblock ownership from HDF5 bytes inside the compressed wrapper, or keep the already-proven compressed TAR routed to tar for generic suffixes.

AGENTS.md reference: AGENTS.md:L113-L116
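
For a regression fixture, the FEXTRA case can be reproduced with the standard library alone. The sketch below hand-builds a gzip member whose extra field carries the HDF5 signature at physical offset 512 while the stream remains decodable. The helper name and the `MA` subfield identifier are arbitrary choices for illustration; the layout follows the gzip header format (10 fixed bytes, 2-byte XLEN, then subfields with a 4-byte header each).

```python
import gzip
import struct
import zlib

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def gzip_with_signature_at(payload: bytes, offset: int = 512) -> bytes:
    """Build a valid gzip member with the HDF5 signature inside FEXTRA."""
    # Subfield data starts after 10 header bytes, 2 XLEN bytes, and the
    # 4-byte subfield header (SI1, SI2, LEN).
    pad = offset - 16
    if pad < 0:
        raise ValueError("offset falls inside the fixed header")
    extra_data = bytes(pad) + HDF5_SIGNATURE
    subfield = b"MA" + struct.pack("<H", len(extra_data)) + extra_data
    if len(subfield) > 0xFFFF:
        raise ValueError("FEXTRA is limited to 65,535 bytes")
    header = (
        b"\x1f\x8b"                 # gzip magic
        + b"\x08"                   # CM = deflate
        + b"\x04"                   # FLG = FEXTRA
        + struct.pack("<I", 0)      # MTIME
        + b"\x00\xff"               # XFL, OS = unknown
        + struct.pack("<H", len(subfield))
    )
    deflater = zlib.compressobj(wbits=-15)  # raw deflate, no zlib framing
    body = deflater.compress(payload) + deflater.flush()
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload) & 0xFFFFFFFF)
    return header + subfield + body + trailer
```

Passing a TAR as `payload` yields a file that a signature-offset probe would flag as HDF5 yet that decompresses cleanly, which is why offset matching alone cannot establish top-level HDF5 ownership.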

chatgpt-codex-connector · 2026-06-17T19:25:13Z

+                if entry_count > scanner.max_entries:
+                    return "inconclusive"


Preserve raw TAR user-block scans at entry limits

When an HDF5 user block contains a complete raw TAR prefix, this ownership proof uses the caller's max_tar_entries as a structural cutoff and returns inconclusive. If a root model_config.yaml appears before a later entry that exceeds that configured scan limit, core treats ownership as incomplete and never runs the bounded NeMo/TAR supplemental scan that would have reported the reachable Hydra finding plus the entry-limit failure. Return a scan-limit state here, as the compressed ownership path does, so detections before the limit are preserved.

AGENTS.md reference: AGENTS.md:L113-L116

chatgpt-codex-connector · 2026-06-17T19:25:13Z

+        while True:
+            chunk = source.read(chunk_size)
+            if not chunk:
+                break
+            if any(chunk):
+                raise _CorruptStreamError("Invalid compressed stream: nonzero bytes follow zero padding")
+            padding_bytes += len(chunk)


Bound accepted compressed user-block padding

For HDF5 user-block supplemental compressed scans, core enables zero-padding acceptance and bounds the source at the HDF5 signature offset, so a tiny compressed payload followed by a large zero-filled user block makes this loop read every padding byte before returning. With large-file limits raised or disabled for real artifacts, a sparse-looking but allocated user block can spend hundreds of MiB or more in padding validation even though those bytes are non-payload; cap accepted padding with a configured limit or fail closed once it is exceeded.

AGENTS.md reference: AGENTS.md:L105-L111
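
A capped variant of the padding loop might look like the following sketch. `max_padding_bytes` and the local `CorruptStreamError` class are assumptions standing in for whatever configuration key and exception type the scanner uses; the point is that the loop reads at most one byte past the cap before failing closed.

```python
import io


class CorruptStreamError(Exception):
    """Hypothetical stand-in for the scanner's corrupt-stream exception."""


def consume_zero_padding(source, max_padding_bytes: int, chunk_size: int = 65536) -> int:
    """Validate trailing zero padding without reading far past the cap."""
    padding_bytes = 0
    while True:
        # Never request more than one byte beyond the remaining allowance.
        want = min(chunk_size, max_padding_bytes - padding_bytes + 1)
        chunk = source.read(want)
        if not chunk:
            return padding_bytes
        if any(chunk):
            raise CorruptStreamError("Invalid compressed stream: nonzero bytes follow zero padding")
        padding_bytes += len(chunk)
        if padding_bytes > max_padding_bytes:
            raise CorruptStreamError("Invalid compressed stream: zero padding exceeds limit")
```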

chatgpt-codex-connector · 2026-06-17T19:25:13Z

+
+            assert hdf5_signature_offset is not None
+            supplemental_config = dict(config)
+            supplemental_config[COMPRESSED_SOURCE_SIZE_LIMIT_CONFIG_KEY] = hdf5_signature_offset


Prove compressed ownership before truncating

When the supplemental route is compressed, this unconditionally caps CompressedScanner at the HDF5 signature offset. For a gzip wrapper whose FEXTRA places a plausible HDF5 superblock at offset 512 while the compressed payload continues afterward, the primary HDF5 route owns the file and the supplemental compressed scan only sees the truncated prefix, reports a decode failure, and never scans the payload that an unbounded compressed scanner would have reached. Prove the compressed stream actually ends at the HDF5 boundary before applying this limit, or fail closed without suppressing the compressed-payload scan.

AGENTS.md reference: AGENTS.md:L113-L116

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f2b0c21bd3

chatgpt-codex-connector · 2026-06-17T21:21:45Z

+            while True:
+                member = archive.next()
+                if member is None:
+                    return "complete" if scanner._raw_tar_has_complete_end_marker(archive) else "incomplete"


Preserve raw TAR user-block scans at tail padding limits

When an HDF5 user block starts with a valid raw TAR containing model_config.yaml and then has zero padding beyond compressed_max_xz_padding_bytes before the HDF5 superblock, this returns incomplete rather than a scan-limit state. Core treats that as unowned and skips the supplemental NeMo/TAR scan, so a reachable Hydra _target_ before the padding is never reported; classify bounded post-EOF padding exhaustion as a scan-limit/fail-closed condition and merge the prefix scan like entry-limit handling.

AGENTS.md reference: AGENTS.md:L113-L115

Useful? React with 👍 / 👎.
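The three tail outcomes discussed in this comment can be sketched as a small classifier. This is a hedged illustration: `classify_post_eof_tail` and its parameters are hypothetical, and only the state names `complete`, `scan_limit`, and `incomplete` come from the thread.

```python
def classify_post_eof_tail(tail: bytes, max_padding_bytes: int) -> str:
    """Classify the bytes between the TAR end marker and the HDF5 superblock."""
    inspected = tail[:max_padding_bytes]  # never look past the padding budget
    if any(inspected):
        return "incomplete"  # non-zero bytes inside the inspected window
    if len(tail) > max_padding_bytes:
        return "scan_limit"  # all inspected bytes were zero, but the budget ran out
    return "complete"  # the whole tail is zero padding within budget
```

With this split, padding that merely exceeds the budget stays distinguishable from a corrupt tail, so the caller can still run the prefix scan and report incomplete coverage.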

chatgpt-codex-connector · 2026-06-17T21:21:45Z

+            if hdf5_compressed_prefix_ownership == "complete":
+                supplemental_config[COMPRESSED_SOURCE_SIZE_LIMIT_CONFIG_KEY] = hdf5_signature_offset
+                supplemental_config[ALLOW_ZERO_PADDING_TRAILING_CONFIG_KEY] = True
+                supplemental_config[COMPRESSED_PREFIX_OWNERSHIP_CONFIG_KEY] = True
+            else:
+                supplemental_config["cache_enabled"] = False


Preserve compressed user-block payloads at scan limits

When an HDF5 user block contains a compressed payload before excessive zero padding or another compressed scan limit, classify_compressed_prefix_ownership() returns scan_limit, but this branch only applies the HDF5 boundary for complete. The supplemental CompressedScanner then scans the whole .h5 without zero-padding acceptance, fails on the HDF5 body/trailer before dispatching the decompressed payload, and can miss a malicious pickle that was already reachable before the padding. Treat scan_limit like the TAR path: run the prefix-bounded compressed scan while preserving the incomplete result.

AGENTS.md reference: AGENTS.md:L113-L115

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-17T21:21:45Z

+    except (EOFError, OSError, tarfile.TarError, ValueError):
+        return "inconclusive"


Preserve raw TAR user-block scans at metadata limits

When an HDF5 user block contains a raw TAR prefix with an early model_config.yaml and a later PAX/GNU metadata record that exceeds max_tar_metadata_bytes, the bounded TarInfo path raises _TarStreamBudgetExceeded, but this catch collapses it to inconclusive. Core then marks TAR ownership incomplete and skips the supplemental NeMo/TAR scan entirely, so the reachable Hydra _target_ finding before the metadata limit is lost; return a scan-limit state for metadata-budget exhaustion so the bounded prefix scan still runs and reports the incomplete coverage.

AGENTS.md reference: AGENTS.md:L113-L115

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fd9361b06f

chatgpt-codex-connector · 2026-06-17T22:26:41Z

+            if hdf5_compressed_prefix_ownership in {"complete", "scan_limit"}:
+                supplemental_config[COMPRESSED_SOURCE_SIZE_LIMIT_CONFIG_KEY] = hdf5_signature_offset
+                supplemental_config[ALLOW_ZERO_PADDING_TRAILING_CONFIG_KEY] = True
+                supplemental_config[COMPRESSED_PREFIX_OWNERSHIP_CONFIG_KEY] = True
+                if hdf5_compressed_prefix_ownership == "scan_limit":
+                    supplemental_config[PRESERVE_LIMITED_PREFIX_PAYLOAD_CONFIG_KEY] = True


Preserve compressed user-block findings after corrupt tails

When a validated HDF5 user block begins with a complete compressed payload, such as a malicious pickle, and then has non-zero user-block bytes before the HDF5 superblock, ownership becomes incomplete. Because this block only enables the bounded/preserve path for complete or scan_limit, the supplemental CompressedScanner sees the trailer as a decode error and deletes the temp payload before routing it, so the reachable S201 finding is missed. Preserve the decoded prefix and report the ownership failure afterward.

AGENTS.md reference: AGENTS.md:L113-L115

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-17T22:26:41Z

+            elif hdf5_tar_prefix_ownership != "embedded_member":
+                supplemental_ownership_inconclusive = True


Preserve raw TAR user-block findings after tail failures

When a validated HDF5 user block starts with a raw TAR/NeMo archive that already contains a malicious model_config.yaml or pickle, adding non-zero bytes after the TAR EOF but before the HDF5 superblock makes classify_raw_tar_prefix_ownership() return incomplete; this branch then suppresses the supplemental NeMo/TAR scan, so reachable findings are replaced by only hdf5_tar_prefix_ownership_incomplete. Run the prefix-bounded scan and merge the ownership failure afterward.

AGENTS.md reference: AGENTS.md:L113-L115

Useful? React with 👍 / 👎.
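The "scan the bounded prefix, then merge the ownership failure" ordering can be illustrated with the standard `tarfile` module. This is a sketch under assumptions: `build_tar` and `scan_user_block` are hypothetical helpers, the finding strings are placeholders apart from `hdf5_tar_prefix_ownership_incomplete`, and the 64 KiB read cap is arbitrary.

```python
from __future__ import annotations

import io
import tarfile

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def build_tar(files: dict[str, bytes]) -> bytes:
    """Build an uncompressed TAR in memory (fixture helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def scan_user_block(data: bytes, hdf5_offset: int, ownership: str) -> list[str]:
    """Scan only the bytes before the HDF5 superblock, then append the ownership failure."""
    findings: list[str] = []
    prefix = io.BytesIO(data[:hdf5_offset])  # never read into the HDF5 body
    with tarfile.open(fileobj=prefix, mode="r:") as tar:
        for member in tar:
            if not member.isfile():
                continue
            handle = tar.extractfile(member)
            body = handle.read(64 * 1024) if handle is not None else b""
            if b"_target_" in body:
                findings.append(f"hydra_target:{member.name}")
    if ownership != "complete":
        # Reported after the prefix findings instead of replacing them.
        findings.append("hdf5_tar_prefix_ownership_incomplete")
    return findings
```

With a TAR followed by non-zero junk and then the HDF5 signature, the Hydra finding and the ownership failure are both returned, in that order.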

chatgpt-codex-connector · 2026-06-17T22:26:41Z

+                                ownership = classify_compressed_tar_prefix_ownership(path, hdf5_signature_offset)
+                                if ownership != "incomplete":
+                                    return NEMO_ROUTING_INCONCLUSIVE_FORMAT


Keep proven HDF5 user-block TARs routed to TAR

When a valid HDF5 user block contains a complete compressed TAR whose first regular member exceeds the 64 KiB NeMo routing skip budget, this returns nemo_routing_inconclusive even though ownership was proved within the HDF5 boundary. In core that inconclusive format is not selected as a supplemental tar/nemo scanner, so later reachable members such as a malicious model_config.yaml or pickle are skipped and only the routing-incomplete result is reported; return tar (or otherwise carry the bounded TAR route) so the streaming scanner preserves those findings.

AGENTS.md reference: AGENTS.md:L113-L115

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-17T22:26:41Z

+        source_size = self._source_size(path)
+        size_check_result = self._check_source_size_limit(path, source_size)
        if size_check_result:
            return size_check_result


Preserve bounded compressed prefixes despite source caps

In the current tree, scan_limit now enables the preserve flag, but this early size check still returns before _decompress_to_tempfiles() can preserve anything. When an HDF5 user block is large (for example >512 MiB of zero padding before the HDF5 superblock) but contains a small malicious gzip/bzip2/xz/zlib payload at the start, ownership is classified as scan_limit, the supplemental compressed scan hits this source-size result, and the reachable payload findings are never merged. Skip this source-size short-circuit for bounded prefix ownership, or run the preserved-prefix path before reporting the limit.

AGENTS.md reference: AGENTS.md:L113-L115

Useful? React with 👍 / 👎.
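The second option in this comment (run the preserved-prefix path before reporting the limit) is an ordering change, sketched below. All names are hypothetical: `scan_compressed`, `preserve_prefix`, and the `source_size_limit` marker are illustrative and do not reflect CompressedScanner's real interface.

```python
from __future__ import annotations

from typing import Callable


def scan_compressed(
    source_size: int,
    max_source_bytes: int,
    preserve_prefix: bool,
    scan_prefix: Callable[[], list[str]],
) -> list[str]:
    """Report the source-size limit without discarding reachable prefix findings."""
    over_limit = source_size > max_source_bytes
    if over_limit and not preserve_prefix:
        return ["source_size_limit"]  # ordinary short-circuit for unbounded scans
    findings = list(scan_prefix())  # bounded prefix scan runs first
    if over_limit:
        findings.append("source_size_limit")  # limit is still reported afterward
    return findings
```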

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0d129e2657

chatgpt-codex-connector · 2026-06-17T23:12:32Z

+                            },
+                        )
+                        if compression_codec is not None:
+                            break


Continue scanning after oversized compressed TAR members

When a gzip/bzip2/xz TAR contains an entry that trips _TarEntryExtractionIncomplete only because it is over max_entry_size, this break stops before later headers even though the aggregate and compressed-stream budget checks above accepted the member. A crafted archive can put an oversized benign file before a malicious pickle or NeMo config and get only an incomplete entry check. Continue draining or skipping the member and keep scanning until a real stream or budget failure occurs.

AGENTS.md reference: AGENTS.md:L113-L115

Useful? React with 👍 / 👎.
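In `tarfile` stream mode (`r|gz`), moving to the next member already drains the current body, so an oversized entry can be recorded and skipped without ending the traversal. The sketch below shows that pattern with hypothetical helpers (`build_tar_gz`, `scan_stream`); it omits the aggregate and compressed-stream budgets that the real scanner would still enforce while draining.

```python
from __future__ import annotations

import io
import tarfile


def build_tar_gz(files: dict[str, bytes]) -> bytes:
    """Build a gzip-compressed TAR in memory (fixture helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def scan_stream(data: bytes, max_entry_size: int) -> tuple[list[str], list[str]]:
    """Return (skipped, scanned) member names from a forward-only traversal."""
    skipped: list[str] = []
    scanned: list[str] = []
    with tarfile.open(fileobj=io.BytesIO(data), mode="r|gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            if member.size > max_entry_size:
                skipped.append(member.name)  # mark incomplete, but keep going
                continue
            handle = tar.extractfile(member)
            if handle is not None:
                handle.read()  # placeholder for the real per-member scan
            scanned.append(member.name)
    return skipped, scanned
```

An archive with a 4 KiB `big.bin` ahead of `evil.pkl` and a 1 KiB entry limit yields `(["big.bin"], ["evil.pkl"])`: the later member is still reached.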

chatgpt-codex-connector · 2026-06-17T23:12:32Z

+            if hdf5_compressed_prefix_ownership != "complete":
+                supplemental_config["cache_enabled"] = False
+                _mark_inconclusive_scan_outcome(result, _HDF5_COMPRESSED_PREFIX_OWNERSHIP_INCOMPLETE_REASON)


Honor scanner selection before ownership failures

When scanner selection excludes the supplemental user-block scanner (for example --scanners keras_h5 on an HDF5 file whose user block starts with a corrupt gzip payload), this branch still marks the primary Keras result incomplete and unsuccessful before _merge_supplemental_scanner_analysis() has a chance to skip compressed. That makes a scanner the user did not enable affect the exit status; gate these ownership-failure checks on scanner_selection.allows(hdf5_userblock_supplemental_scanner_id) or emit only the selection skip when the supplemental scanner is disabled.

Useful? React with 👍 / 👎.
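The gating proposed here can be sketched as follows. Only the `allows(...)` call shape comes from the comment; the `ScannerSelection` class, `ownership_outcomes`, and the `skipped:` marker are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ScannerSelection:
    """Hypothetical selection object; None means every scanner is enabled."""

    enabled: frozenset[str] | None = None

    def allows(self, scanner_id: str) -> bool:
        return self.enabled is None or scanner_id in self.enabled


def ownership_outcomes(selection: ScannerSelection, supplemental_id: str, ownership: str) -> list[str]:
    """Only let an enabled supplemental scanner affect the primary result."""
    if not selection.allows(supplemental_id):
        return [f"skipped:{supplemental_id}"]  # selection skip only, no failure
    if ownership != "complete":
        return ["hdf5_compressed_prefix_ownership_incomplete"]
    return []
```

With `--scanners keras_h5`, modelled here as `ScannerSelection(frozenset({"keras_h5"}))`, a corrupt gzip user block produces only the skip marker and leaves the exit status to the Keras scan.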

chatgpt-codex-connector

💡 Codex Review

modelaudit/modelaudit/utils/file/detection.py

Line 6853 in 9d858fb

if _tar_member_materializes_file_content(member):

Bound compressed skips for non-file TAR bodies

When a compressed TAR contains a non-regular member (for example a directory, symlink, or special entry) with a large declared size, this branch skips the body-budget check below because it only runs for members that materialize file content. tarfile still has to consume that body before the next header in stream mode, so routing can decompress an arbitrary amount of data before handing off to TarScanner; apply the same skip budget to any member with a body, or return tar before continuing.

AGENTS.md reference: AGENTS.md:L111-L111
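A conservative reading of "apply the same skip budget to any member with a body" is to charge the declared, block-padded size of every header, whatever its type. The sketch below assumes that reading; `padded_body_bytes` and `within_skip_budget` are hypothetical names, and 512 is the standard TAR block size.

```python
import tarfile

BLOCK = 512  # TAR bodies are padded to whole 512-byte blocks


def padded_body_bytes(member: tarfile.TarInfo) -> int:
    """Declared body size rounded up to a whole number of TAR blocks."""
    return -(-member.size // BLOCK) * BLOCK


def within_skip_budget(members, budget: int) -> bool:
    """Charge every declared body against the budget, not only regular files."""
    spent = 0
    for member in members:
        spent += padded_body_bytes(member)
        if spent > budget:
            return False
    return True
```

A directory entry that declares a 1,000,000-byte size fails a 64 KiB budget under this rule, even though it would not materialize file content.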

chatgpt-codex-connector · 2026-06-18T00:45:26Z

+            if hdf5_tar_prefix_ownership in {"complete", "scan_limit"}:
+                supplemental_config = dict(config)
+                supplemental_config[TAR_SOURCE_SIZE_LIMIT_CONFIG_KEY] = hdf5_signature_offset
+            elif hdf5_tar_prefix_ownership != "embedded_member":


Preserve TAR analysis for embedded HDF5 members

When a valid raw TAR has a plausible HDF5 superblock inside a regular member, classify_raw_tar_prefix_ownership() returns embedded_member; this branch then skips both the bounded supplemental TAR scan and the ownership failure path. Since scan_file() has already selected keras_h5 whenever find_hdf5_signature_offset() succeeds, any later TAR members (for example a malicious pickle or NeMo config after the embedded HDF5 member) are never inspected even though the outer TAR structure is trusted and routable. Route this state to TAR ownership, or run an unbounded/appropriately bounded supplemental TAR scan instead of suppressing it.

AGENTS.md reference: AGENTS.md:L115-L115

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-18T00:45:26Z

+    if validated_hdf5_nemo_overlap:
+        result.merge(_make_incomplete_nemo_routing_result(path))


Honor scanner selection for NeMo routing overlaps

When HDF5 owns the file but the user has selected only the HDF5 scanner, an inconclusive NeMo/TAR user-block probe still reaches this unconditional merge and adds nemo_routing_incomplete, making the primary Keras result unsuccessful even though the supplemental NeMo/TAR scanner was not enabled. Gate this fail-closed overlap result on the supplemental scanner being allowed, or emit only the scanner-selection skip as the other HDF5 user-block supplemental paths do.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-18T00:45:26Z

+                                ownership = classify_compressed_tar_prefix_ownership(path, hdf5_signature_offset)
+                                if ownership == "inconclusive":
+                                    return NEMO_ROUTING_INCONCLUSIVE_FORMAT
+                            return "tar"


Preserve linked NeMo configs after compressed skip limits

When a compressed TAR has a large regular member before a root model_config.yaml symlink, this early handoff routes the archive to generic TAR before the link can be observed. Generic TarScanner only records conservative link-semantics incomplete for a root config symlink and does not follow the safe target, so adding a large first member can turn a reachable Hydra _target_ finding in the linked YAML into a non-actionable inconclusive TAR result; keep enough NeMo link evidence to route to NemoScanner or teach the TAR fallback to preserve the linked config finding.

AGENTS.md reference: AGENTS.md:L115-L115

Useful? React with 👍 / 👎.
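What counts as a "safe target" for a root config symlink is not spelled out in this thread. One plausible check, sketched below as an assumption, resolves the link relative to the member's directory and rejects absolute paths and anything that escapes the archive root. `resolve_safe_link_target` is a hypothetical helper and does not handle chained links.

```python
from __future__ import annotations

import posixpath


def resolve_safe_link_target(member_name: str, linkname: str) -> str | None:
    """Return the in-archive path a symlink points to, or None if it is unsafe."""
    if not linkname or linkname.startswith("/"):
        return None  # empty or absolute targets are never followed
    base = posixpath.dirname(member_name)
    resolved = posixpath.normpath(posixpath.join(base, linkname))
    if resolved in (".", "..") or resolved.startswith("../"):
        return None  # target resolves to the root itself or escapes it
    return resolved
```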

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9a2bec28f1

chatgpt-codex-connector · 2026-06-18T01:47:28Z

        hdf5_userblock_supplemental_scanner_id = (
            _select_hdf5_userblock_supplemental_scanner_id(path, magic_format, ext, config)
            if scanner_id == "keras_h5" and hdf5_signature_offset not in (None, 0)
            else None


Run selected user-block scanners when Keras is skipped

When scanner selection disables keras_h5 but enables the discovered user-block owner, this supplemental scanner can be dropped before it runs. For example, --scanners compressed on an .h5 file whose user block starts with a gzip-compressed pickle sets hdf5_userblock_supplemental_scanner_id here, but the later active-selection fallback returns a keras_h5 skip result before the supplemental merge, so the allowed compressed payload is never inspected. Carry the allowed user-block scanner into that skip path and run it with the same bounded config.

AGENTS.md reference: AGENTS.md:L136-L136

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-18T01:47:29Z

+            if hdf5_tar_prefix_ownership in {"complete", "scan_limit"}:
+                supplemental_config = dict(config)
+                supplemental_config[TAR_SOURCE_SIZE_LIMIT_CONFIG_KEY] = hdf5_signature_offset


Bound primary TAR selection to the HDF5 user block

When --scanners tar/nemo makes the TAR-family scanner the primary result for an HDF5 file with a complete TAR user block, this supplemental-only source limit is never applied because the later merge is skipped once result.scanner_name already matches the user-block scanner. The primary TAR scan then reads into the HDF5 body and reports raw/compressed trailing data for a bounded benign prefix; apply the same HDF5 boundary before running TAR/NeMo as the selected primary scanner.

AGENTS.md reference: AGENTS.md:L136-L136

Useful? React with 👍 / 👎.
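Applying the HDF5 boundary to a primary TAR scan amounts to never letting the reader see bytes past the signature offset. A minimal sketch using a capped file-like wrapper and `tarfile` stream mode; `PrefixReader`, `build_tar`, and `list_prefix_members` are hypothetical names, not the PR's source-size-limit mechanism.

```python
from __future__ import annotations

import io
import tarfile


class PrefixReader(io.RawIOBase):
    """Read-only view that exposes at most `limit` bytes of the underlying stream."""

    def __init__(self, raw, limit: int) -> None:
        super().__init__()
        self._raw = raw
        self._remaining = limit

    def readable(self) -> bool:
        return True

    def readinto(self, buffer) -> int:
        size = min(len(buffer), self._remaining)
        if size <= 0:
            return 0  # behaves like EOF at the boundary
        chunk = self._raw.read(size)
        buffer[: len(chunk)] = chunk
        self._remaining -= len(chunk)
        return len(chunk)


def build_tar(files: dict[str, bytes]) -> bytes:
    """Build an uncompressed TAR in memory (fixture helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def list_prefix_members(raw, limit: int) -> list[str]:
    """List TAR members while reading no more than `limit` bytes from `raw`."""
    with tarfile.open(fileobj=PrefixReader(raw, limit), mode="r|") as tar:
        return [member.name for member in tar]
```

Because the cap lives in the reader, the same bound applies whether the TAR-family scanner runs as the primary or as a supplemental scanner.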

fix: stream large tar archive inspection

a2cd64e

github-code-quality Bot found potential problems Jun 11, 2026

View reviewed changes

Comment thread modelaudit/scanners/tar_scanner.py Fixed

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/scanners/tar_scanner.py Outdated

fix: report rejected tar links in inventory

4c0a5be

fix: bound tar stream routing budgets

a031188

github-code-quality Bot found potential problems Jun 11, 2026

View reviewed changes

Comment thread modelaudit/scanners/tar_scanner.py Fixed

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/scanners/tar_scanner.py Outdated

Comment thread modelaudit/utils/file/detection.py Outdated

fix: validate compressed tar routing

e0fb2da

github-code-quality Bot found potential problems Jun 11, 2026

View reviewed changes

Comment thread modelaudit/scanners/tar_scanner.py Fixed

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/utils/file/detection.py Outdated

Comment thread modelaudit/scanners/tar_scanner.py

Comment thread modelaudit/scanners/tar_scanner.py Outdated

fix: close tar stream files explicitly

2ad0802

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/utils/file/detection.py Outdated

Comment thread modelaudit/scanners/tar_scanner.py

Comment thread modelaudit/scanners/tar_scanner.py Outdated

fix: preserve bounded tar routing controls

0a9465d

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/scanners/tar_scanner.py

Comment thread modelaudit/utils/file/detection.py Outdated

Comment thread modelaudit/scanners/tar_scanner.py

fix: precheck tar member work budgets

6f2ad9c

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/utils/file/detection.py Outdated

fix: close tar streaming coverage gaps

e3160ef

chatgpt-codex-connector Bot reviewed Jun 17, 2026

View reviewed changes

Comment thread modelaudit/core.py Outdated

mldangelo-oai requested review from a team and agold-oai and removed request for a team June 17, 2026 03:34

fix: preserve compressed hdf5 userblocks

2e6aafc

github-code-quality Bot found potential problems Jun 17, 2026

View reviewed changes

Comment thread modelaudit/scanners/compressed_scanner.py Fixed

Comment thread modelaudit/scanners/compressed_scanner.py Fixed

chatgpt-codex-connector Bot reviewed Jun 17, 2026

View reviewed changes

Comment thread modelaudit/scanners/tar_scanner.py Outdated

fix: remove lz4 padding dead stores

4763b15

fix: preserve findings at tar ownership limits

67b7f48

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-…

36563ef

…t38-large-tar-streaming-20260610 # Conflicts: # modelaudit/core.py # modelaudit/utils/file/detection.py

chatgpt-codex-connector Bot reviewed Jun 17, 2026

View reviewed changes

mldangelo-oai added 4 commits June 17, 2026 15:39

test: make NeMo replacement metadata assertion portable

19859d3

fix: harden compressed wrapper ownership

b40460a

test: guard open-file replacement on Windows

01acf97

test: guard platform-specific archive mutations

f2b0c21

chatgpt-codex-connector Bot reviewed Jun 17, 2026

View reviewed changes

fix: preserve findings at archive scan limits

fd9361b

chatgpt-codex-connector Bot reviewed Jun 17, 2026

View reviewed changes

fix: preserve findings across archive failures

0d129e2

chatgpt-codex-connector Bot reviewed Jun 17, 2026

View reviewed changes

fix: preserve bounded archive coverage

9d858fb

chatgpt-codex-connector Bot reviewed Jun 18, 2026

View reviewed changes

fix: preserve tar overlap coverage

9a2bec2

chatgpt-codex-connector Bot reviewed Jun 18, 2026

View reviewed changes

fix: bound selected user-block scanners

b192e1b

if not allow_incomplete_generic_tar_route and find_hdf5_signature_offset(path) is not None:
    return NEMO_ROUTING_INCONCLUSIVE_FORMAT

except (EOFError, OSError, tarfile.TarError, ValueError):
    return "inconclusive"

elif hdf5_tar_prefix_ownership != "embedded_member":
    supplemental_ownership_inconclusive = True

if validated_hdf5_nemo_overlap:
    result.merge(_make_incomplete_nemo_routing_result(path))

Conversation

mldangelo-oai commented Jun 11, 2026 • edited

Summary

Final exact-head repairs

Scope note

Validation

mldangelo-oai commented Jun 11, 2026

github-actions Bot commented Jun 11, 2026 • edited

Performance Benchmarks

chatgpt-codex-connector Bot left a comment

💡 Codex Review

mldangelo-oai commented Jun 11, 2026

mldangelo-oai commented Jun 11, 2026

chatgpt-codex-connector Bot commented Jun 11, 2026

mldangelo-oai commented Jun 11, 2026

PR #1665 independent re-review

Findings

1. High — compressed-TAR limits still run after an unbounded routing traversal

2. High — PAX/GNU extension metadata is still materialized before any size check

3. High — max_total_size remains an after-the-fact diagnostic and resets across nested archives

Prior-blocker reconciliation

Fixed — rejected-link inventory state

Exact review snapshot

Independent validation

Live CI and review state

Merge disposition

chatgpt-codex-connector Bot commented Jun 11, 2026

Summary

mldangelo-oai commented Jun 11, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

mldangelo-oai commented Jun 11, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

mldangelo-oai commented Jun 11, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

mldangelo-oai commented Jun 11, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

mldangelo-oai commented Jun 11, 2026

3. High — `max_total_size` remains an after-the-fact diagnostic and resets across nested archives