fix(safetensors): validate zero-based shards and bind retained inputs by mldangelo-oai · Pull Request #1668 · promptfoo/modelaudit

mldangelo-oai · 2026-06-11T05:16:51Z

What and why

Make SafeTensors shard coverage honor validated *.safetensors.index.json inventories, including legitimate zero-based families such as model-00000-of-00001.safetensors, without weakening one-based coverage, scanner-selection boundaries, cache integrity, or fail-closed behavior.

Filename-only inference treated every family as one-based. This change carries bounded index authority through local, explicit, streaming, and Hugging Face scans and refuses to certify coverage when that authority or the retained input namespace cannot be reproduced at the terminal scan boundary.
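The difference between filename-only and index-driven coverage can be shown in a small sketch. It assumes a hypothetical shard-name regex (`SHARD_RE`) and hypothetical helper names; it is not modelaudit's implementation. It only contrasts a legacy one-based guess with a check against the members an index declares.

```python
import re

# Hypothetical shard-name pattern for this sketch; modelaudit's real pattern may differ.
SHARD_RE = re.compile(r"^(?P<stem>.+)-(?P<index>\d{5})-of-(?P<total>\d{5})\.safetensors$")


def filename_only_missing(names: list[str]) -> set[int]:
    """Legacy-style inference: assume one-based numbering 1..total from filenames alone."""
    parsed = [m for m in (SHARD_RE.match(n) for n in names) if m]
    if not parsed:
        return set()
    total = int(parsed[0]["total"])
    seen = {int(m["index"]) for m in parsed}
    return set(range(1, total + 1)) - seen


def index_driven_missing(names: list[str], weight_map: dict[str, str]) -> set[str]:
    """Index-driven inference: the validated weight_map declares the exact members."""
    return set(weight_map.values()) - set(names)


names = ["model-00000-of-00001.safetensors"]
print(filename_only_missing(names))  # {1}: a complete zero-based family looks incomplete
print(index_driven_missing(names, {"lm_head.weight": "model-00000-of-00001.safetensors"}))  # set()
```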

Release and base integration

Exact PR head: 914625043473ccce96a0b84909251ab846504774.
Current base and merge base: main@3cf924f2cad4345570ede7a0aba785604f610065.
8245a965 additively merged the live PR head e3c76f7d, preserving its deterministic Windows fixture update.
5eb797a9 additively merged current origin/main, including the coordinator release gate #1682 (ci: shard Python coverage across runners), the disjoint Docker digest PR #1686 (chore(deps): update python:3.12-slim docker digest to d764629), and subsequent Nightly CI changes.
21ca415d closes retained-input, companion-namespace, OCI accounting, ONNX alias, and Hugging Face routing gaps found by exact-head review.
534f36e9 removes attacker-controlled path and exception text from hash-failure logs after exact-head CodeQL identified a clear-text secret flow.
398b2b36 keeps Windows ancestor handles rename-denying but directory-write-compatible and stops unrelated sibling/cache creation from invalidating ordinary directory identity.
3e0ef439 retains trusted ancestor index authority when multiple complete same-shape families are selected, requests a native Windows delete-access directory guard, and distinguishes stable directory identity from ordinary child metadata changes.
20821c1e binds streamed SafeTensors shard and index reads to the retained local-source descriptor tree, including symlink targets; keeps every hash-failure log generic; and isolates the explicit-authority regression from unrelated concurrent test directories.
ae8ce191 rejects a retained symlink target descriptor unless it matches the initially captured namespace generation, closing a pre-open replace-and-restore window.
5c1417c2 captures trusted Hugging Face snapshot blob generations before generator execution and binds every retained local-stream input through the descriptor tree, including scan-and-delete aliases.
a33b912b evaluates common ancestor indexes against every selected same-shape family while isolating speculative broad-directory failures; it also makes the authority fixtures independent of the process umask.
5f6ef367 shares one bounded index-inspection budget across selected same-shape explicit families, constrains speculative ancestor authority to their deterministic common scope, and fails closed when the aggregate listing cap is exhausted.
1c7333a1 binds Windows reparse receipts to stable object identity, attributes, and tag instead of volatile timestamps while preserving POSIX symlink A-B-A detection; it also makes late-checkpoint routing order deterministic across retained and staged path aliases.
b8686007 shares aggregate index-discovery budgets across every explicit family group, retains receipt-only POSIX lexical chains, binds direct Hugging Face downloads before dispatch, and makes scan-and-delete cleanup reject a replaced pathname generation.
2ee9a19a moves scan-and-delete candidates into a private same-filesystem tombstone before generation verification, isolates speculative index failures per probe while retaining one aggregate budget, and keeps Windows dispatch receipts bound to the resolved object protected by retained handles.
a4051a1b validates each Windows source entry immediately around handle acquisition, keeps reparse lexical receipts on stable structural fields while target receipts and handles own object identity, and treats a denied ancestor rename as successful guard enforcement.
6a437d6a restores quarantined replacement generations without overwriting recreated paths, limits failed-download cleanup to invocation-owned Hugging Face staging, and makes the dispatch-deadline regression deterministic.
91462504 moves index deletion into retained private same-filesystem quarantine and evaluates descriptor-bound cache eligibility against the guarded logical path, preventing post-validation tombstone substitution and stale shard-family cache reuse.
No rebase, amend, reset, force-push, direct-main write, or history rewrite was used.
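Several commits above bind a retained descriptor to the namespace generation captured before opening it (for example ae8ce191). A POSIX-only sketch of that idea follows; `Generation`, `open_bound`, and `verify_still_bound` are hypothetical names, and the chosen stat fields are an assumption (the PR notes that Windows receipts deliberately avoid volatile timestamps, so a real implementation would pick fields per platform).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Generation:
    dev: int
    ino: int
    size: int
    mtime_ns: int  # assumption: acceptable on POSIX; Windows receipts would avoid timestamps


def generation_of(st: os.stat_result) -> Generation:
    return Generation(st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)


def open_bound(path: str) -> tuple[int, Generation]:
    """Capture the path's generation, then open it and require the descriptor to match."""
    before = generation_of(os.stat(path))
    fd = os.open(path, os.O_RDONLY)
    try:
        if generation_of(os.fstat(fd)) != before:
            # A replacement between capture and open fails closed.
            raise RuntimeError("path changed between capture and open")
    except BaseException:
        os.close(fd)
        raise
    return fd, before


def verify_still_bound(path: str, fd: int, expected: Generation) -> None:
    """At the terminal boundary, both the descriptor and the pathname must still match."""
    if generation_of(os.fstat(fd)) != expected:
        raise RuntimeError("retained descriptor changed")
    if generation_of(os.stat(path)) != expected:
        raise RuntimeError("pathname now names a different generation")
```

Reads go through the retained `fd`, so a later rename of a different file onto the path is detected by the terminal check instead of being silently scanned.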

Security and correctness behavior

Support validated zero- and one-based SafeTensors inventories, exact declared-member identity, nested targets, arbitrary custom stems, and relevant prefixed indexes.
Bind local authority to index path, base, content fingerprint, observed generation, and terminally refreshed target identity. Index, shard, target, ancestor, and cleanup-time A-B-A changes fail closed.
Keep governing trusted ancestor indexes authoritative even when another complete same-stem/same-total family is selected; a deeper name-complete family cannot suppress conflicting or incomplete ancestor authority.
Retain POSIX source and companion descriptors, bind all retained streamed reads to the descriptor tree, require local symlink and Hugging Face blob targets to match their initial namespace generations, monitor relevant namespaces, use private staging outside watched parents, and preserve unverified roots rather than deleting by name.
Retain Windows deny-delete ancestor/reparse guards and deny-write/delete file guards through terminal reconciliation. Cross-dispatch comparison binds the resolved object rather than volatile lexical metadata; ordinary directory child changes do not masquerade as replacement, while final directory rename still requires delete access and is denied by the retained handle.
Bound index paths, ancestor traversal, physical observations, per-index and aggregate reads, JSON structure, tensor assignments, alias identities, ONNX companions, and mutation-event draining.
Reject duplicate JSON keys, malformed/mixed/unsafe indexes, path escapes, identity-indeterminate aliases, exhausted limits, untrusted explicit-family authority, and incomplete terminal proofs.
Preserve exact scanner, extension, filename, and content-route policy across standard and source-native Hugging Face acquisition. Inactive routes cannot shadow active ZIP, HDF5, ONNX, or SafeTensors routing.
Count OCI context-only layers and ONNX external-data aliases exactly once for byte/hash budgets while preserving every logical alias context and fail-closed incomplete hashes.
Keep source-native SafeTensors retention bounds, precomputed-result ownership, remote order, and exact transfer accounting.
Hash failures emit only generic messages; sensitive path and exception values are never logged.
Runtime-only inspection state remains excluded from cache fingerprints. No dependency or lockfile change is present.
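Two of the bounds listed above, a bounded per-index read and duplicate-key rejection, can be sketched with the standard `json` module. The 10 MiB limit mirrors the limit mentioned in the review below and is an assumption here; `load_index_bounded` and `IndexRejected` are hypothetical names. Error messages stay generic so attacker-controlled text is not echoed.

```python
import json

MAX_INDEX_BYTES = 10 * 1024 * 1024  # assumption: mirrors the 10 MiB limit cited in the review


class IndexRejected(ValueError):
    pass


def _reject_duplicates(pairs: list[tuple[str, object]]) -> dict:
    obj: dict = {}
    for key, value in pairs:
        if key in obj:
            raise IndexRejected("duplicate JSON key in index")  # generic, no attacker text
        obj[key] = value
    return obj


def load_index_bounded(fp, limit: int = MAX_INDEX_BYTES) -> dict:
    data = fp.read(limit + 1)  # one extra byte detects overflow without reading the rest
    if len(data) > limit:
        raise IndexRejected("safetensors index exceeds bounded parse limit")
    # The hook runs for every JSON object, so duplicates are rejected at any nesting depth.
    index = json.loads(data, object_pairs_hook=_reject_duplicates)
    weight_map = index.get("weight_map") if isinstance(index, dict) else None
    if not isinstance(weight_map, dict) or not all(isinstance(v, str) for v in weight_map.values()):
        raise IndexRejected("malformed weight_map")
    return index
```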

Current diff versus `main@3cf924f2`

Production and changelog: +11,986 / -914 across 17 files.
Tests: +17,351 / -2,573 across 20 files.
Total: +29,337 / -3,487 across 37 files, net +25,850.

The simplification review consolidated duplicated Hugging Face route-policy derivation into one shared helper. The final delta review kept descriptor binding and platform-specific guards explicit because collapsing those boundaries would obscure distinct fail-closed invariants; no unused compatibility layer or safely removable branch remains.

Validation at exact head `914625043473ccce96a0b84909251ab846504774`

Mandatory non-slow, non-integration suite: 21,276 passed, 921 skipped, 40 warnings in 17m31s.
Complete affected core/CLI/streaming/hash matrices: 1,591 passed, 84 skipped, 4 warnings in 2m05s.
Full advanced file-handler matrix: 226 passed, 3 skipped in 7.11s.
Full Hugging Face source matrix: 667 passed, 18 skipped in 24.81s.
Full cache-correctness matrix: 167 passed, 3 skipped in 11.68s.
Full streaming-scan matrix: 264 passed, 32 skipped in 23.17s.
Exact review-fix regressions cover per-probe failure isolation, private cleanup quarantine, stable symlink dispatch, Windows/POSIX receipt semantics, directory receipt identity, and denied ancestor substitution. Locally, the final Windows-focused set had 7 passed and 1 Windows-only skip.
Retained-descriptor regressions: regular-file binding, receipt-only lexical binding, local symlink binding, direct Hugging Face blob binding, scan-and-delete generation binding, and pre-open A-B-A rejections pass.
CI-specific regressions from the prior heads are addressed without weakening source binding: platform tests no longer mutate process-wide os.name; Windows receipt comparison binds the resolved target; each lexical entry is checked immediately around handle acquisition; and stable reparse hashes exclude volatile file-index metadata. Native Windows and Python 3.10 exact-head CI remain the terminal proof.
Ruff lint clean; Ruff format check clean across 424 files.
Mypy: 479 source files clean.
git diff --check: clean.
Full patch SHA-256: a1378389c851f2b211c5f1c9dd51d2eba085b22bc0c80ccc529d1d555d976b94.
Exact tree: 88958012f6604cdfad4ec9f25af97d476d3fc6b3.
uv.lock is byte-identical at head and base: blob 9d4a0c584f2a0b3c622dde29113c96fdd0e6c786.
Dependency-manifest diff versus current main is empty.
The reviewed lineage's focused correctness, core/cache/CLI/handler/Hugging Face security, and simplification reviews are clean; fresh review requests below supersede them for this exact head.

Pinned real-artifact corroboration from the reviewed lineage remains unchanged:

openbmb/MiniCPM5-1B@4e9de7a0778dc1c362e983e6858f0e77542cbdca: zero-based indexed sparse-header probe exited 0 with complete coverage.
Hcompany/Holo-3.1-4B@b63fb9bacd968a20b31d0c704b14429c0aad3a84: two-shard sparse-header probe exited 0 with complete coverage.

Remaining merge gate

This worker must not merge. Fresh exact-head Codex correctness/security review and exact-head CI must finish green or skipped, every actionable review thread must be resolved, and base/head/mergeability must be refreshed against the same SHA before coordinator handoff.

The current Dependency Audit baseline is waiting on coordinated aiohttp #1698 and cryptography #1697 upgrades; this PR intentionally retains a byte-identical uv.lock.

Exact-head remediation at `5584839`

Closes retained-input ABA gaps for ordinary local/unrooted streams, Hugging Face repository roots, bookkeeping files, and recreated OpenVINO companions.
Makes cleanup generation-bound with private quarantine, mandatory content proof, replacement preservation, and explicit Hugging Face staging ownership.
Bounds shard discovery to 256 unique family members and also bounds directory membership enumeration, in-flight futures, and exceptional descriptor ownership; exhausting any cap fails closed.
Preserves zero- and one-based indexed authority, scanner-selection boundaries, Windows reparse guards, cache invalidation, and terminal incomplete-coverage semantics.
Incremental patch: production +644/-157; tests +666/-87; no dependency or lockfile changes.
Full mandatory local suite: 20,117 passed, 1,420 skipped, 40 warnings. Full changed-module review suite: 2,517 passed, 110 skipped. Core plus CLI: 1,260 passed, 50 skipped.
Ruff format/check are clean across 424 files; mypy is clean across 479 source files; git diff --check is clean.
Independent final local review found no open P0/P1/P2/P3. Native Windows and exact-head CI remain required.
Queue gate: predecessor #1696 (build(deps): bump starlette from 1.0.1 to 1.3.1) is merged; #1693 (test: lock Nightly asset coverage contracts), #1658 (fix(picklescan): safely filter inert URL metadata), #1695 (fix: harden large tokenizer JSON EOF ownership proof), #1665 (fix: harden streamed TAR and NeMo inspection), #1670 (fix(onnx): narrow bounded quantized-weight analysis), and #1671 (fix(routing): safely route verified legal sidecars) remain open. Current main will be merged additively only after all remaining predecessors merge.
No rebase, amend, force-push, direct-main write, or history rewrite was used.
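The bounded, fail-closed shard discovery described in this remediation can be sketched as a capped directory scan. The 256-member cap comes from the text above; the directory-entry cap, the name pattern, and the function names are hypothetical assumptions for illustration.

```python
import os
import re

MAX_FAMILY_MEMBERS = 256  # cap stated in the remediation notes
MAX_DIR_ENTRIES = 10_000  # hypothetical enumeration cap for this sketch


class CapExhausted(RuntimeError):
    pass


def discover_family(
    directory: str,
    stem: str,
    total: int,
    max_members: int = MAX_FAMILY_MEMBERS,
    max_entries: int = MAX_DIR_ENTRIES,
) -> set[str]:
    """Collect same-stem/same-total shard names, failing closed when a cap is exhausted."""
    pattern = re.compile(rf"^{re.escape(stem)}-\d{{5}}-of-{total:05d}\.safetensors$")
    members: set[str] = set()
    with os.scandir(directory) as it:
        for seen, entry in enumerate(it, start=1):
            if seen > max_entries:
                raise CapExhausted("directory enumeration cap exhausted")
            if pattern.match(entry.name):
                members.add(entry.name)
                if len(members) > max_members:
                    raise CapExhausted("family member cap exhausted")
    return members
```

Raising instead of truncating matters: a truncated listing would look like a smaller, possibly "complete" family.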

Exact-head timeout remediation at `b04f5c0`

Replaces ineffective post-completion future timeouts with per-start monotonic shard deadlines.
Uses daemon shard workers plus a process-wide four-slot semaphore: a permanently wedged scanner cannot block interpreter shutdown or cause per-request worker growth, and exhausted capacity fails closed after the shard timeout.
Preserves context variables, ordinary exception reporting, control-flow exception propagation, and O(worker) scheduling.
Validation: advanced-handler module 237 passed, 3 platform skips; focused timeout/concurrency paths 11 passed; Ruff, mypy, formatting, and diff-check clean.
Fresh independent review found no P0-P3 blockers. The prior timeout review thread is resolved; fresh exact-head GitHub review and CI are pending.
Queue gate remains unchanged: #1693 (test: lock Nightly asset coverage contracts), #1658 (fix(picklescan): safely filter inert URL metadata), #1695 (fix: harden large tokenizer JSON EOF ownership proof), #1665 (fix: harden streamed TAR and NeMo inspection), #1670 (fix(onnx): narrow bounded quantized-weight analysis), and #1671 (fix(routing): safely route verified legal sidecars) must merge before current main is integrated additively.
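The timeout design above (per-start monotonic deadlines, daemon workers, and a process-wide four-slot semaphore) can be sketched with the standard library. `run_shard` and `ShardTimeout` are hypothetical names; this is an illustration of the described behavior, not the PR's code.

```python
import contextvars
import threading
import time

# Process-wide cap of four concurrent shard workers.
_SHARD_SLOTS = threading.BoundedSemaphore(4)


class ShardTimeout(RuntimeError):
    pass


def run_shard(fn, *args, timeout: float):
    # Wait at most one shard timeout for capacity; fail closed if none frees up.
    if not _SHARD_SLOTS.acquire(timeout=timeout):
        raise ShardTimeout("shard scan capacity exhausted")
    # Per-start deadline: measured from when this shard actually starts running.
    deadline = time.monotonic() + timeout
    done = threading.Event()
    outcome: dict = {}
    ctx = contextvars.copy_context()  # carry context variables into the worker

    def worker() -> None:
        try:
            outcome["value"] = ctx.run(fn, *args)
        except BaseException as exc:  # re-raised in the caller, control-flow exceptions included
            outcome["error"] = exc
        finally:
            # A wedged scanner keeps its slot, so at most four workers ever exist.
            _SHARD_SLOTS.release()
            done.set()

    try:
        # Daemon thread: a permanently wedged scanner cannot block interpreter shutdown.
        threading.Thread(target=worker, daemon=True).start()
    except BaseException:
        _SHARD_SLOTS.release()
        raise
    if not done.wait(max(0.0, deadline - time.monotonic())):
        raise ShardTimeout("shard exceeded its deadline")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

Unlike a timeout applied after a future completes, the deadline here is enforced while the work is still running.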

mldangelo-oai · 2026-06-11T05:17:00Z

@codex review

github-actions · 2026-06-11T05:19:01Z

Workflow run and artifacts

Performance Benchmarks

Compared 13 shared benchmarks with a regression threshold of 15%.
Status: 4 regressions, 0 improved, 9 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 4.130s -> 6.690s (+62.0%).

Top regressions:

tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan +816.5% (151.12ms -> 1.385s, warm-cache-rescan, release-candidate, size=547.3 KiB, files=32)
tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository +150.9% (561.45ms -> 1.409s, mixed-model-repository, release-candidate, size=547.3 KiB, files=32)
tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake +81.6% (133.73ms -> 242.80ms, suspicious-pickle-intake, suspicious-intake, size=183.8 KiB, files=4)

Workload	Benchmark	Target	Size	Files	Baseline	Current	Change	Status
`warm-cache-rescan`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan`	`release-candidate`	547.3 KiB	32	151.12ms	1.385s	+816.5%	regression
`mixed-model-repository`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository`	`release-candidate`	547.3 KiB	32	561.45ms	1.409s	+150.9%	regression
`suspicious-pickle-intake`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake`	`suspicious-intake`	183.8 KiB	4	133.73ms	242.80ms	+81.6%	regression
`duplicate-heavy-registry`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot`	`registry-snapshot`	915.2 KiB	13	485.62ms	826.09ms	+70.1%	regression
`single-checkpoint-preflight`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load`	`single_checkpoint.pkl`	183.0 KiB	1	88.42ms	99.91ms	+13.0%	stable
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64]`	`nested_base64`	98 B	1	276.8us	289.3us	+4.5%	stable
`padded-multi-stream-upload`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload`	`multi_stream_padded`	4.1 KiB	1	344.5us	333.0us	-3.4%	stable
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw]`	`nested_raw`	78 B	1	274.0us	271.0us	-1.1%	stable
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex]`	`nested_hex`	130 B	1	299.0us	295.8us	-1.1%	stable
`rejected-basic-auth-candidates`	`tests/benchmarks/test_scan_benchmarks.py::test_rejected_basic_auth_candidates_scan_linearly`	`-`	371.1 KiB	1	2.484s	2.503s	+0.7%	stable
`direct-malicious-upload`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload`	`malicious_reduce`	52 B	1	222.9us	223.7us	+0.3%	stable
`chunked-upload-stream`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream`	`chunked_stream`	278.2 KiB	1	113.89ms	113.52ms	-0.3%	stable
`clean-training-checkpoint`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint`	`safe_large`	278.2 KiB	1	109.84ms	109.98ms	+0.1%	stable
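The Change and Status columns follow from a relative change against the baseline and the 15% threshold. A sketch, assuming both values are in the same unit and that improvements use the same threshold mirrored below zero (the bot's exact boundary handling is not stated):

```python
THRESHOLD_PCT = 15.0


def pct_change(baseline: float, current: float) -> float:
    return (current - baseline) / baseline * 100.0


def classify(baseline: float, current: float, threshold: float = THRESHOLD_PCT) -> str:
    change = pct_change(baseline, current)
    if change > threshold:
        return "regression"
    if change < -threshold:
        return "improved"  # assumption: symmetric threshold for improvements
    return "stable"


# Values in milliseconds: warm-cache rescan 151.12ms -> 1.385s, single checkpoint 88.42ms -> 99.91ms.
print(round(pct_change(151.12, 1385.0), 1), classify(151.12, 1385.0))
print(round(pct_change(88.42, 99.91), 1), classify(88.42, 99.91))
```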

chatgpt-codex-connector · 2026-06-11T05:20:04Z

Codex Review: Didn't find any major issues. Chef's kiss.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T05:35:26Z

@codex review

chatgpt-codex-connector · 2026-06-11T05:39:06Z

Codex Review: Didn't find any major issues. Bravo.

mldangelo-oai · 2026-06-11T06:05:27Z

Independent Review: promptfoo/modelaudit PR #1668

Disposition

BLOCK / DO NOT MERGE at head 058eeb66d5f93e4fc25a21507a3a84f4d4a75ce8.

The branch is conflict-free and exact-head CI is green, but three deterministic medium-severity defects remain in the new SafeTensors index/base handling. The exact pinned MiniCPM5 artifact also still exits 2 on the current macOS runner because the now-complete one-shard family enters the pre-existing descriptor-pinning path that macOS cannot support.

Exact review target

Field	Value
PR	`promptfoo/modelaudit#1668`
Title	`fix(safetensors): validate index-driven zero-based shards`
State	Open, non-draft
Head	`058eeb66d5f93e4fc25a21507a3a84f4d4a75ce8`
Base	`8d6c4864fe2ea833ceaef1b9803d225afb1e8d69` (`main`)
Merge base	`8d6c4864fe2ea833ceaef1b9803d225afb1e8d69`
Divergence	Head ahead 1, behind 0
Mergeability	`MERGEABLE`; `mergeStateStatus=BLOCKED` because review is required
Review decision	`REVIEW_REQUIRED`
Reviews	0
Review threads	0 total, 0 unresolved

origin/main did not need to be merged for analysis because the PR was already based on its exact current tip.

Findings

Severity	Finding	Confidence
Medium	Remote SafeTensors indexes can widen an explicit SafeTensors-only selection to foreign shard formats	98%
Medium	Local indexes with nested shard targets are ignored, allowing a missing indexed shard to scan cleanly	97%
Medium	Streaming accepts a mixed-base/unreferenced total-1 family and regresses from exit 2 to exit 0	98%
Low	The user-visible fix has no required `[Unreleased]` changelog entry	95%

1. Medium: Remote SafeTensors indexes can widen an explicit SafeTensors-only selection to foreign shard formats

Affected code: modelaudit/utils/sources/huggingface.py:887-908,924-927.

The remote validator calls the generic ShardedModelDetector.match_shard_filename() for every weight_map target. That helper recognizes PyTorch .bin, TensorFlow, Keras, and custom shard patterns as well as SafeTensors. The validator then appends every accepted target to model_files without reapplying the caller's scanner/extension policy.

Exact-head reproduction:

Repository listing: model.safetensors.index.json, model-00000-of-00001.safetensors, and pytorch_model-00001-of-00001.bin.
The SafeTensors index mapped its tensor to the .bin file.
_select_streamable_hf_files(..., scannable_extensions={'.safetensors'}, scannable_scanner_ids={'safetensors'}) returned all three files, including pytorch_model-00001-of-00001.bin.
The lower-level validator likewise changed the input selection [index, safetensors] into [index, safetensors, foreign .bin].

Impact: a malformed index in an untrusted Hub repository can force a nominally SafeTensors-only scan to download an arbitrarily large foreign shard. max_size defaults to unlimited, and the owning scanner remains suppressed by the explicit scanner selection, so this is both a resource-bound regression and an admitted-but-unanalysed artifact path.

Required fix: validate targets with SAFETENSORS_SHARD_PATTERN specifically, not the generic shard matcher, before extending model_files. Add snapshot and streaming tests proving .bin, .pt, .h5, and other foreign target formats fail closed before download under both default and explicit SafeTensors selection.

Validator result: VALIDATED, confidence 98%, introduced by head 058eeb66d5f93e4fc25a21507a3a84f4d4a75ce8.
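The required fix can be sketched as a SafeTensors-specific check applied to every `weight_map` target before the selection is extended. `SAFETENSORS_SHARD` below is a hypothetical stand-in for `SAFETENSORS_SHARD_PATTERN`, whose actual regex is not shown in this thread, and `validated_safetensors_targets` is a hypothetical helper.

```python
import re
from pathlib import PurePosixPath

# Hypothetical stand-in for SAFETENSORS_SHARD_PATTERN; the real pattern may differ.
SAFETENSORS_SHARD = re.compile(r"^.+-\d{5}-of-\d{5}\.safetensors$")


class ForeignTarget(ValueError):
    pass


def validated_safetensors_targets(weight_map: dict[str, str]) -> list[str]:
    """Return the declared targets, or fail closed on any non-SafeTensors shard."""
    targets = sorted(set(weight_map.values()))
    for target in targets:
        # A generic shard matcher would also accept .bin, .h5, and other formats
        # and silently widen an explicit SafeTensors-only selection.
        if not SAFETENSORS_SHARD.fullmatch(PurePosixPath(target).name):
            raise ForeignTarget("index target is not a SafeTensors shard")
    return targets
```

Only after this returns would the caller add targets to `model_files`, so a foreign `.bin` target is rejected before any download.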

2. Medium: Local indexes with nested shard targets are ignored, allowing a missing indexed shard to scan cleanly

Affected code: modelaudit/utils/file/handlers.py:560-583,862-901.

_safe_index_target_path() explicitly permits relative subdirectory targets, but _load_safetensors_index_inventory() only looks for model.safetensors.index.json adjacent to the shard being scanned. When the index is in a parent directory and points into a child shard directory, no authoritative inventory is loaded; the detector silently falls back to filename-only base inference.

Exact-head reproduction:

root/model.safetensors.index.json declared:
- shards/model-00000-of-00002.safetensors
- shards/model-00001-of-00002.safetensors
Disk contents deliberately omitted indexed shard 0 and instead contained:
- shards/model-00001-of-00002.safetensors
- unreferenced shards/model-00002-of-00002.safetensors
ShardedModelDetector.detect_shards() reported shard_index_base='one', total_shards=2, no safetensors_index_path, no missing shards, and no unexpected shards.
A full directory scan, with only the known macOS pin transport replaced by an identity-preserving test context, returned success=true, has_errors=false, and exit 0 with no incomplete-coverage reason.

Impact: an indexed local artifact can omit a declared shard and substitute an unreferenced same-family shard while ModelAudit reports complete coverage. This contradicts the PR's claim that validated index inventories are authoritative and that missing/unreferenced evidence fails closed.

Required fix: discover and validate governing index files at directory-discovery scope, map each safe relative target to the indexed family, and carry that inventory into grouped shard scanning. Add direct-directory tests for nested zero- and one-based targets, missing targets, unreferenced substitutes, traversal, symlink aliases, and index mutation.

Validator result: VALIDATED, confidence 97%, changed-surface omission at head 058eeb66d5f93e4fc25a21507a3a84f4d4a75ce8.
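The first part of the required fix, mapping each safe relative target of a governing index to its directory and then reconciling against what is on disk, can be sketched as below. The helper names are hypothetical, and this sketch deliberately does not cover symlink aliases or index mutation, which the finding also asks to test.

```python
from pathlib import PurePosixPath


class UnsafeTarget(ValueError):
    pass


def declared_by_directory(index_relpath: str, weight_map: dict[str, str]) -> dict[str, set[str]]:
    """Group index targets by their scan-root-relative directory."""
    base = PurePosixPath(index_relpath).parent
    grouped: dict[str, set[str]] = {}
    for target in set(weight_map.values()):
        rel = PurePosixPath(target)
        if rel.is_absolute() or not rel.parts or ".." in rel.parts:
            raise UnsafeTarget("index target escapes the index base directory")
        full = base / rel
        grouped.setdefault(str(full.parent), set()).add(full.name)
    return grouped


def reconcile(declared: set[str], present_same_family: set[str]) -> tuple[set[str], set[str]]:
    """Return (missing declared shards, unreferenced same-family shards)."""
    return declared - present_same_family, present_same_family - declared


# The reproduction above: root index, targets under shards/, shard 0 replaced by shard 2.
declared = declared_by_directory(
    "model.safetensors.index.json",
    {"a": "shards/model-00000-of-00002.safetensors", "b": "shards/model-00001-of-00002.safetensors"},
)["shards"]
present = {"model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors"}
missing, unreferenced = reconcile(declared, present)  # both non-empty, so coverage is incomplete
```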

3. Medium: Streaming accepts a mixed-base/unreferenced total-1 family and regresses from exit 2 to exit 0

Affected code: modelaudit/utils/file/handlers.py:529-533; modelaudit/core.py:748-757,882-891; modelaudit/utils/sources/huggingface.py:877-927.

Base inference is performed independently for each streamed file. For expected_total=1, shard 0 is individually classified as a complete zero-based family and shard 1 as a complete one-based family. Neither receives a missing-shard placeholder. Later reconciliation notices that the combined family is not complete, but _reconcile_cross_directory_shard_coverage() only removes existing placeholders; when complete_sources is empty it returns without adding an ambiguous/unexpected-family failure.

The remote index validator compounds this by checking that referenced targets exist but not rejecting unreferenced same-pattern/same-total siblings.

Exact-head reproduction:

Streamed, with delete-after-scan semantics, model-00000-of-00001.safetensors followed by model-00001-of-00001.safetensors under one trusted family scope.
A controlled platform-only pin shim preserved the original path and changed no coverage logic.
Head result: success=true, has_errors=false, exit 0, two files scanned, and no incomplete reason.
origin/main@8d6c4864fe2ea833ceaef1b9803d225afb1e8d69 on the same harness returned success=false, exit 2, reason missing_model_shards.
A valid index pointing only to shard 0 plus unreferenced shard 1 was accepted by _validate_remote_safetensors_indexes(); both remained selected.

Impact: the PR introduces a fail-open for mixed-base streamed inventories and does not enforce its stated rejection of unreferenced same-family shards. The same ambiguity applies to larger no-index families where {0..N-1} can also represent a one-based family missing shard N plus an injected shard 0.

Required fix: reject unreferenced same-pattern/same-total siblings while validating a remote index, carry authoritative base/inventory metadata into streaming reconciliation, and emit an explicit incomplete result when combined validated targets have invalid cardinality or mixed base instead of merely declining to remove prior errors. Add head-vs-base regression tests for total 1 and larger ambiguous families.

Validator result: VALIDATED, confidence 98%, introduced by head 058eeb66d5f93e4fc25a21507a3a84f4d4a75ce8.

4. Low: The user-visible fix has no required `[Unreleased]` changelog entry

Affected policy: AGENTS.md:80; changed-file list for PR #1668.

The repository rule says, "Keep CHANGELOG entries in [Unreleased] when adding user-visible changes." This PR changes exit status and shard coverage for real models but modifies no changelog.

Required fix: add a concise [Unreleased] fix entry covering index-driven zero-based SafeTensors shard completeness.

Validator result: VALIDATED, confidence 95%, exact root policy applies to all seven changed files.

Merge-blocking acceptance gap: exact pinned MiniCPM5 still fails on macOS

This is an operational acceptance failure rather than a newly introduced regression; the underlying descriptor-pinning limitation predates the PR. It is still material because the PR's central post-fix claim does not hold on the current macOS environment.

Exact immutable artifact evidence downloaded from openbmb/MiniCPM5-1B@4e9de7a0778dc1c362e983e6858f0e77542cbdca:

Evidence	Observed
Index bytes	`18004`
Index SHA-256	`162add042e75abc3d571c4a8679523fa4f1ffc55d1fea25fc6658a19d6e957ee`
Real shard range bytes	`25248`
Header length	`25240`
Header payload SHA-256	`bca5391cbdf4bc5ae84d62b82d46ec3a5030a77d171faed609b723a1f4a2efb0`
Prefix + header SHA-256	`ecbdf640e2ce4cf283b3d6d7c758d7cf71cfbbd30c3aea04601f36f8999c5675`
Sparse logical shard size	`2161290912` bytes
Sparse allocated blocks	56 filesystem blocks
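These values match the SafeTensors layout: an 8-byte little-endian unsigned length prefix followed by that many bytes of JSON header, so the probed range is 8 + 25240 = 25248 bytes. The rest of the 2161290912-byte shard is never read, which is why a sparse file suffices. A minimal reader sketch follows; the function name and the 100 MiB header cap are assumptions for illustration.

```python
import json
import struct


def read_safetensors_header(fp, max_header: int = 100 * 1024 * 1024) -> tuple[int, dict]:
    """Return (bytes consumed, parsed header) from the start of a SafeTensors file."""
    prefix = fp.read(8)
    if len(prefix) != 8:
        raise ValueError("truncated length prefix")
    (header_len,) = struct.unpack("<Q", prefix)  # little-endian u64 header length
    if header_len > max_header:  # assumption: hypothetical bound for this sketch
        raise ValueError("header exceeds bounded limit")
    raw = fp.read(header_len)
    if len(raw) != header_len:
        raise ValueError("truncated header")
    return 8 + header_len, json.loads(raw)
```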

Exact-head CLI result on that index plus real header in the correctly sized sparse shard:

Exit 2.
success=false, has_errors=true, files_scanned=2.
Failure: Shard Scan Pinning.
Reason: shard_pin_unavailable / descriptor-bound shard pinning unavailable.

Relevant paths: modelaudit/utils/file/handlers.py:192-222,862-901,1621-1642.

The focused test slice showed the same limitation: 35 passed, 11 failed, and 4 skipped on macOS; every failure stopped at shard pinning before the intended scanner assertion. The repo already explicitly skips several descriptor-path tests on Darwin, but the new ordinary zero-/one-based shard tests are not platform-gated and therefore fail locally. The PR either needs a secure single-shard/macOS path or must explicitly scope and test the limitation rather than claiming universal exit 0 for the motivating artifact.

Validation evidence

Live GitHub state

Exact head rechecked immediately before report generation: 058eeb66d5f93e4fc25a21507a3a84f4d4a75ce8.
Exact base: 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69.
Checks: 23 passing, 6 skipped, 0 pending/failing.
CI umbrella jobs, Python 3.10/3.13, Windows, lint/format, type check, build/package, dependency audit, CodeQL, Docker, docs, title, and benchmarks all report success.
Benchmark bot comment reports one threshold regression: duplicate registry snapshot +16.0%; aggregate shared-benchmark median was +7.8%. The regressed workload contains no SafeTensors files, so it was retained as non-blocking noise rather than promoted to a code finding.
Reviews: 0. Review threads: 0 total and 0 unresolved.

Static and focused runtime checks

git diff --check origin/main...HEAD: passed.
Ruff on all seven changed files: passed.
Mypy on modelaudit/core.py, modelaudit/utils/file/handlers.py, and modelaudit/utils/sources/huggingface.py: passed.
Adding modelaudit/cli.py to the local mypy invocation exposed three pre-existing Darwin stub errors for os.listxattr, os.setxattr, and os.getxattr; exact-head CI's Type Check is green.
Platform-independent focused slice: 5 passed, covering snapshot/streaming zero-based selection, missing remote target rejection, and huge declared shard-total bounding.
Six zero-/one-based streaming and malicious-finding assertions passed when only the unavailable Darwin descriptor transport was replaced; malicious findings remained exit 1.
A 10 MiB + 1 sparse local index failed closed as safetensors index exceeds bounded parse limit, with unvalidated_shard_count=1.
The exact pinned MiniCPM5 index/header hashes matched every immutable value stated in the PR body.

Focused test command with platform result

The requested focused PR slice completed as:

35 passed, 11 failed, 4 skipped

The 11 failures were all the known macOS shard_pin_unavailable path, including the newly parameterized zero-/one-based tests. They are not evidence that Linux CI is stale: exact-head GitHub CI is green. They are evidence that the motivating behavior is not portable to the current macOS runtime.

Review-lane conclusions

Gatekeeper: review required; open, non-trivial, runtime/security change.
Policy paths: only root AGENTS.md applies; one changelog violation validated.
Diff-only and contextual bug review: three runtime defects validated above; no additional compile/import defect.
Review-thread reconciliation: no threads or reviews exist; no stale feedback to discount.
History/intent: one commit, no linked issue; the change targets the filename-only SafeTensors coverage gap using the pinned MiniCPM5 model.
Code-comment verification: no contradictory changed inline comment survived validation. The PR body security claims are broader than the implementation in findings 2 and 3.
Behavior impact: explicit scanner selection can be widened to foreign downloads; local and streamed coverage can report clean for inconsistent inventories.
Repo-surface/cross-repo: no rename, generated contract, or second-repository change.
Branch/CI: current with main; no analysis merge needed; exact-head CI green.

Required disposition before rereview

Restrict remote index targets to the exact SafeTensors shard pattern before selection expansion.
Carry governing nested index inventories into local grouped shard scans.
Reject unreferenced same-family remote siblings and emit explicit incomplete outcomes for mixed/invalid streamed families.
Add macOS-aware acceptance coverage for the pinned one-shard MiniCPM5 case or explicitly document and enforce the platform limitation.
Add the required [Unreleased] changelog entry.
Rerun exact real-artifact QA, focused adversarial tests, full fast CI, and a new macOS shard lane at the final head.

mldangelo-oai · 2026-06-11T06:08:31Z

New pinned all-format QA on main 8d6c4864:

Hcompany/Holo-3.1-4B@b63fb9bacd968a20b31d0c704b14429c0aad3a84 completed all 15 files / 10.37 GB, but the two-shard SafeTensors family was still marked incomplete twice: shard 1 reported shard 2 missing, and shard 2 reported shard 1 missing. The repository also contains its model.safetensors.index.json.

Please add this exact complete two-shard/index family to current-head QA. Repository/index reconciliation must happen before per-file missing-family outcomes are aggregated; a complete family must not exit 2.
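The requested ordering can be sketched as one repository-level decision made before any per-file outcome is aggregated. This is an illustration under assumptions, not the PR's code: `reconcile_family` and its outcome strings are hypothetical. The sketch accepts exactly one zero-based or one-based family and rejects mixed bases and unreferenced same-family siblings, as finding 3 and the disposition list describe.

```python
import re

# Hypothetical shard-name pattern (five-digit index and total fields).
SHARD_RE = re.compile(r"(?P<stem>.+)-(?P<index>\d{5})-of-(?P<total>\d{5})\.safetensors")


def family_key(name: str):
    """(stem, total) for a shard filename, or None for any other file."""
    match = SHARD_RE.fullmatch(name)
    return None if match is None else (match["stem"], match["total"])


def reconcile_family(targets: set[str], present: set[str]) -> tuple[str, set[str]]:
    """Return one outcome for an index-declared family instead of one per shard."""
    matches = [SHARD_RE.fullmatch(name) for name in targets]
    if not matches or any(m is None for m in matches):
        return "invalid-index", set()
    keys = {(m["stem"], m["total"]) for m in matches}
    if len(keys) != 1:
        return "invalid-index", set()
    (key,) = keys
    total = int(key[1])
    indices = {int(m["index"]) for m in matches}
    # Accept a zero-based or a one-based family, never a mixture of the two.
    if indices not in (set(range(0, total)), set(range(1, total + 1))):
        return "invalid-index", set()
    # Same-family shards that the index does not reference also fail closed.
    unreferenced = {name for name in present if family_key(name) == key} - targets
    if unreferenced:
        return "unreferenced-sibling", unreferenced
    missing = targets - present
    return ("incomplete", missing) if missing else ("complete", set())
```

For the Holo repository above, both shards and the index are present, so the sketch returns `("complete", set())` once, rather than two per-shard reports that each name the other shard as missing.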

Audit: modelaudit-hf-scan-swarm/outputs/hf-audit-r259-20260611.md.

mldangelo-oai · 2026-06-11T07:46:19Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9891fd530d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T08:35:48Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c0759322f0

mldangelo-oai · 2026-06-11T09:29:37Z

@codex review

chatgpt-codex-connector · 2026-06-11T09:32:53Z

Codex Review: Didn't find any major issues. More of your lovely PRs please.

mldangelo-oai · 2026-06-13T08:53:26Z

Fleet sequencing note: keep the goal active and continue the full review, simplification, conflict resolution, validation, title/body cleanup, and exact-head CI work. Please hold only the final merge until Nightly repair PR #1679 reports MERGED. Immediately afterward, fetch the new origin/main, merge it additively into this published branch (no rebase or force-push), re-run exact-new-head technical gates, and then use the normal squash/admin-review-only merge path. This coordination hold should not pause any other work.

mldangelo-oai · 2026-06-13T09:06:27Z

Review closeout at current head 498ff98bd83d38f2db152d2e2d01f6fe1bd8f3b8:

The earlier foreign-target, nested-index, mixed-base/unreferenced-family, changelog, and one-shard portability blockers are addressed. The takeover also fixed descriptor/source/index mutation races, cache-enabled single-family reconciliation, SafeTensors/pickle overlap peer coverage, explicit Hugging Face scanner selection, and bounded remote index validation.
Pinned openbmb/MiniCPM5-1B@4e9de7a0778dc1c362e983e6858f0e77542cbdca sparse-header QA now exits 0: success=true, 2 files, 11/11 checks passed, no coverage reason.
The requested Hcompany/Holo-3.1-4B@b63fb9bacd968a20b31d0c704b14429c0aad3a84 complete two-shard/index QA now exits 0: success=true, 3 files, 10,350,143,826 bytes, 23/23 checks passed, no coverage error.
Affected tests: 679 passed, 35 skipped. Full fast lane: 19,331 passed, 1,344 skipped, 39 warnings. Repository-wide Ruff and mypy are clean.

The branch contains additive merges from main; no history was rebased or force-pushed. Final merge remains intentionally held until #1679 reports merged, per the fleet sequencing note. Current-head CI is now the next gate.

mldangelo-oai · 2026-06-13T09:07:37Z

Fleet release: Nightly repair PR #1679 is MERGED as b83dff38026574e255756613d46f65f0010a11b6 (2026-06-13 09:07:12Z). Fetch origin/main now and merge that exact current main additively into this published branch; do not rebase or force-push. Reconcile any overlap according to current behavior, re-run the final review/simplification/thread/metadata gates, push, and require exact-new-head CI before squash merge. If main advances again before your merge, repeat the normal merge-from-main and exact-head gate. The prior hold is released; continue the goal through verified merge and post-merge main health.

mldangelo-oai · 2026-06-13T09:09:42Z

Fleet sequencing complete: #1679 merged as b83dff38026574e255756613d46f65f0010a11b6, and that exact main tip is now included through additive merge commit bcb6b826c2710e6eb49a0b1e621baacdac1fb884.

Post-sync local gates are clean: 703 passed / 35 optional or real-network skips across the affected SafeTensors, Hugging Face, streaming, handler, and #1679 six.moves surfaces; repository-wide Ruff format/check and full mypy also pass. The final gate is exact-head CI on bcb6b826.

mldangelo-oai · 2026-06-13T09:16:30Z

Serial landing update: #1673 is the next release gate because it is a one-test fail-closed regression with a fully green pre-sync run. Continue your current-main sync, review, simplification, and CI work, but hold the final merge until #1673 reports MERGED. This avoids making #1673 restart against an unrelated large feature branch. Keep the goal active; once #1673 lands, merge the newest main again if needed and complete your exact-head landing cycle.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1c7333a1c3

Share bounded SafeTensors discovery state across explicit families while isolating speculative failures. Bind direct Hugging Face downloads and receipt-only POSIX paths to retained generations, and make streaming cleanup reject generation swaps.
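One way to picture binding a retained input to a generation: hold a single descriptor for the whole scan, read only through it, and at the terminal boundary check that the path still names the same object. This is a POSIX-only sketch with a hypothetical `RetainedInput` class; the PR's Windows handle and receipt logic is not modeled.

```python
import os
import stat


class RetainedInput:
    """Keep one open descriptor so the scan and the terminal check see one file object.

    POSIX-only sketch: os.O_NOFOLLOW and os.pread are not available on Windows.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        # O_NOFOLLOW makes open fail if the final component is a symlink.
        self.fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
        st = os.fstat(self.fd)
        if not stat.S_ISREG(st.st_mode):
            os.close(self.fd)
            raise ValueError("retained input must be a regular file")
        self.identity = (st.st_dev, st.st_ino)

    def read_at(self, size: int, offset: int = 0) -> bytes:
        # Reads go through the retained descriptor, never back through the path.
        return os.pread(self.fd, size, offset)

    def still_bound(self) -> bool:
        # Terminal boundary: does the path still name the object that was scanned?
        try:
            st = os.lstat(self.path)
        except FileNotFoundError:
            return False
        return (st.st_dev, st.st_ino) == self.identity

    def close(self) -> None:
        os.close(self.fd)
```

If `still_bound()` is false, the scan results describe a generation that no longer sits at the path, so a fail-closed caller would refuse to certify coverage rather than report clean.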

mldangelo-oai · 2026-06-16T07:13:51Z

@codex review

Please review exact head b868600785ca76ea8049a7c83e9edc5624871843 for correctness, POSIX and Windows behavior, regressions, simplification opportunities, and merge blockers. Focus on cross-drive ancestor authority, aggregate explicit-family inspection budgets with isolated speculative failures, direct Hugging Face acquisition binding, receipt-only POSIX lexical retention, generation-bound scan-and-delete cleanup, and the reviewed SafeTensors behavior. This supersedes all earlier requests; treat every other SHA as stale.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b868600785

Quarantine streamed cleanup candidates before generation verification, isolate speculative SafeTensors index failures per probe while sharing aggregate budgets, and keep Windows receipt matching bound to the guarded resolved target.
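A hedged sketch of per-probe failure isolation with a shared aggregate budget: a malformed speculative index fails only its own probe, every probe charges one scan-wide byte budget, and exhausting that budget stops further probing instead of letting each probe spend a fresh limit. `SharedBudget`, `BudgetExceeded`, and `run_probes` are hypothetical names.

```python
from typing import Callable


class BudgetExceeded(Exception):
    """Raised when the scan-wide budget cannot cover a probe's next read."""


class SharedBudget:
    """One byte budget charged by every speculative index probe in a scan."""

    def __init__(self, limit: int) -> None:
        self.remaining = limit

    def charge(self, nbytes: int) -> None:
        if nbytes > self.remaining:
            self.remaining = 0  # fail closed: no later probe may spend the leftover
            raise BudgetExceeded()
        self.remaining -= nbytes


def run_probes(
    probes: dict[str, Callable[[SharedBudget], None]], budget: SharedBudget
) -> dict[str, str]:
    results: dict[str, str] = {}
    exhausted = False
    for name, probe in probes.items():
        if exhausted:
            results[name] = "not-inspected"
            continue
        try:
            probe(budget)
            results[name] = "ok"
        except BudgetExceeded:
            exhausted = True
            results[name] = "budget-exhausted"
        except (ValueError, OSError):
            # A malformed (json.JSONDecodeError is a ValueError) or unreadable
            # speculative index fails only its own probe.
            results[name] = "failed"
    return results
```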

mldangelo-oai · 2026-06-16T08:25:35Z

@codex review

Please review exact head 2ee9a19a2110ff03f7092639e56b2557f483bf65 for correctness, POSIX and Windows behavior, regressions, simplification opportunities, LOC growth, and merge blockers. Focus on private generation-verified scan-and-delete quarantine, per-probe failure isolation with shared aggregate index budgets, Windows resolved-target receipts plus retained lexical handles, and the full surviving SafeTensors/Hugging Face retained-input diff. This supersedes all earlier requests; treat every other SHA as stale.

chatgpt-codex-connector · 2026-06-16T08:32:17Z

Codex Review: Didn't find any major issues. Chef's kiss.

Reviewed commit: 2ee9a19a21

mldangelo-oai · 2026-06-16T08:34:45Z

@codex review for security vulnerabilities

Please perform a security-focused review of exact head 2ee9a19a2110ff03f7092639e56b2557f483bf65. Review the full surviving diff, especially private tombstone cleanup ownership, replacement preservation, retained source and companion TOCTOU/A-B-A boundaries, per-probe failure isolation with shared aggregate resource limits, Windows resolved-target and lexical-handle binding, SafeTensors index authority, Hugging Face routing/acquisition, cache integrity, fail-open behavior, sensitive logging, resource bounds, simplification, and LOC growth. This supersedes all earlier security requests; treat every other SHA as stale.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ee9a19a21

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ee9a19a21

Validate Windows guard entries immediately around handle acquisition, bind reparse lexical identity to stable structure while resolved receipts and handles own object identity, and treat access-denied ancestor substitution as a successful guard outcome.

mldangelo-oai · 2026-06-16T09:20:58Z

@codex review

Please review exact head a4051a1b1a63457b416378144c42a2d83a31f9f5 for correctness, native Windows and POSIX behavior, regressions, simplification opportunities, LOC growth, and merge blockers. Focus on per-entry Windows handle-acquisition validation, stable reparse receipt fields versus resolved-target identity, denied ancestor substitution, private generation-verified cleanup, per-probe index failure isolation, and the full surviving SafeTensors/Hugging Face diff. This supersedes all earlier requests; treat every other SHA as stale.

chatgpt-codex-connector · 2026-06-16T09:27:59Z

Codex Review: Didn't find any major issues. 🚀

Reviewed commit: a4051a1b1a

Restore quarantined index and streamed-source generations without overwriting recreated paths. Keep caller-supplied Hugging Face cache directories outside invocation-owned cleanup, and make the deadline regression deterministic.
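No-overwrite restoration can be approximated with a hard link, which fails with `FileExistsError` instead of replacing a recreated path (unlike `os.rename` on POSIX, which silently replaces an existing destination file). The sketch assumes the tombstone and the original path share a filesystem, as a same-filesystem quarantine implies; `restore_without_overwrite` is a hypothetical helper.

```python
import os


def restore_without_overwrite(tombstone: str, original_path: str) -> bool:
    """Return a quarantined file to original_path unless something now exists there."""
    try:
        # os.link refuses to replace an existing destination.
        os.link(tombstone, original_path)
    except FileExistsError:
        # A newer generation was recreated at the path: keep both and report failure.
        return False
    os.unlink(tombstone)
    return True
```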

mldangelo-oai · 2026-06-16T09:58:44Z

@codex review for security vulnerabilities

Please perform a security-focused review of exact head 6a437d6a05e17182e13f54ac23c6656eac1df77e. Review the full surviving diff, especially unverified cleanup replacement preservation, atomic no-overwrite restoration, invocation-owned versus caller-owned Hugging Face cleanup, retained source and companion TOCTOU/A-B-A boundaries, shared aggregate resource limits, Windows resolved-target and lexical-handle binding, SafeTensors index authority, routing/acquisition, cache integrity, fail-open behavior, sensitive logging, resource bounds, simplification, and LOC growth. This supersedes all earlier security requests; treat every other SHA as stale.

mldangelo-oai · 2026-06-16T09:58:45Z

@codex review

Please review exact head 6a437d6a05e17182e13f54ac23c6656eac1df77e for correctness, native Windows and POSIX behavior, regressions, simplification opportunities, LOC growth, and merge blockers. Focus on no-overwrite restoration of quarantined index and source generations, caller-supplied Hugging Face cache ownership, deterministic deadline forwarding, the previously reviewed Windows receipt fixes, and the full surviving SafeTensors/Hugging Face diff. This supersedes all earlier requests; treat every other SHA as stale.

chatgpt-codex-connector · 2026-06-16T10:12:51Z

Codex Review: Didn't find any major issues. Swish!

Reviewed commit: 6a437d6a05

chatgpt-codex-connector

💡 Codex Review

https://github.com/promptfoo/modelaudit/blob/6a437d6a05e17182e13f54ac23c6656eac1df77e/core.py#L10722
Include ctime when deleting index candidates

When a SafeTensors index candidate is rewritten in place after classification but before cleanup, dev/ino/mode/size/mtime can still match because mtime can be restored, while ctime changes. This post-rename proof compares only the truncated stat[:5] identity, so cleanup can unlink that newer index generation instead of preserving it for terminal validation; compare the full captured identity, including st_ctime_ns, before deleting the tombstone.

https://github.com/promptfoo/modelaudit/blob/6a437d6a05e17182e13f54ac23c6656eac1df77e/core.py#L11073-L11074
Include ctime when deleting streamed sources

When a streamed source is rewritten in place between the pre-cleanup identity check and the quarantine rename, the inode can keep the same dev/ino/size/mode and restored mtime while ctime changes. Comparing only expected_cleanup_identity.stat[:5] lets cleanup treat that replacement as the scanned generation and unlink it, instead of restoring or preserving it as a boundary failure; compare the full captured stat tuple before deleting the tombstone.
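Both findings come down to which `os.stat_result` fields make up the identity. A sketch, assuming an identity tuple whose first five fields are dev, ino, mode, size, and mtime, as the review describes; `FileIdentity`, `capture_identity`, and `is_same_generation` are hypothetical. It is POSIX-oriented: on Windows, `st_ctime` has historically reported creation time. Many Linux filesystems also update ctime on rename, so where the identity is captured relative to the quarantine rename matters; the sketch does not settle that question.

```python
import os
from typing import NamedTuple


class FileIdentity(NamedTuple):
    dev: int
    ino: int
    mode: int
    size: int
    mtime_ns: int
    ctime_ns: int  # the field a truncated [:5] identity drops


def capture_identity(path: str) -> FileIdentity:
    st = os.lstat(path)
    return FileIdentity(st.st_dev, st.st_ino, st.st_mode, st.st_size, st.st_mtime_ns, st.st_ctime_ns)


def is_same_generation(expected: FileIdentity, path: str) -> bool:
    # An in-place rewrite can restore mtime with os.utime, but on POSIX the kernel
    # still advances ctime, so every captured field, including ctime, must match.
    return capture_identity(path) == expected
```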

modelaudit/modelaudit/utils/sources/huggingface.py

Lines 6502 to 6503 in 6a437d6

    if not allow_content_probes or file_size <= _HF_CONTENT_SNIFF_BYTES:
        minimum_safetensors_read_bytes += file_size

Stop double-counting small SafeTensors probes

For streaming Hugging Face scans with overlap scanners active and a tight --max-download-bytes, a declared .safetensors whose size is at most _HF_CONTENT_SNIFF_BYTES is already charged for the full prefix read by min(file_size, _HF_CONTENT_SNIFF_BYTES). This branch adds the whole file size a second time before the actual header scan, so minimum_selected_bytes can exceed the limit even when the real prefix plus bounded header reads would fit; only reserve the additional header bytes that will actually be fetched.
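The suggested fix can be read as "charge the larger of the prefix read and the header read, not their sum". A sketch under stated assumptions: `minimum_safetensors_bytes` and `header_reserve` are hypothetical (a SafeTensors file starts with an 8-byte little-endian header length followed by the JSON header, so larger files need only a bounded read), and the header scan is assumed to reuse bytes already fetched by the prefix probe.

```python
def minimum_safetensors_bytes(
    file_size: int, sniff_bytes: int, allow_content_probes: bool, header_reserve: int
) -> int:
    """Bytes to reserve for one declared .safetensors file under a download cap."""
    # Content probes fetch at most sniff_bytes from the start of the file.
    prefix = min(file_size, sniff_bytes) if allow_content_probes else 0
    # The header scan needs the whole file when it is small, else a bounded header read.
    header_read = file_size if file_size <= sniff_bytes else min(file_size, header_reserve)
    # Assumption: the header scan reuses the fetched prefix, so only the remainder is new.
    return prefix + max(0, header_read - prefix)
```

With content probes enabled, a 100-byte file and a 4096-byte sniff window reserve 100 bytes, not the 200 that double-counting would charge.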

Keep index tombstones in retained private same-filesystem directories so validated cleanup cannot unlink a swapped generation. Evaluate descriptor-bound cache eligibility using the guarded logical path, including shard-family bypasses.
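A sketch of the tombstone flow this commit describes: rename the candidate into a private directory created beside it, so the rename cannot cross filesystems, then prove the tombstone before unlinking it. The proof here uses dev/ino plus a SHA-256 content digest, an assumption loosely following the "content proof" wording in a later review request; `quarantine_then_delete` is hypothetical and POSIX-oriented.

```python
import hashlib
import os
import tempfile


def sha256_file(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def quarantine_then_delete(
    path: str, scanned_dev_ino: tuple[int, int], scanned_sha256: str
) -> tuple[bool, str]:
    """Return (deleted, tombstone_path); the tombstone is kept when the proof fails."""
    parent = os.path.dirname(os.path.abspath(path))
    # mkdtemp creates a 0o700 directory next to the file, so the rename below
    # stays on one filesystem and other users cannot open the tombstone by name.
    private_dir = tempfile.mkdtemp(prefix=".quarantine-", dir=parent)
    tombstone = os.path.join(private_dir, os.path.basename(path))
    os.rename(path, tombstone)  # later swaps at `path` no longer affect this name
    st = os.lstat(tombstone)
    if (st.st_dev, st.st_ino) != scanned_dev_ino or sha256_file(tombstone) != scanned_sha256:
        return False, tombstone  # not the scanned generation: preserve it for validation
    os.unlink(tombstone)
    os.rmdir(private_dir)
    return True, tombstone
```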

mldangelo-oai · 2026-06-16T10:43:46Z

@codex review for security vulnerabilities

Please perform a security-focused review of exact head 914625043473ccce96a0b84909251ab846504774. Review the full surviving diff, especially private same-filesystem index quarantine and post-validation substitution, no-overwrite restoration, descriptor-bound logical cache eligibility and stale shard-family reuse, invocation-owned versus caller-owned Hugging Face cleanup, retained source and companion TOCTOU/A-B-A boundaries, aggregate resource limits, Windows target and lexical-handle binding, SafeTensors index authority, routing/acquisition, cache integrity, fail-open behavior, sensitive logging, resource bounds, simplification, and LOC growth. This supersedes all earlier security requests; treat every other SHA as stale.

mldangelo-oai · 2026-06-16T10:43:46Z

@codex review

Please review exact head 914625043473ccce96a0b84909251ab846504774 for correctness, native Windows and POSIX behavior, regressions, simplification opportunities, LOC growth, and merge blockers. Focus on retained private index quarantine, symlink-target resolution after quarantine, no-overwrite restoration, descriptor-bound logical cache policy, caller-supplied Hugging Face cache ownership, the reviewed Windows receipt fixes, and the full surviving SafeTensors/Hugging Face diff. This supersedes all earlier requests; treat every other SHA as stale.

chatgpt-codex-connector

💡 Codex Review

modelaudit/modelaudit/utils/sources/huggingface.py

Lines 5503 to 5505 in 9146250

    if not resolved_cache_dir.name.startswith("modelaudit_hf_"):
        return False
    return _is_within_directory(download_path, file_path)

Gate HF cleanup on ownership, not cache-dir name

When streaming with a caller-supplied persistent cache_dir whose final component starts with modelaudit_hf_, this treats files under the deterministic Hugging Face download_path as invocation-owned and later unlinks context-only ONNX sidecars. If that sidecar already existed in the caller's cache/local dir, the scan deletes caller-owned data; cleanup should be tied to an explicit temp/staging ownership marker rather than the cache directory name.
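A sketch of ownership-based gating: the invocation keeps a record of the staging directory it created, and only that record, never a directory name, authorizes cleanup. Here the ownership marker is an in-memory value returned by the function that created the directory; an on-disk marker file would be an alternative design. `OwnedStagingDir`, `create_owned_staging_dir`, and `may_delete` are hypothetical.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OwnedStagingDir:
    """Ownership record; in this sketch only create_owned_staging_dir produces one."""

    path: str


def create_owned_staging_dir() -> OwnedStagingDir:
    # mkdtemp guarantees a fresh private directory created by this invocation.
    return OwnedStagingDir(os.path.realpath(tempfile.mkdtemp(prefix="modelaudit_hf_")))


def may_delete(owned: Optional[OwnedStagingDir], file_path: str) -> bool:
    if owned is None:
        # Caller-supplied cache_dir: never invocation-owned, whatever its name.
        return False
    target = os.path.realpath(file_path)
    return target != owned.path and os.path.commonpath([owned.path, target]) == owned.path
```

The caller's directory may well start with `modelaudit_hf_`; without an ownership record, `may_delete` still returns False for files inside it.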

chatgpt-codex-connector · 2026-06-16T10:54:44Z

Codex Review: Didn't find any major issues. Breezy!

Reviewed commit: 9146250434

mldangelo-oai · 2026-06-16T21:25:38Z

@codex review exact head 5584839

Please perform a fresh full correctness and security review of this exact head. Revalidate retained local/unrooted/Hugging Face generation binding, OpenVINO consumed-companion identity, SafeTensors cleanup quarantine and content proof, zero/one-based index authority, bounded shard discovery/futures/descriptors, Windows reparse semantics, cache integrity, and fail-closed terminal coverage. Treat every earlier SHA as stale.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 55848391d7

mldangelo-oai · 2026-06-16T21:52:10Z

@codex review exact head b04f5c0717979282da58d7f9dc46160597b64d35

mldangelo-oai · 2026-06-16T21:52:19Z

@codex review exact head b04f5c0 (correction: this is the verified pushed SHA)

chatgpt-codex-connector · 2026-06-16T22:00:07Z

Codex Review: Didn't find any major issues. Keep them coming!

Reviewed commit: b04f5c07ab

fix(safetensors): validate index-driven zero-based shards

058eeb6

fix: validate safetensors shard indexes

9891fd5

github-code-quality Bot found potential problems Jun 11, 2026

Comment thread tests/test_streaming_scan.py Fixed

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread modelaudit/utils/file/handlers.py Outdated

Comment thread modelaudit/utils/sources/huggingface.py

fix: preserve safetensors selection boundaries

c075932

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread modelaudit/utils/sources/huggingface.py Outdated

Comment thread modelaudit/utils/file/handlers.py Outdated

fix: validate nested safetensors indexes

1a3eec8

mldangelo-oai requested a review from mldangelo June 11, 2026 09:48

mldangelo-oai enabled auto-merge (squash) June 11, 2026 09:59

mldangelo-oai disabled auto-merge June 11, 2026 13:18

mldangelo-oai added 3 commits June 13, 2026 08:59

fix(safetensors): harden index-driven shard validation

e466e30

Merge remote-tracking branch 'origin/main' into review/pr-1668-takeover

a02090f

test(huggingface): keep deadline mock type-safe

498ff98

Merge remote-tracking branch 'origin/main' into review/pr-1668-takeover

bcb6b82

chatgpt-codex-connector Bot reviewed Jun 16, 2026

Comment thread modelaudit/cli.py Outdated

Comment thread modelaudit/core.py Outdated

chatgpt-codex-connector Bot reviewed Jun 16, 2026

Comment thread modelaudit/core.py

Comment thread modelaudit/cli.py Outdated

fix(security): bind cleanup and isolate index probes

2ee9a19

Quarantine streamed cleanup candidates before generation verification, isolate speculative SafeTensors index failures per probe while sharing aggregate budgets, and keep Windows receipt matching bound to the guarded resolved target.

chatgpt-codex-connector Bot reviewed Jun 16, 2026

Comment thread modelaudit/core.py

Comment thread modelaudit/utils/sources/huggingface.py

Comment thread modelaudit/core.py

chatgpt-codex-connector Bot reviewed Jun 16, 2026

Comment thread modelaudit/core.py

Comment thread modelaudit/utils/sources/huggingface.py

Comment thread modelaudit/core.py

fix(security): preserve unverified cleanup replacements

6a437d6

Restore quarantined index and streamed-source generations without overwriting recreated paths. Keep caller-supplied Hugging Face cache directories outside invocation-owned cleanup, and make the deadline regression deterministic.

chatgpt-codex-connector Bot reviewed Jun 16, 2026

Comment thread modelaudit/utils/helpers/cache_decorator.py

fix(security): bind index cleanup and cache policy

9146250

Keep index tombstones in retained private same-filesystem directories so validated cleanup cannot unlink a swapped generation. Evaluate descriptor-bound cache eligibility using the guarded logical path, including shard-family bypasses.

chatgpt-codex-connector Bot reviewed Jun 16, 2026

fix(security): close retained input race gaps

5584839

chatgpt-codex-connector Bot reviewed Jun 16, 2026

Comment thread modelaudit/utils/file/handlers.py Outdated

fix(shards): bound timed-out scan workers

b04f5c0

Conversation

mldangelo-oai commented Jun 11, 2026 (edited)

What and why

Release and base integration

Security and correctness behavior

Current diff versus main@3cf924f2

Validation at exact head 914625043473ccce96a0b84909251ab846504774

Remaining merge gate

Exact-head remediation at 5584839

Exact-head timeout remediation at b04f5c0

mldangelo-oai commented Jun 11, 2026

github-actions Bot commented Jun 11, 2026 (edited)

Performance Benchmarks

chatgpt-codex-connector Bot commented Jun 11, 2026

mldangelo-oai commented Jun 11, 2026

chatgpt-codex-connector Bot commented Jun 11, 2026

mldangelo-oai commented Jun 11, 2026

Independent Review: promptfoo/modelaudit PR #1668

Disposition

Exact review target

Findings

1. Medium: Remote SafeTensors indexes can widen an explicit SafeTensors-only selection to foreign shard formats

2. Medium: Local indexes with nested shard targets are ignored, allowing a missing indexed shard to scan cleanly

3. Medium: Streaming accepts a mixed-base/unreferenced total-1 family and regresses from exit 2 to exit 0

4. Low: The user-visible fix lacks the required [Unreleased] changelog entry

Merge-blocking acceptance gap: the exact pinned MiniCPM5 case still fails on macOS

Validation evidence

Live GitHub state

Static and focused runtime checks

Focused test command with platform result

Review-lane conclusions

Required disposition before rereview

mldangelo-oai commented Jun 11, 2026

mldangelo-oai commented Jun 11, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

mldangelo-oai commented Jun 11, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

mldangelo-oai commented Jun 11, 2026

chatgpt-codex-connector Bot commented Jun 11, 2026

mldangelo-oai commented Jun 13, 2026

mldangelo-oai commented Jun 13, 2026

mldangelo-oai commented Jun 13, 2026

mldangelo-oai commented Jun 13, 2026

mldangelo-oai commented Jun 13, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

mldangelo-oai commented Jun 16, 2026 (edited)