fix(onnx): narrow bounded quantized-weight analysis by mldangelo-oai · Pull Request #1670 · promptfoo/modelaudit

mldangelo-oai · 2026-06-11T05:19:55Z

Summary

Replace the oversized quantized ONNX test expansion with a compact parameterized adversarial suite.
Analyze bounded QDQ/QLinear/integer weight lineage with shared retained-array accounting, bounded graph/scale traversal, typed decoder working-set reservation, and fail-closed incomplete proofs.
Preserve byte-identical ONNX anomaly/check clustering and role-bound ONNX external-data package identity, including streamed/directory identity races and unrelated duplicate sidecar bytes.

Preserved security cases

terminal descendants, graph-output descendants, canonical/custom GELU and local-op boundaries;
generated-value/lineage laundering and factor-amplification fail-closed paths;
typed decoder temporary memory, retained-copy rollback, long quantization IDs, FLOAT8 scale handling, and FLOAT8 saturation;
primary/sidecar identity races, direct external-data hashing, package role identity, cache/message/exit-2 fail-closed semantics;
byte-identical-only clustering with composable nested provenance.

Narrowing

Published diff reduced from +11,806/-4,208 across 9 files at 3f8b8796 to +3,682/-458 across 14 files at 27493341.
Superseded broad ONNX/hash test additions were deleted; focused quantized regressions now live in tests/scanners/test_onnx_quantized_analysis.py.
Current main is included additively via merge commit 625b85c9; no history rewrite or main write.
Exact-head CI surfaced June 15 upstream aiohttp/cryptography advisories; the separable final dependency commit only constrains already-transitive audited resolutions and refreshes uv.lock so CI can stay fail-closed.

Validation

ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/: passed.
ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/: passed.
mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/: passed (478 source files).
pytest tests/test_streaming_scan.py::test_scan_model_streaming_fails_closed_after_max_total_size -q: passed after preserving the streamed hash hook.
pytest tests/scanners/test_onnx_quantized_analysis.py -q: skipped locally because this disk-conscious checkout does not have optional onnx installed; exact-head CI must exercise ONNX extras.
Exact-head Codex followup at 27493341 adds focused regressions for duplicate sidecar hashes, clustered failed checks, and max-size-skipped direct sidecar accounting.
dependency audit recipe with public PyPI index: passed after audited transitive lock refresh.
Full non-slow/non-integration pytest was attempted in the lean local environment and stopped after the pre-fix streamed-hash-hook failure; the specific failure is fixed and covered above. I did not download missing optional ONNX/model dependencies or model corpora.

Merge gate

Do not merge until exact-head CI is green, an exact-head Codex review is clean, unresolved actionable threads remain zero, and a human approval is present.

mldangelo-oai · 2026-06-11T05:20:06Z

@codex review

github-actions · 2026-06-11T05:22:07Z

Workflow run and artifacts

Performance Benchmarks

Compared 13 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 13 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 4.165s -> 4.124s (-1.0%).

Workload	Benchmark	Target	Size	Files	Baseline	Current	Change	Status
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex]`	`nested_hex`	130 B	1	302.6us	312.1us	+3.1%	stable
`padded-multi-stream-upload`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload`	`multi_stream_padded`	4.1 KiB	1	340.3us	349.9us	+2.8%	stable
`suspicious-pickle-intake`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake`	`suspicious-intake`	183.8 KiB	4	130.58ms	133.33ms	+2.1%	stable
`rejected-basic-auth-candidates`	`tests/benchmarks/test_scan_benchmarks.py::test_rejected_basic_auth_candidates_scan_linearly`	`-`	371.1 KiB	1	2.534s	2.487s	-1.9%	stable
`direct-malicious-upload`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload`	`malicious_reduce`	52 B	1	222.6us	219.0us	-1.6%	stable
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw]`	`nested_raw`	78 B	1	271.6us	274.8us	+1.2%	stable
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64]`	`nested_base64`	98 B	1	276.8us	273.7us	-1.1%	stable
`mixed-model-repository`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository`	`release-candidate`	547.3 KiB	32	554.57ms	558.13ms	+0.6%	stable
`chunked-upload-stream`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream`	`chunked_stream`	278.2 KiB	1	112.89ms	113.30ms	+0.4%	stable
`single-checkpoint-preflight`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load`	`single_checkpoint.pkl`	183.0 KiB	1	86.45ms	86.32ms	-0.2%	stable
`warm-cache-rescan`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan`	`release-candidate`	547.3 KiB	32	146.82ms	146.71ms	-0.1%	stable
`clean-training-checkpoint`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint`	`safe_large`	278.2 KiB	1	110.35ms	110.40ms	+0.0%	stable
`duplicate-heavy-registry`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot`	`registry-snapshot`	915.2 KiB	13	487.50ms	487.44ms	-0.0%	stable

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e16ccb99ec

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T05:35:26Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e16ccb99ec

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T06:16:19Z

Independent review: promptfoo/modelaudit PR #1670

Reviewed at 2026-06-11T06:12:15Z using the complete local pr-review workflow in review_only + review_and_verify + decision_support mode.

Disposition

NO MERGE / REQUEST CHANGES at exact head e16ccb99ecee6f7237bf225bd502efbfe451f526.

The branch is current with origin/main and CI is green, but four P1 correctness/resource findings reproduce at this exact head. Two are already represented by unresolved current-head review threads; two additional P1s were found independently. There are also user-visible false-positive and aggregation correctness problems plus the unresolved changelog-policy violation.

Exact target and live state

PR: promptfoo/modelaudit#1670, fix(onnx): preserve quantized weight lineage coverage
State: open, non-draft
Exact head reviewed: e16ccb99ecee6f7237bf225bd502efbfe451f526
Exact base/current origin/main: 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69
Merge base: 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69
Ahead/behind: 1 ahead, 0 behind
GitHub mergeability: MERGEABLE
GitHub merge state: BLOCKED
Review decision: REVIEW_REQUIRED
Diff: 4 files, +967/-62
Disposable checkout: /private/tmp/modelaudit-pr1670-independent-review-20260611/checkout, detached and clean at the exact head
Base merge: not needed; the head commit is directly based on current origin/main
GitHub mutation: none

Findings

P1 — Integer operators trust an unrelated name-matched scale, enabling a clean false negative and making normal spec-valid models incomplete

Location: modelaudit/scanners/onnx_scanner.py:1340-1363, especially :1351-1355.

MatMulInteger and ConvInteger have zero-point inputs but no scale input. The new fallback looks for graph constants named <weight>_scale or <weight-without-_quantized>_scale and treats the first match as semantically bound to the operator. The graph does not establish that relationship.

Reproduction with checker-valid MatMulInteger models:

A normal model with no disconnected scale: eligible=1, analyzed=0, success=false, coverage gap unresolved_initializer_lineage=1.
The same model with extreme int8 values plus an otherwise unused W_scale=0: eligible=1, analyzed=1, success=true, no coverage check, and zero anomaly checks.

This is both a false positive/incomplete result for ordinary valid integer operators and a false-negative bypass when an attacker supplies a name-matched zero scale. The new test at tests/scanners/test_onnx_scanner.py:5164-5201 encodes the unsupported naming convention rather than proving graph lineage.

Validation: VALIDATED, confidence 100, exact head e16ccb99ecee6f7237bf225bd502efbfe451f526, introduced by this change. This matches unresolved thread PRRT_kwDOOEde4c6Is88U at onnx_scanner.py:1355.

Required fix: do not infer scales by name. For integer operators, analyze centered integer weights from the operator's actual zero-point input, or trace a scale through real downstream graph edges before using it. Add checker-valid tests for no scale, misleading unused scales, zero/NaN scales, and real ORT graph patterns.

P1 — Post-dequantization transforms use a stale quantization axis and silently analyze the wrong values

Locations: modelaudit/scanners/onnx_scanner.py:1553-1563, :1685-1698, and :2803-2831.

The lineage stores the DequantizeLinear axis once and propagates the same _OnnxWeightQuantization object through later Transpose, Reshape, Squeeze, and Unsqueeze transforms. Materialization then skips the DQ transform, applies later transforms to the raw integer tensor, and finally broadcasts scale/zero-point using the original axis.

That ordering is only correct if the quantization axis is remapped by every downstream shape transform. It is not. If the stale axis has a different dimension length, the scan becomes incomplete. If the old and new axes happen to have the same length, the scan silently analyzes incorrect values.

Checker-valid adversarial probe:

Raw shape (2, 2, 3), per-axis DQ on axis 0, then Transpose(1, 2, 0) to (2, 3, 2).
Correct DQ axis after transpose: 2. Recorded/used axis: 0.
The plan reported lineage ["DequantizeLinear", "Transpose"] and quantization_axis=0.
Example mismatches: analyzed value [0,0,1] = 3.0, correct value 60.0; analyzed [1,0,0] = 30.0, correct value 1.5.

This can create either false positives or false negatives in the weight-distribution detector.

Validation: VALIDATED, confidence 100, exact head e16ccb99ecee6f7237bf225bd502efbfe451f526, introduced by the new quantized transform materialization path.

Required fix: materialize DQ at its actual lineage position before downstream transforms, or rigorously remap the quantization axis/parameter shape through every supported transform. Add non-square and equal-dimension adversarial tests so shape equality cannot mask wrong-axis behavior.

P1 — Per-array bounds allow up to 100 retained float32 copies of one initializer (approximately 10 GiB at defaults)

Locations: modelaudit/scanners/onnx_scanner.py:88-98, :2801-2831, :2916-2924, and the direct caller at :3737-3743.

Each unique (transforms, quantization) group materializes and caches a full float32 array. Every _OnnxWeightAnalysisSpec retains its array. The direct OnnxScanner passes only a per-array max_array_size check and does not pass the available cumulative retain_array_check. The per-initializer analysis-group limit is 100 and the default per-array limit is 100 MiB, so a small graph can retain close to 10 GiB for one roughly 25 MiB int8 initializer.

Checker-valid resource probe:

One shared 256x256 int8 initializer.
100 valid QLinearMatMul consumers with distinct scalar scale initializers.
Per-array limit: 262,145 bytes.
Plan result: 100 specs, 100 unique float32 arrays, 26,214,400 retained bytes, no coverage gap.
Retained-to-configured-limit ratio: approximately 100x.

The existing group-fanout test at tests/scanners/test_onnx_scanner.py:5281-5295 checks the 101st group is rejected but does not check cumulative retained bytes for the first 100.

Validation: VALIDATED, confidence 100, exact head e16ccb99ecee6f7237bf225bd502efbfe451f526, introduced by quantized copy materialization.

Required fix: enforce a cumulative retained-byte budget in OnnxScanner, release each dequantized array after its spec is analyzed, or stream analysis per group instead of retaining all arrays in the plan. Add a test that keeps every individual copy under the per-array bound while exhausting the cumulative budget.

P1 — Valid INT4/UINT4 and INT2/UINT2 QDQ weights are skipped clean instead of analyzed or marked incomplete

Locations: modelaudit/scanners/onnx_scanner.py:1147-1157 and :1397-1401.

The new quantized integer allowlist omits INT4, UINT4, INT2, and UINT2, even though current ONNX DequantizeLinear permits all four. The lineage reaches MatMul, but add_consumer_group rejects the original initializer as non_floating_weight. It does not add an eligible initializer or a coverage gap.

Independent ONNX 1.21.0 probes for all four dtypes passed onnx.checker.check_model at opset 25. Each malicious QDQ model then returned:

eligible=0
analyzed=0
success=true
no coverage check
no anomaly check
exclusion non_floating_weight=1

Validation: VALIDATED, confidence 100, exact head e16ccb99ecee6f7237bf225bd502efbfe451f526, incomplete implementation introduced by the new DQ coverage path. This confirms unresolved thread PRRT_kwDOOEde4c6Is88S and expands its required scope from 4-bit to 2-bit types as well.

Required fix: support every DQ input type the installed ONNX schema and decoder can safely materialize, including packed INT4/UINT4 and INT2/UINT2, or fail closed with explicit eligible/analyzed coverage metadata. Add benign and malicious tests for each supported packed type.

P2 — Aggregate clustering merges non-identical, semantically distinct models and labels them equivalent

Locations: modelaudit/core.py:992-1019 and :1051-1119.

The cluster key does not include content hash, tensor digest, actual extreme positions, sign, or another proof of export equivalence. Content hashes are added only as provenance after clustering.

Reproduction:

Model A: positive extreme weights at rows 50-54, output 3, SHA-256 1f4452e41b294a62819ab2cc1dfe598f61b7553637c66f28545d1a3b7415c47b.
Model B: negative extreme weights at rows 10-14, output 3, SHA-256 59fd13c77542c183a08f8eb384d653ceff094ebf5d9ee52701858b4fb60eccb2.
Same initializer/op/shape and aggregate counts, but different bytes, signs, and locations.
Result: one issue, message suffix clustered across 2 equivalent ONNX exports, cluster_size=2, and no byte-identical group.

Provenance remains visible, but two distinct causes are collapsed and incorrectly described as equivalent. The added test at tests/scanners/test_onnx_scanner.py:5700-5724 distinguishes only a different operator and therefore does not exercise this over-clustering case.

Validation: VALIDATED, confidence 98, exact head e16ccb99ecee6f7237bf225bd502efbfe451f526, introduced by this change.

Required fix: cluster only byte-identical exports by content hash, or add a bounded tensor/value fingerprint that proves semantic equivalence. Keep distinct hashes separate unless equivalence is independently established.

P2 — The pinned clean MiniLM export produces 31 new user-visible weight anomaly issues, but the integration test checks coverage only

Location: tests/scanners/test_onnx_scanner.py:5726-5747; behavior originates from the newly emitted quantized specs at modelaudit/scanners/onnx_scanner.py:2916-2924.

Real pinned artifact:

Repo/revision: sentence-transformers/all-MiniLM-L12-v2@a50ef00143b4d5391434df20ae11632588ac25be
Path: onnx/model_qint8_avx512.onnx
Size: 34,118,638 bytes
SHA-256: d37479e69ddcb2f7af2ae225a41cb57b0b6fd547dd8740aef4f0c8e5b41d4fcf

Head result:

72 eligible / 72 analyzed / 72 layers
Coverage gap lineages_per_value_limit: 109
31 failed Weight Distribution Anomaly Detection checks/issues, all INFO
Aggregate result: 32 issues total, 31 weight issues, exit 2 because coverage is incomplete
Direct scan elapsed 24.27s; aggregate scan elapsed 28.25s
Direct-process peak RSS: approximately 466 MiB (488,669,184 raw bytes on macOS)

The PR body says sampled flagged tensors match the clean ONNX/safetensors reference and that quantized deltas are consistent with normal quantization, but the added integration test asserts only the 72/72 coverage counts and never asserts the expected anomaly/issue outcome. The feature therefore exposes a large user-visible false-positive/noise surface on its own pinned clean model.

Validation: VALIDATED as an operational false-positive/test gap, confidence 90, exact head e16ccb99ecee6f7237bf225bd502efbfe451f526, introduced by enabling quantized specs.

Required fix: define and assert the expected clean-model issue outcome. If these INFO issues are intentionally retained, document that contract and bound/report them distinctly from suspicious corruption; otherwise calibrate quantized analysis so the clean pinned export is not reported as anomalous.

P2 — Required `[Unreleased]` changelog entry is missing

Locations: AGENTS.md:80 and CHANGELOG.md:8-15.

The PR changes user-visible quantized ONNX analysis, metadata, incomplete outcomes, issue counts, and aggregate clustering. The repository instruction explicitly requires an [Unreleased] entry for user-visible changes. The existing ONNX entry at CHANGELOG.md:14 describes prior bounded semantic lineage work but does not mention quantized operator/DQ support or aggregate clustering.

Validation: VALIDATED, confidence 100, exact head e16ccb99ecee6f7237bf225bd502efbfe451f526. This matches unresolved thread PRRT_kwDOOEde4c6ItD1P.

Review-thread reconciliation

All four threads are unresolved, non-outdated, and attached to exact head e16ccb99ecee6f7237bf225bd502efbfe451f526.

Thread	Location	Classification	Review result
`PRRT_kwDOOEde4c6Is2tA`	`modelaudit/scanners/onnx_scanner.py:180`	Accurate, non-blocking cleanup	`_QUANTIZED_WEIGHT_TRANSFORM_OPERATORS` is unused. This is dead code but not a behavioral finding.
`PRRT_kwDOOEde4c6Is88S`	`modelaudit/scanners/onnx_scanner.py:1156`	Actionable P1	Confirmed for INT4/UINT4; independent probes also confirm INT2/UINT2.
`PRRT_kwDOOEde4c6Is88U`	`modelaudit/scanners/onnx_scanner.py:1355`	Actionable P1	Confirmed: an unrelated `W_scale=0` suppresses the anomaly and returns success.
`PRRT_kwDOOEde4c6ItD1P`	`modelaudit/scanners/onnx_scanner.py:315`	Actionable P2 policy issue	Confirmed missing `[Unreleased]` entry.

No thread is resolved or outdated. There are no approvals. Existing reviews are COMMENTED only.

CI state at exact head

All 29 check runs are terminal: 23 successful and 6 skipped. No check is pending, cancelled, or failed. GitHub reports only CI Success as required, and it passed.

Successful checks:

Analyze (python)
Analyze Python
Analyze GitHub Actions
Benchmarks (Python 3.11)
Build and Package
Build and Test Lightweight Docker Image
Build and Test TensorFlow Docker Image
Check Documentation Formatting
CodeQL
CI Success
Dependency Audit
Detect Changes
Detect Docker Changes
Docker CI Success
Lint and Format
Quick Feedback (Python 3.12)
Test Python 3.10
Test Python 3.13
Test Vendored TensorFlow Protos
Type Check
Validate PR Title
Verify Vendored Proto Reproducibility
Windows Tests (Python 3.11)

Skipped checks:

Build and Test Full Docker Image
Extras Smoke Test (matrix.extra)
License Compliance
Lock File Consistency
Standalone Pickle Package (matrix.python-version)
Test NumPy (matrix.numpy-mode / matrix.python-version)

CI is not sufficient evidence for merge because the reproduced cases are absent from the suite.

Validation evidence

Repository tests and static checks

Focused new quantized tests: 11 passed, 243 deselected.
Full ONNX scanner file excluding integration: 253 passed, 1 deselected, 1 warning.
Full ONNX scanner file without marker exclusion reached 251 passes, then the pinned HF test failed before download because the environment's configured SOCKS proxy lacked socksio. The exact artifact was subsequently fetched with the approved system transfer path and scanned directly.
Ruff check on all four changed files: passed.
Ruff format check on all four changed files: passed.
Mypy on all four changed files: passed.
git diff --check: passed.
Disposable checkout remained clean after validation.

Adversarial, malformed, and resource-bound probes

Integer-op disconnected scale spoof: reproduced clean false negative.
Valid integer op without invented scale: reproduced incomplete false positive.
INT4/UINT4/INT2/UINT2 QDQ: all checker-valid, all skipped clean.
Post-DQ transpose with equal-sized moved axes: reproduced silently wrong values.
100-group QLinearMatMul fan-out: reproduced 100x retained-memory amplification without a coverage gap.
Non-identical aggregate models: reproduced incorrect equivalence clustering.
Malformed QDQ scale shape: failed closed as expected.
External-data QDQ model: eligible=1, analyzed=0, coverage check present, success=false; failed closed as expected.
Edge-visit limit: failed closed as expected.
Initializer group 101 limit: failed closed as expected, but does not bound retained bytes for the first 100.
Self-overwrite malformed lineage: traversal remained bounded in the existing test.

Real pinned Hugging Face QA

The exact MiniLM revision/path named above was downloaded by immutable URL, content-hash verified, and scanned at the reviewed head. Coverage matched the PR claim (72/72/72 with lineages_per_value_limit: 109), while the user-visible anomaly issue count and memory footprint exposed the additional P2 risk documented above.

Full workflow lens summary

Gatekeeper: REVIEW; open, non-trivial runtime/security scanner change with explicit verification request.
Policy paths: root AGENTS.md applies to all four changed files; no narrower applicable policy file.
PR summary/history: one commit directly atop current main; no linked issue in the PR body; intent is to replace blanket quantized exclusions with bounded semantic analysis and cluster repeated export findings.
Diff-only and contextual review: every hunk in modelaudit/core.py, modelaudit/models.py, modelaudit/scanners/onnx_scanner.py, and tests/scanners/test_onnx_scanner.py inspected with surrounding call sites and tests.
Code-comment verification: no separate changed-comment contradiction survived validation.
Policy review x2: missing changelog entry validated; no other unambiguous policy violation.
Behavior-impact sidecar: valid integer models become incomplete; low-bit models remain cleanly skipped; pinned clean model gains 31 INFO anomaly issues.
Runtime sidecar: executed focused, malformed, external-data, alias/transform, resource, aggregate, and real pinned HF probes.
Branch/CI sidecar: current with main; all checks terminal; blocked by review, not branch freshness or CI.
Repo-surface sidecar: changelog drift validated; no cross-repo contract applies.

Merge recommendation

Do not merge e16ccb99ecee6f7237bf225bd502efbfe451f526.

At minimum, the next head must:

Remove name-based integer scale inference and use graph-proven lineage.
Correct quantization-axis semantics through downstream transforms.
Enforce a cumulative retained-memory bound for quantized copies.
Support or explicitly fail closed all valid low-bit DQ types, including 2-bit.
Prevent clustering across non-identical models without equivalence proof.
Define and test the expected clean pinned-model issue outcome.
Add the required changelog entry and resolve/reconcile all current threads.
Re-run focused adversarial QA, the real pinned artifact, and exact-head CI before reconsidering merge.

mldangelo-oai · 2026-06-11T06:35:45Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 10fb037ca9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T07:17:59Z

@codex review

chatgpt-codex-connector · 2026-06-11T07:22:48Z

Codex Review: Didn't find any major issues. You're on a roll.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T11:00:46Z

Independent review of exact head 2c9a3c01f51caddc5dc0af3b4e6d8b48d0d238b6 found two merge blockers:

P1: integer-scale tracing trusts a dead side branch and can cleanly accept an unscaled malicious MatMulInteger output. Scale search accepts any downstream Mul reached from the integer-op output without requiring the scaled value to be the graph output or all consumers to be scaled. A checker-valid model with Y_int = MatMulInteger(...) as graph output correctly exits 2 as unanalyzed. Adding an unused Cast(Y_int) -> Mul(..., dead_scale_vector) branch while retaining graph output Y_int changes the result to exit 0, zero issues/checks, eligible=1/analyzed=1, and quantization_scale=dead_scale_vector.
P2: streaming clusters distinct ONNX files as equivalent because per-file hashes are omitted from the cluster key. _file_content_hash() reads metadata.content_hash, while streaming stores SHA-256 under metadata.file_hashes.sha256; the cluster key therefore sees None. Streaming two different ONNX files emitted only two issues at the first file, both with cluster_size=2, empty byte-identical groups, and provenance hashes [None, None]. Base emitted four unclustered issues.

Validation: the quantized/aggregate ONNX slice had 23 passes; the non-integration ONNX scanner suite had 265 passes and one deselection. Both reproductions were confirmed through CLI/model QA. Exact-head CI is terminal green/skipped and all review threads are resolved, but these behavioral bugs block merge.

…0613

mldangelo-oai · 2026-06-13T08:53:26Z

Fleet sequencing note: keep the goal active and continue the full review, simplification, conflict resolution, validation, title/body cleanup, and exact-head CI work. Please hold only the final merge until Nightly repair PR #1679 reports MERGED. Immediately afterward, fetch the new origin/main, merge it additively into this published branch (no rebase or force-push), re-run exact-new-head technical gates, and then use the normal squash/admin-review-only merge path. This coordination hold should not pause any other work.

mldangelo-oai · 2026-06-13T09:07:37Z

Fleet release: Nightly repair PR #1679 is MERGED as b83dff38026574e255756613d46f65f0010a11b6 (2026-06-13 09:07:12Z). Fetch origin/main now and merge that exact current main additively into this published branch; do not rebase or force-push. Reconcile any overlap according to current behavior, re-run the final review/simplification/thread/metadata gates, push, and require exact-new-head CI before squash merge. If main advances again before your merge, repeat the normal merge-from-main and exact-head gate. The prior hold is released; continue the goal through verified merge and post-merge main health.

Preserve authoritative quantization scale factors and execution precision, fail closed on ambiguous or malformed lineage, and keep anomaly clustering tied to byte-identical exports. Add regression coverage for operand orientation, graph bounds, parameter layouts, aggregation identity, and metadata consistency.

mldangelo-oai · 2026-06-13T09:14:46Z

Addressed the current-head review blockers in ede50b3 (on top of the additive main merge 91b8739):

integer-output scale inference now follows only the unique live authoritative path, rejects raw/scaled aliases and unresolved/dynamic/nested/cast scale expressions, preserves ordered/repeated scalar and vector factors, and retains the dtype at each scale application;
streaming and aggregate clustering now require non-empty SHA-256 identity and preserve distinct analysis/consumer/quantization groups;
added fail-closed shape/dtype checks, left-operand and batched layouts, bounded graph indexing, truthful fallback metadata, and ConvInteger/QLinearConv coverage.

Validation: ruff clean (422 files), full mypy clean (477 files), ONNX scanner 341 passed, weight-distribution scanner 130 passed/1 TensorFlow skip, and the pinned MiniLM qint8 regression passed at 72/72/72. Broad-suite stops reproduced unchanged on current main in unrelated static-getattr tests; exact-head CI is running and remains the merge gate.

mldangelo-oai · 2026-06-13T09:22:21Z

Final coordination gates for ede50b34:

test(pickle): keep protocol-0 storage tails fail closed #1673 is the next serial landing. Keep this PR's current CI/review work moving, but hold the final merge until test(pickle): keep protocol-0 storage tails fail closed #1673 reports MERGED; then merge the newest origin/main additively and rerun exact-head gates. The current body still describes failures on an older main and must be refreshed after that sync.
The current body records net +1007 production lines, a material exception to the user's LOC/complexity constraint. Run a deletion-first pass over the final synced diff: inventory the largest lineage/routing components, test whether existing bounded graph/quantization helpers can replace duplicate accessors or state, and remove historical review scaffolding. If it cannot shrink safely, add a concise per-component justification and fail-closed/budget invariant for the retained complexity. Do not code-golf safeguards; prove the size.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ede50b34b3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…0613

Keep QuantizeLinear-derived weights eligible for coverage failures and account for copies created by post-dequantization transforms.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 440933e0e4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-13T10:32:40Z

Fleet sequencing update: post-#1679 main Python CI run 27462464942 is red on two Linux 3.11 false-positive tests. A dedicated goal-mode worker owns that exact current-main repair. Continue review, simplification, conflict resolution, targeted/full validation, title/body cleanup, and comment handling for this PR, but do not perform the final merge while main is red.

After the CI repair lands, #1673 remains the first test-only serial gate. Before this PR's eventual merge, fetch and additively merge the newest origin/main, resolve overlap without rebasing or force-pushing, and require exact-new-head review/CI. Keep the goal active across the coordination hold.

Trace integer scale paths across bounded bias adds and preserve fail-closed lineage for standard DynamicQuantizeLinear weight outputs.

…0613

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 15f3b04d70

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-13T11:03:23Z

Coordinator review gate: four fresh unresolved quantized-ONNX threads remain, including two P1 fail-open lineage cases. Fix or rebut every thread with regressions, reply/resolve, then rerun exact-head gates. Final merge remains held behind #1680.

mldangelo-oai · 2026-06-15T23:26:17Z

Corrected the exact-head shared-sidecar expectation at 1b2d1299: the two fixtures are byte-identical ONNX packages, so package content identity is recorded once while sidecar bytes are still counted once. @codex review exact head 1b2d1299; stop and report stale if the head moves.

chatgpt-codex-connector · 2026-06-15T23:32:14Z

Codex Review: Didn't find any major issues. Another round soon, please!

Reviewed commit: 1b2d129910

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-16T00:13:55Z

Exact-head CI at 1b2d1299 had one unrelated Windows-only failure in tests/scanners/test_joblib_scanner_codecs.py::test_scan_accepts_security_like_bytes_inside_numpy_array[pickle.loads] after 15,395 passes; rerunning failed exact-head jobs without widening the ONNX/identity/cache diff.

mldangelo-oai · 2026-06-16T00:19:45Z

Addressed the three exact-head Codex findings additively at f7bdc186: retain a bare digest only when a non-package source shares consumed ONNX bytes; collapse/update matching anomaly checks with clustered issues; move direct external-data byte accounting after max-size/total-size deferral. Added focused compact regressions. @codex review exact head f7bdc186; stop and report stale if the head moves.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f7bdc1866c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-16T00:52:01Z

Exact-head ONNX CI validated the clustered check count but showed anomaly stats intentionally exclude INFO findings; removed only that over-specific new assertion at 65e71f18. @codex review exact head 65e71f18; stop and report stale if the head moves.

mldangelo-oai · 2026-06-16T00:54:17Z

Addressed the next exact-head Codex findings additively at 89684f97: clustered raw and pre-consolidated anomaly check contexts consistently with clustered issues, and moved directory external-data byte accounting after max-total-size deferral. Added compact consolidated-check and directory-budget regressions. @codex review exact head 89684f97; stop and report stale if the head moves.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 89684f97b3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-16T01:36:11Z

Exact-head ONNX CI showed the new directory budget fixture used a separately scheduled .bin sidecar; corrected only the fixture at bbc76c2a to use a declared-but-skipped external-data sidecar, matching the guarded directory path. @codex review exact head bbc76c2a; stop and report stale if the head moves.

mldangelo-oai · 2026-06-16T01:37:21Z

Addressed the exact-head consolidated-finding Codex followup additively at c9678061: nested consolidated findings re-key by bounded ONNX anomaly context/content identity rather than their generic consolidated check message. @codex review exact head c9678061; stop and report stale if the head moves.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c967806101

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-16T02:19:01Z

Exact-head ONNX CI showed my new max-total-size fixture scheduled its sidecar independently rather than reaching the guarded skipped-sidecar branch; dropped only that invalid new fixture at 22b782f1 while retaining the Codex-requested directory accounting fix. @codex review exact head 22b782f1; stop and report stale if the head moves.

mldangelo-oai · 2026-06-16T02:22:42Z

Addressed the next exact-head Codex findings additively at 6992f49a: include neuron_index in bounded ONNX anomaly context; preserve a single DVC directory output's direct aggregate identity without re-aggregating; discard streamed ONNX sidecar digests only when that exact sidecar instance/target had been recorded. Added focused model, DVC, and streaming regressions. @codex review exact head 6992f49a; stop and report stale if the head moves.

chatgpt-codex-connector · 2026-06-16T02:30:07Z

Codex Review: Didn't find any major issues. 🎉

Reviewed commit: 6992f49a93

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-16T03:04:56Z

Exact-head ONNX CI exposed the existing refetched-sidecar expectation and the Codex-requested skipped directory sidecar accounting; fixed additively at 27493341 by binding streamed sidecar discard to recorded path/instance/target and updating the superseded skipped-sidecar byte expectation. @codex review exact head 27493341; stop and report stale if the head moves.

chatgpt-codex-connector · 2026-06-16T03:11:06Z

Codex Review: Didn't find any major issues. Hooray!

Reviewed commit: 27493341a1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-16T03:55:32Z

Ready for human approval: exact head 27493341 has green required CI, exact-head Codex review clean, unresolved actionable threads zero, title/body/changelog updated, and no admin bypass will be used. Please approve; I will merge once a human approval is present.

mldangelo-oai · 2026-06-16T21:39:04Z

@codex review exact head b61ceca

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b61ceca490

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-16T21:54:39Z

+                    if is_dvc_pointer and len(target_files) == 1:
+                        single_dvc_directory_content_hash = nested_result.content_hash
+                    elif nested_result.content_hash and nested_result.content_hash not in file_hashes:
+                        file_hashes.append(nested_result.content_hash)


Preserve flat DVC directory hashes

When a .dvc file resolves to multiple directory outputs, this branch appends each nested directory's already-aggregated content_hash and the parent later aggregates those aggregate hashes. compute_aggregate_hash is not associative, so the same two output directories produce a different top-level hash when scanned via one DVC pointer than when scanned directly or flattened as before, which breaks content-based deduplication for multi-output DVC models. The single-directory shortcut above preserves direct-scan parity only for len(target_files) == 1; the multi-directory path should still flatten the nested file/package hashes instead of hashing nested aggregates.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-16T21:54:39Z

+                        onnx_external_data_bytes_scanned += external_data_size
+                        top_level_hashed_bytes += external_data_size


Avoid double-counting self-referential ONNX sidecars

When an ONNX external_data location resolves back to the primary .onnx file itself, the primary file has already been added to top_level_hashed_bytes before this loop, but these lines add the same file's size again as if it were a separate sidecar. That makes direct scans of self-referential ONNX models report doubled scanned bytes and can incorrectly trip max_total_size for a single physical file whose real size is within the limit; skip the external-data accounting when its identity matches the primary model identity.

Useful? React with 👍 / 👎.

fix(onnx): preserve quantized weight lineage coverage

e16ccb9

github-code-quality Bot found potential problems Jun 11, 2026

View reviewed changes

Comment thread modelaudit/scanners/onnx_scanner.py Fixed

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/scanners/onnx_scanner.py

Comment thread modelaudit/scanners/onnx_scanner.py Outdated

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/scanners/onnx_scanner.py

fix: harden ONNX quantized lineage review fixes

10fb037

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/scanners/onnx_scanner.py Outdated

Comment thread modelaudit/core.py Outdated

fix: close ONNX quantized lineage review gaps

2c9a3c0

Merge remote-tracking branch 'origin/main' into takeover/pr-1670-2026…

91b8739

…0613

chatgpt-codex-connector Bot reviewed Jun 13, 2026

View reviewed changes

Comment thread modelaudit/scanners/onnx_scanner.py Outdated

Comment thread modelaudit/scanners/onnx_scanner.py Outdated

mldangelo-oai added 2 commits June 13, 2026 09:57

Merge remote-tracking branch 'origin/main' into takeover/pr-1670-2026…

2015268

…0613

fix(onnx): fail closed on unresolved quantization

440933e

Keep QuantizeLinear-derived weights eligible for coverage failures and account for copies created by post-dequantization transforms.

chatgpt-codex-connector Bot reviewed Jun 13, 2026

View reviewed changes

Comment thread modelaudit/scanners/onnx_scanner.py Outdated

Comment thread modelaudit/scanners/onnx_scanner.py Outdated

mldangelo-oai added 2 commits June 13, 2026 10:35

fix(onnx): close quantized lineage gaps

57aa9ca

Trace integer scale paths across bounded bias adds and preserve fail-closed lineage for standard DynamicQuantizeLinear weight outputs.

Merge remote-tracking branch 'origin/main' into takeover/pr-1670-2026…

15f3b04

…0613

chatgpt-codex-connector Bot reviewed Jun 13, 2026

View reviewed changes

Comment thread modelaudit/scanners/onnx_scanner.py Outdated

Comment thread modelaudit/scanners/onnx_scanner.py

Comment thread modelaudit/scanners/onnx_scanner.py

Comment thread modelaudit/scanners/onnx_scanner.py Outdated

test(onnx): dedupe shared package hash

1b2d129

fix(onnx): preserve package aggregate evidence

f7bdc18

chatgpt-codex-connector Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread modelaudit/core.py

Comment thread modelaudit/core.py Outdated

test(onnx): avoid info-stat expectation

65e71f1

fix(onnx): cluster consolidated anomaly checks

89684f9

chatgpt-codex-connector Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread modelaudit/core.py Outdated

test(onnx): use skipped external sidecar fixture

bbc76c2

fix(onnx): rekey consolidated anomaly findings

c967806

chatgpt-codex-connector Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread modelaudit/models.py Outdated

Comment thread modelaudit/core.py Outdated

Comment thread modelaudit/core.py Outdated

test(onnx): drop scheduled-sidecar budget case

22b782f

fix(onnx): preserve remaining aggregate identities

6992f49

fix(onnx): bind streamed sidecar discard

2749334

mldangelo-oai mentioned this pull request Jun 16, 2026

fix(safetensors): validate zero-based shards and bind retained inputs #1668

Open

fix(onnx): bind cached scans to primary identity

b61ceca

chatgpt-codex-connector Bot reviewed Jun 16, 2026

View reviewed changes

		onnx_external_data_bytes_scanned += external_data_size
		top_level_hashed_bytes += external_data_size

Conversation

mldangelo-oai commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Preserved security cases

Narrowing

Validation

Merge gate

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

github-actions Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Performance Benchmarks

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

mldangelo-oai commented Jun 11, 2026

Independent review: promptfoo/modelaudit PR #1670

Disposition

Exact target and live state

Findings

P1 — Integer operators trust an unrelated name-matched scale, enabling a clean false negative and making normal spec-valid models incomplete

P1 — Post-dequantization transforms use a stale quantization axis and silently analyze the wrong values

P1 — Per-array bounds allow up to 100 retained float32 copies of one initializer (approximately 10 GiB at defaults)

P1 — Valid INT4/UINT4 and INT2/UINT2 QDQ weights are skipped clean instead of analyzed or marked incomplete

P2 — Aggregate clustering merges non-identical, semantically distinct models and labels them equivalent

P2 — The pinned clean MiniLM export produces 31 new user-visible weight anomaly issues, but the integration test checks coverage only

P2 — Required [Unreleased] changelog entry is missing

Review-thread reconciliation

CI state at exact head

Validation evidence

Repository tests and static checks

Adversarial, malformed, and resource-bound probes

Real pinned Hugging Face QA

Full workflow lens summary

Merge recommendation

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 11, 2026

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

mldangelo-oai commented Jun 13, 2026

Uh oh!

mldangelo-oai commented Jun 13, 2026

Uh oh!

mldangelo-oai commented Jun 13, 2026

Uh oh!

mldangelo-oai commented Jun 13, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

mldangelo-oai commented Jun 11, 2026 •

edited

Loading

github-actions Bot commented Jun 11, 2026 •

edited

Loading

P2 — Required `[Unreleased]` changelog entry is missing