Skip to content

fix(onnx): narrow bounded quantized-weight analysis#1670

Open
mldangelo-oai wants to merge 66 commits into
mainfrom
mdangelo/codex/hf-fp-t41-onnx-weight-lineage-coverage-20260610
Open

fix(onnx): narrow bounded quantized-weight analysis#1670
mldangelo-oai wants to merge 66 commits into
mainfrom
mdangelo/codex/hf-fp-t41-onnx-weight-lineage-coverage-20260610

Conversation

@mldangelo-oai

@mldangelo-oai mldangelo-oai commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Replace the oversized quantized ONNX test expansion with a compact parameterized adversarial suite.
  • Analyze bounded QDQ/QLinear/integer weight lineage with shared retained-array accounting, bounded graph/scale traversal, typed decoder working-set reservation, and fail-closed incomplete proofs.
  • Preserve byte-identical ONNX anomaly/check clustering and role-bound ONNX external-data package identity, including streamed/directory identity races and unrelated duplicate sidecar bytes.

Preserved security cases

  • terminal descendants, graph-output descendants, canonical/custom GELU and local-op boundaries;
  • generated-value/lineage laundering and factor-amplification fail-closed paths;
  • typed decoder temporary memory, retained-copy rollback, long quantization IDs, FLOAT8 scale handling, and FLOAT8 saturation;
  • primary/sidecar identity races, direct external-data hashing, package role identity, cache/message/exit-2 fail-closed semantics;
  • byte-identical-only clustering with composable nested provenance.

Narrowing

  • Published diff reduced from +11,806/-4,208 across 9 files at 3f8b8796 to +3,682/-458 across 14 files at 27493341.
  • Superseded broad ONNX/hash test additions were deleted; focused quantized regressions now live in tests/scanners/test_onnx_quantized_analysis.py.
  • Current main is included additively via merge commit 625b85c9; no history rewrite or main write.
  • Exact-head CI surfaced June 15 upstream aiohttp/cryptography advisories; the separable final dependency commit only constrains already-transitive audited resolutions and refreshes uv.lock so CI can stay fail-closed.

Validation

  • ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/: passed.
  • ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/: passed.
  • mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/: passed (478 source files).
  • pytest tests/test_streaming_scan.py::test_scan_model_streaming_fails_closed_after_max_total_size -q: passed after preserving the streamed hash hook.
  • pytest tests/scanners/test_onnx_quantized_analysis.py -q: skipped locally because this disk-conscious checkout does not have optional onnx installed; exact-head CI must exercise ONNX extras.
  • Exact-head Codex followup at 27493341 adds focused regressions for duplicate sidecar hashes, clustered failed checks, and max-size-skipped direct sidecar accounting.
  • dependency audit recipe with public PyPI index: passed after audited transitive lock refresh.
  • Full non-slow/non-integration pytest was attempted in the lean local environment and stopped after the pre-fix streamed-hash-hook failure; the specific failure is fixed and covered above. I did not download missing optional ONNX/model dependencies or model corpora.

Merge gate

Do not merge until exact-head CI is green, an exact-head Codex review is clean, unresolved actionable threads remain zero, and a human approval is present.

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

@codex review

@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Workflow run and artifacts

Performance Benchmarks

Compared 13 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 13 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 4.165s -> 4.124s (-1.0%).

Workload Benchmark Target Size Files Baseline Current Change Status
nested-payload-review tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex] nested_hex 130 B 1 302.6us 312.1us +3.1% stable
padded-multi-stream-upload tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload multi_stream_padded 4.1 KiB 1 340.3us 349.9us +2.8% stable
suspicious-pickle-intake tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake suspicious-intake 183.8 KiB 4 130.58ms 133.33ms +2.1% stable
rejected-basic-auth-candidates tests/benchmarks/test_scan_benchmarks.py::test_rejected_basic_auth_candidates_scan_linearly - 371.1 KiB 1 2.534s 2.487s -1.9% stable
direct-malicious-upload tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload malicious_reduce 52 B 1 222.6us 219.0us -1.6% stable
nested-payload-review tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw] nested_raw 78 B 1 271.6us 274.8us +1.2% stable
nested-payload-review tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64] nested_base64 98 B 1 276.8us 273.7us -1.1% stable
mixed-model-repository tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository release-candidate 547.3 KiB 32 554.57ms 558.13ms +0.6% stable
chunked-upload-stream tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream chunked_stream 278.2 KiB 1 112.89ms 113.30ms +0.4% stable
single-checkpoint-preflight tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load single_checkpoint.pkl 183.0 KiB 1 86.45ms 86.32ms -0.2% stable
warm-cache-rescan tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan release-candidate 547.3 KiB 32 146.82ms 146.71ms -0.1% stable
clean-training-checkpoint tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint safe_large 278.2 KiB 1 110.35ms 110.40ms +0.0% stable
duplicate-heavy-registry tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot registry-snapshot 915.2 KiB 13 487.50ms 487.44ms -0.0% stable

Comment thread modelaudit/scanners/onnx_scanner.py Fixed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e16ccb99ec

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/scanners/onnx_scanner.py
Comment thread modelaudit/scanners/onnx_scanner.py Outdated
@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e16ccb99ec

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/scanners/onnx_scanner.py
@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Independent review: promptfoo/modelaudit PR #1670

Reviewed at 2026-06-11T06:12:15Z using the complete local pr-review workflow in review_only + review_and_verify + decision_support mode.

Disposition

NO MERGE / REQUEST CHANGES at exact head e16ccb99ecee6f7237bf225bd502efbfe451f526.

The branch is current with origin/main and CI is green, but four P1 correctness/resource findings reproduce at this exact head. Two are already represented by unresolved current-head review threads; two additional P1s were found independently. There are also user-visible false-positive and aggregation correctness problems plus the unresolved changelog-policy violation.

Exact target and live state

  • PR: promptfoo/modelaudit#1670, fix(onnx): preserve quantized weight lineage coverage
  • State: open, non-draft
  • Exact head reviewed: e16ccb99ecee6f7237bf225bd502efbfe451f526
  • Exact base/current origin/main: 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69
  • Merge base: 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69
  • Ahead/behind: 1 ahead, 0 behind
  • GitHub mergeability: MERGEABLE
  • GitHub merge state: BLOCKED
  • Review decision: REVIEW_REQUIRED
  • Diff: 4 files, +967/-62
  • Disposable checkout: /private/tmp/modelaudit-pr1670-independent-review-20260611/checkout, detached and clean at the exact head
  • Base merge: not needed; the head commit is directly based on current origin/main
  • GitHub mutation: none

Findings

P1 — Integer operators trust an unrelated name-matched scale, enabling a clean false negative and making normal spec-valid models incomplete

Location: modelaudit/scanners/onnx_scanner.py:1340-1363, especially :1351-1355.

MatMulInteger and ConvInteger have zero-point inputs but no scale input. The new fallback looks for graph constants named <weight>_scale or <weight-without-_quantized>_scale and treats the first match as semantically bound to the operator. The graph does not establish that relationship.

Reproduction with checker-valid MatMulInteger models:

  • A normal model with no disconnected scale: eligible=1, analyzed=0, success=false, coverage gap unresolved_initializer_lineage=1.
  • The same model with extreme int8 values plus an otherwise unused W_scale=0: eligible=1, analyzed=1, success=true, no coverage check, and zero anomaly checks.

This is both a false positive/incomplete result for ordinary valid integer operators and a false-negative bypass when an attacker supplies a name-matched zero scale. The new test at tests/scanners/test_onnx_scanner.py:5164-5201 encodes the unsupported naming convention rather than proving graph lineage.

Validation: VALIDATED, confidence 100, exact head e16ccb99ecee6f7237bf225bd502efbfe451f526, introduced by this change. This matches unresolved thread PRRT_kwDOOEde4c6Is88U at onnx_scanner.py:1355.

Required fix: do not infer scales by name. For integer operators, analyze centered integer weights from the operator's actual zero-point input, or trace a scale through real downstream graph edges before using it. Add checker-valid tests for no scale, misleading unused scales, zero/NaN scales, and real ORT graph patterns.

P1 — Post-dequantization transforms use a stale quantization axis and silently analyze the wrong values

Locations: modelaudit/scanners/onnx_scanner.py:1553-1563, :1685-1698, and :2803-2831.

The lineage stores the DequantizeLinear axis once and propagates the same _OnnxWeightQuantization object through later Transpose, Reshape, Squeeze, and Unsqueeze transforms. Materialization then skips the DQ transform, applies later transforms to the raw integer tensor, and finally broadcasts scale/zero-point using the original axis.

That ordering is only correct if the quantization axis is remapped by every downstream shape transform. It is not. If the stale axis has a different dimension length, the scan becomes incomplete. If the old and new axes happen to have the same length, the scan silently analyzes incorrect values.

Checker-valid adversarial probe:

  • Raw shape (2, 2, 3), per-axis DQ on axis 0, then Transpose(1, 2, 0) to (2, 3, 2).
  • Correct DQ axis after transpose: 2. Recorded/used axis: 0.
  • The plan reported lineage ["DequantizeLinear", "Transpose"] and quantization_axis=0.
  • Example mismatches: analyzed value [0,0,1] = 3.0, correct value 60.0; analyzed [1,0,0] = 30.0, correct value 1.5.

This can create either false positives or false negatives in the weight-distribution detector.

Validation: VALIDATED, confidence 100, exact head e16ccb99ecee6f7237bf225bd502efbfe451f526, introduced by the new quantized transform materialization path.

Required fix: materialize DQ at its actual lineage position before downstream transforms, or rigorously remap the quantization axis/parameter shape through every supported transform. Add non-square and equal-dimension adversarial tests so shape equality cannot mask wrong-axis behavior.

P1 — Per-array bounds allow up to 100 retained float32 copies of one initializer (approximately 10 GiB at defaults)

Locations: modelaudit/scanners/onnx_scanner.py:88-98, :2801-2831, :2916-2924, and the direct caller at :3737-3743.

Each unique (transforms, quantization) group materializes and caches a full float32 array. Every _OnnxWeightAnalysisSpec retains its array. The direct OnnxScanner passes only a per-array max_array_size check and does not pass the available cumulative retain_array_check. The per-initializer analysis-group limit is 100 and the default per-array limit is 100 MiB, so a small graph can retain close to 10 GiB for one roughly 25 MiB int8 initializer.

Checker-valid resource probe:

  • One shared 256x256 int8 initializer.
  • 100 valid QLinearMatMul consumers with distinct scalar scale initializers.
  • Per-array limit: 262,145 bytes.
  • Plan result: 100 specs, 100 unique float32 arrays, 26,214,400 retained bytes, no coverage gap.
  • Retained-to-configured-limit ratio: approximately 100x.

The existing group-fanout test at tests/scanners/test_onnx_scanner.py:5281-5295 checks the 101st group is rejected but does not check cumulative retained bytes for the first 100.

Validation: VALIDATED, confidence 100, exact head e16ccb99ecee6f7237bf225bd502efbfe451f526, introduced by quantized copy materialization.

Required fix: enforce a cumulative retained-byte budget in OnnxScanner, release each dequantized array after its spec is analyzed, or stream analysis per group instead of retaining all arrays in the plan. Add a test that keeps every individual copy under the per-array bound while exhausting the cumulative budget.

P1 — Valid INT4/UINT4 and INT2/UINT2 QDQ weights are skipped clean instead of analyzed or marked incomplete

Locations: modelaudit/scanners/onnx_scanner.py:1147-1157 and :1397-1401.

The new quantized integer allowlist omits INT4, UINT4, INT2, and UINT2, even though current ONNX DequantizeLinear permits all four. The lineage reaches MatMul, but add_consumer_group rejects the original initializer as non_floating_weight. It does not add an eligible initializer or a coverage gap.

Independent ONNX 1.21.0 probes for all four dtypes passed onnx.checker.check_model at opset 25. Each malicious QDQ model then returned:

  • eligible=0
  • analyzed=0
  • success=true
  • no coverage check
  • no anomaly check
  • exclusion non_floating_weight=1

Validation: VALIDATED, confidence 100, exact head e16ccb99ecee6f7237bf225bd502efbfe451f526, incomplete implementation introduced by the new DQ coverage path. This confirms unresolved thread PRRT_kwDOOEde4c6Is88S and expands its required scope from 4-bit to 2-bit types as well.

Required fix: support every DQ input type the installed ONNX schema and decoder can safely materialize, including packed INT4/UINT4 and INT2/UINT2, or fail closed with explicit eligible/analyzed coverage metadata. Add benign and malicious tests for each supported packed type.

P2 — Aggregate clustering merges non-identical, semantically distinct models and labels them equivalent

Locations: modelaudit/core.py:992-1019 and :1051-1119.

The cluster key does not include content hash, tensor digest, actual extreme positions, sign, or another proof of export equivalence. Content hashes are added only as provenance after clustering.

Reproduction:

  • Model A: positive extreme weights at rows 50-54, output 3, SHA-256 1f4452e41b294a62819ab2cc1dfe598f61b7553637c66f28545d1a3b7415c47b.
  • Model B: negative extreme weights at rows 10-14, output 3, SHA-256 59fd13c77542c183a08f8eb384d653ceff094ebf5d9ee52701858b4fb60eccb2.
  • Same initializer/op/shape and aggregate counts, but different bytes, signs, and locations.
  • Result: one issue, message suffix clustered across 2 equivalent ONNX exports, cluster_size=2, and no byte-identical group.

Provenance remains visible, but two distinct causes are collapsed and incorrectly described as equivalent. The added test at tests/scanners/test_onnx_scanner.py:5700-5724 distinguishes only a different operator and therefore does not exercise this over-clustering case.

Validation: VALIDATED, confidence 98, exact head e16ccb99ecee6f7237bf225bd502efbfe451f526, introduced by this change.

Required fix: cluster only byte-identical exports by content hash, or add a bounded tensor/value fingerprint that proves semantic equivalence. Keep distinct hashes separate unless equivalence is independently established.

P2 — The pinned clean MiniLM export produces 31 new user-visible weight anomaly issues, but the integration test checks coverage only

Location: tests/scanners/test_onnx_scanner.py:5726-5747; behavior originates from the newly emitted quantized specs at modelaudit/scanners/onnx_scanner.py:2916-2924.

Real pinned artifact:

  • Repo/revision: sentence-transformers/all-MiniLM-L12-v2@a50ef00143b4d5391434df20ae11632588ac25be
  • Path: onnx/model_qint8_avx512.onnx
  • Size: 34,118,638 bytes
  • SHA-256: d37479e69ddcb2f7af2ae225a41cb57b0b6fd547dd8740aef4f0c8e5b41d4fcf

Head result:

  • 72 eligible / 72 analyzed / 72 layers
  • Coverage gap lineages_per_value_limit: 109
  • 31 failed Weight Distribution Anomaly Detection checks/issues, all INFO
  • Aggregate result: 32 issues total, 31 weight issues, exit 2 because coverage is incomplete
  • Direct scan elapsed 24.27s; aggregate scan elapsed 28.25s
  • Direct-process peak RSS: approximately 466 MiB (488,669,184 raw bytes on macOS)

The PR body says sampled flagged tensors match the clean ONNX/safetensors reference and that quantized deltas are consistent with normal quantization, but the added integration test asserts only the 72/72 coverage counts and never asserts the expected anomaly/issue outcome. The feature therefore exposes a large user-visible false-positive/noise surface on its own pinned clean model.

Validation: VALIDATED as an operational false-positive/test gap, confidence 90, exact head e16ccb99ecee6f7237bf225bd502efbfe451f526, introduced by enabling quantized specs.

Required fix: define and assert the expected clean-model issue outcome. If these INFO issues are intentionally retained, document that contract and bound/report them distinctly from suspicious corruption; otherwise calibrate quantized analysis so the clean pinned export is not reported as anomalous.

P2 — Required [Unreleased] changelog entry is missing

Locations: AGENTS.md:80 and CHANGELOG.md:8-15.

The PR changes user-visible quantized ONNX analysis, metadata, incomplete outcomes, issue counts, and aggregate clustering. The repository instruction explicitly requires an [Unreleased] entry for user-visible changes. The existing ONNX entry at CHANGELOG.md:14 describes prior bounded semantic lineage work but does not mention quantized operator/DQ support or aggregate clustering.

Validation: VALIDATED, confidence 100, exact head e16ccb99ecee6f7237bf225bd502efbfe451f526. This matches unresolved thread PRRT_kwDOOEde4c6ItD1P.

Review-thread reconciliation

All four threads are unresolved, non-outdated, and attached to exact head e16ccb99ecee6f7237bf225bd502efbfe451f526.

Thread Location Classification Review result
PRRT_kwDOOEde4c6Is2tA modelaudit/scanners/onnx_scanner.py:180 Accurate, non-blocking cleanup _QUANTIZED_WEIGHT_TRANSFORM_OPERATORS is unused. This is dead code but not a behavioral finding.
PRRT_kwDOOEde4c6Is88S modelaudit/scanners/onnx_scanner.py:1156 Actionable P1 Confirmed for INT4/UINT4; independent probes also confirm INT2/UINT2.
PRRT_kwDOOEde4c6Is88U modelaudit/scanners/onnx_scanner.py:1355 Actionable P1 Confirmed: an unrelated W_scale=0 suppresses the anomaly and returns success.
PRRT_kwDOOEde4c6ItD1P modelaudit/scanners/onnx_scanner.py:315 Actionable P2 policy issue Confirmed missing [Unreleased] entry.

No thread is resolved or outdated. There are no approvals. Existing reviews are COMMENTED only.

CI state at exact head

All 29 check runs are terminal: 23 successful and 6 skipped. No check is pending, cancelled, or failed. GitHub reports only CI Success as required, and it passed.

Successful checks:

  • Analyze (python)
  • Analyze Python
  • Analyze GitHub Actions
  • Benchmarks (Python 3.11)
  • Build and Package
  • Build and Test Lightweight Docker Image
  • Build and Test TensorFlow Docker Image
  • Check Documentation Formatting
  • CodeQL
  • CI Success
  • Dependency Audit
  • Detect Changes
  • Detect Docker Changes
  • Docker CI Success
  • Lint and Format
  • Quick Feedback (Python 3.12)
  • Test Python 3.10
  • Test Python 3.13
  • Test Vendored TensorFlow Protos
  • Type Check
  • Validate PR Title
  • Verify Vendored Proto Reproducibility
  • Windows Tests (Python 3.11)

Skipped checks:

  • Build and Test Full Docker Image
  • Extras Smoke Test (matrix.extra)
  • License Compliance
  • Lock File Consistency
  • Standalone Pickle Package (matrix.python-version)
  • Test NumPy (matrix.numpy-mode / matrix.python-version)

CI is not sufficient evidence for merge because the reproduced cases are absent from the suite.

Validation evidence

Repository tests and static checks

  • Focused new quantized tests: 11 passed, 243 deselected.
  • Full ONNX scanner file excluding integration: 253 passed, 1 deselected, 1 warning.
  • Full ONNX scanner file without marker exclusion reached 251 passes, then the pinned HF test failed before download because the environment's configured SOCKS proxy lacked socksio. The exact artifact was subsequently fetched with the approved system transfer path and scanned directly.
  • Ruff check on all four changed files: passed.
  • Ruff format check on all four changed files: passed.
  • Mypy on all four changed files: passed.
  • git diff --check: passed.
  • Disposable checkout remained clean after validation.

Adversarial, malformed, and resource-bound probes

  • Integer-op disconnected scale spoof: reproduced clean false negative.
  • Valid integer op without invented scale: reproduced incomplete false positive.
  • INT4/UINT4/INT2/UINT2 QDQ: all checker-valid, all skipped clean.
  • Post-DQ transpose with equal-sized moved axes: reproduced silently wrong values.
  • 100-group QLinearMatMul fan-out: reproduced 100x retained-memory amplification without a coverage gap.
  • Non-identical aggregate models: reproduced incorrect equivalence clustering.
  • Malformed QDQ scale shape: failed closed as expected.
  • External-data QDQ model: eligible=1, analyzed=0, coverage check present, success=false; failed closed as expected.
  • Edge-visit limit: failed closed as expected.
  • Initializer group 101 limit: failed closed as expected, but does not bound retained bytes for the first 100.
  • Self-overwrite malformed lineage: traversal remained bounded in the existing test.

Real pinned Hugging Face QA

The exact MiniLM revision/path named above was downloaded by immutable URL, content-hash verified, and scanned at the reviewed head. Coverage matched the PR claim (72/72/72 with lineages_per_value_limit: 109), while the user-visible anomaly issue count and memory footprint exposed the additional P2 risk documented above.

Full workflow lens summary

  • Gatekeeper: REVIEW; open, non-trivial runtime/security scanner change with explicit verification request.
  • Policy paths: root AGENTS.md applies to all four changed files; no narrower applicable policy file.
  • PR summary/history: one commit directly atop current main; no linked issue in the PR body; intent is to replace blanket quantized exclusions with bounded semantic analysis and cluster repeated export findings.
  • Diff-only and contextual review: every hunk in modelaudit/core.py, modelaudit/models.py, modelaudit/scanners/onnx_scanner.py, and tests/scanners/test_onnx_scanner.py inspected with surrounding call sites and tests.
  • Code-comment verification: no separate changed-comment contradiction survived validation.
  • Policy review x2: missing changelog entry validated; no other unambiguous policy violation.
  • Behavior-impact sidecar: valid integer models become incomplete; low-bit models remain cleanly skipped; pinned clean model gains 31 INFO anomaly issues.
  • Runtime sidecar: executed focused, malformed, external-data, alias/transform, resource, aggregate, and real pinned HF probes.
  • Branch/CI sidecar: current with main; all checks terminal; blocked by review, not branch freshness or CI.
  • Repo-surface sidecar: changelog drift validated; no cross-repo contract applies.

Merge recommendation

Do not merge e16ccb99ecee6f7237bf225bd502efbfe451f526.

At minimum, the next head must:

  1. Remove name-based integer scale inference and use graph-proven lineage.
  2. Correct quantization-axis semantics through downstream transforms.
  3. Enforce a cumulative retained-memory bound for quantized copies.
  4. Support or explicitly fail closed all valid low-bit DQ types, including 2-bit.
  5. Prevent clustering across non-identical models without equivalence proof.
  6. Define and test the expected clean pinned-model issue outcome.
  7. Add the required changelog entry and resolve/reconcile all current threads.
  8. Re-run focused adversarial QA, the real pinned artifact, and exact-head CI before reconsidering merge.

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 10fb037ca9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/scanners/onnx_scanner.py Outdated
Comment thread modelaudit/core.py Outdated
@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. You're on a roll.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Independent review of exact head 2c9a3c01f51caddc5dc0af3b4e6d8b48d0d238b6 found two merge blockers:

  1. P1: integer-scale tracing trusts a dead side branch and can cleanly accept an unscaled malicious MatMulInteger output. Scale search accepts any downstream Mul reached from the integer-op output without requiring the scaled value to be the graph output or all consumers to be scaled. A checker-valid model with Y_int = MatMulInteger(...) as graph output correctly exits 2 as unanalyzed. Adding an unused Cast(Y_int) -> Mul(..., dead_scale_vector) branch while retaining graph output Y_int changes the result to exit 0, zero issues/checks, eligible=1/analyzed=1, and quantization_scale=dead_scale_vector.
  2. P2: streaming clusters distinct ONNX files as equivalent because per-file hashes are omitted from the cluster key. _file_content_hash() reads metadata.content_hash, while streaming stores SHA-256 under metadata.file_hashes.sha256; the cluster key therefore sees None. Streaming two different ONNX files emitted only two issues at the first file, both with cluster_size=2, empty byte-identical groups, and provenance hashes [None, None]. Base emitted four unclustered issues.

Validation: the quantized/aggregate ONNX slice had 23 passes; the non-integration ONNX scanner suite had 265 passes and one deselection. Both reproductions were confirmed through CLI/model QA. Exact-head CI is terminal green/skipped and all review threads are resolved, but these behavioral bugs block merge.

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Fleet sequencing note: keep the goal active and continue the full review, simplification, conflict resolution, validation, title/body cleanup, and exact-head CI work. Please hold only the final merge until Nightly repair PR #1679 reports MERGED. Immediately afterward, fetch the new origin/main, merge it additively into this published branch (no rebase or force-push), re-run exact-new-head technical gates, and then use the normal squash/admin-review-only merge path. This coordination hold should not pause any other work.

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Fleet release: Nightly repair PR #1679 is MERGED as b83dff38026574e255756613d46f65f0010a11b6 (2026-06-13 09:07:12Z). Fetch origin/main now and merge that exact current main additively into this published branch; do not rebase or force-push. Reconcile any overlap according to current behavior, re-run the final review/simplification/thread/metadata gates, push, and require exact-new-head CI before squash merge. If main advances again before your merge, repeat the normal merge-from-main and exact-head gate. The prior hold is released; continue the goal through verified merge and post-merge main health.

Preserve authoritative quantization scale factors and execution precision, fail closed on ambiguous or malformed lineage, and keep anomaly clustering tied to byte-identical exports. Add regression coverage for operand orientation, graph bounds, parameter layouts, aggregation identity, and metadata consistency.
@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Addressed the current-head review blockers in ede50b3 (on top of the additive main merge 91b8739):

  • integer-output scale inference now follows only the unique live authoritative path, rejects raw/scaled aliases and unresolved/dynamic/nested/cast scale expressions, preserves ordered/repeated scalar and vector factors, and retains the dtype at each scale application;
  • streaming and aggregate clustering now require non-empty SHA-256 identity and preserve distinct analysis/consumer/quantization groups;
  • added fail-closed shape/dtype checks, left-operand and batched layouts, bounded graph indexing, truthful fallback metadata, and ConvInteger/QLinearConv coverage.

Validation: ruff clean (422 files), full mypy clean (477 files), ONNX scanner 341 passed, weight-distribution scanner 130 passed/1 TensorFlow skip, and the pinned MiniLM qint8 regression passed at 72/72/72. Broad-suite stops reproduced unchanged on current main in unrelated static-getattr tests; exact-head CI is running and remains the merge gate.

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Final coordination gates for ede50b34:

  1. test(pickle): keep protocol-0 storage tails fail closed #1673 is the next serial landing. Keep this PR's current CI/review work moving, but hold the final merge until test(pickle): keep protocol-0 storage tails fail closed #1673 reports MERGED; then merge the newest origin/main additively and rerun exact-head gates. The current body still describes failures on an older main and must be refreshed after that sync.
  2. The current body records net +1007 production lines, a material exception to the user's LOC/complexity constraint. Run a deletion-first pass over the final synced diff: inventory the largest lineage/routing components, test whether existing bounded graph/quantization helpers can replace duplicate accessors or state, and remove historical review scaffolding. If it cannot shrink safely, add a concise per-component justification and fail-closed/budget invariant for the retained complexity. Do not code-golf safeguards; prove the size.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ede50b34b3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/scanners/onnx_scanner.py Outdated
Comment thread modelaudit/scanners/onnx_scanner.py Outdated
Keep QuantizeLinear-derived weights eligible for coverage failures and account for copies created by post-dequantization transforms.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 440933e0e4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/scanners/onnx_scanner.py Outdated
Comment thread modelaudit/scanners/onnx_scanner.py Outdated
@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Fleet sequencing update: post-#1679 main Python CI run 27462464942 is red on two Linux 3.11 false-positive tests. A dedicated goal-mode worker owns that exact current-main repair. Continue review, simplification, conflict resolution, targeted/full validation, title/body cleanup, and comment handling for this PR, but do not perform the final merge while main is red.

After the CI repair lands, #1673 remains the first test-only serial gate. Before this PR's eventual merge, fetch and additively merge the newest origin/main, resolve overlap without rebasing or force-pushing, and require exact-new-head review/CI. Keep the goal active across the coordination hold.

Trace integer scale paths across bounded bias adds and preserve fail-closed lineage for standard DynamicQuantizeLinear weight outputs.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 15f3b04d70

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/scanners/onnx_scanner.py Outdated
Comment thread modelaudit/scanners/onnx_scanner.py
Comment thread modelaudit/scanners/onnx_scanner.py
Comment thread modelaudit/scanners/onnx_scanner.py Outdated
@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Coordinator review gate: four fresh unresolved quantized-ONNX threads remain, including two P1 fail-open lineage cases. Fix or rebut every thread with regressions, reply/resolve, then rerun exact-head gates. Final merge remains held behind #1680.

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Corrected the exact-head shared-sidecar expectation at 1b2d1299: the two fixtures are byte-identical ONNX packages, so package content identity is recorded once while sidecar bytes are still counted once. @codex review exact head 1b2d1299; stop and report stale if the head moves.

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Another round soon, please!

Reviewed commit: 1b2d129910

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Exact-head CI at 1b2d1299 had one unrelated Windows-only failure in tests/scanners/test_joblib_scanner_codecs.py::test_scan_accepts_security_like_bytes_inside_numpy_array[pickle.loads] after 15,395 passes; rerunning failed exact-head jobs without widening the ONNX/identity/cache diff.

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Addressed the three exact-head Codex findings additively at f7bdc186: retain a bare digest only when a non-package source shares consumed ONNX bytes; collapse/update matching anomaly checks with clustered issues; move direct external-data byte accounting after max-size/total-size deferral. Added focused compact regressions. @codex review exact head f7bdc186; stop and report stale if the head moves.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f7bdc1866c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/core.py
Comment thread modelaudit/core.py Outdated
@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Exact-head ONNX CI validated the clustered check count but showed anomaly stats intentionally exclude INFO findings; removed only that over-specific new assertion at 65e71f18. @codex review exact head 65e71f18; stop and report stale if the head moves.

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Addressed the next exact-head Codex findings additively at 89684f97: clustered raw and pre-consolidated anomaly check contexts consistently with clustered issues, and moved directory external-data byte accounting after max-total-size deferral. Added compact consolidated-check and directory-budget regressions. @codex review exact head 89684f97; stop and report stale if the head moves.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 89684f97b3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/core.py Outdated
@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Exact-head ONNX CI showed the new directory budget fixture used a separately scheduled .bin sidecar; corrected only the fixture at bbc76c2a to use a declared-but-skipped external-data sidecar, matching the guarded directory path. @codex review exact head bbc76c2a; stop and report stale if the head moves.

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Addressed the exact-head consolidated-finding Codex followup additively at c9678061: nested consolidated findings re-key by bounded ONNX anomaly context/content identity rather than their generic consolidated check message. @codex review exact head c9678061; stop and report stale if the head moves.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c967806101

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/models.py Outdated
Comment thread modelaudit/core.py Outdated
Comment thread modelaudit/core.py Outdated
@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Exact-head ONNX CI showed my new max-total-size fixture scheduled its sidecar independently rather than reaching the guarded skipped-sidecar branch; dropped only that invalid new fixture at 22b782f1 while retaining the Codex-requested directory accounting fix. @codex review exact head 22b782f1; stop and report stale if the head moves.

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Addressed the next exact-head Codex findings additively at 6992f49a: include neuron_index in bounded ONNX anomaly context; preserve a single DVC directory output's direct aggregate identity without re-aggregating; discard streamed ONNX sidecar digests only when that exact sidecar instance/target had been recorded. Added focused model, DVC, and streaming regressions. @codex review exact head 6992f49a; stop and report stale if the head moves.

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 🎉

Reviewed commit: 6992f49a93

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Exact-head ONNX CI exposed the existing refetched-sidecar expectation and the Codex-requested skipped directory sidecar accounting; fixed additively at 27493341 by binding streamed sidecar discard to recorded path/instance/target and updating the superseded skipped-sidecar byte expectation. @codex review exact head 27493341; stop and report stale if the head moves.

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Hooray!

Reviewed commit: 27493341a1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Ready for human approval: exact head 27493341 has green required CI, exact-head Codex review clean, unresolved actionable threads zero, title/body/changelog updated, and no admin bypass will be used. Please approve; I will merge once a human approval is present.

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

@codex review exact head b61ceca

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b61ceca490

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/core.py
Comment on lines +5848 to +5851
if is_dvc_pointer and len(target_files) == 1:
single_dvc_directory_content_hash = nested_result.content_hash
elif nested_result.content_hash and nested_result.content_hash not in file_hashes:
file_hashes.append(nested_result.content_hash)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve flat DVC directory hashes

When a .dvc file resolves to multiple directory outputs, this branch appends each nested directory's already-aggregated content_hash and the parent later aggregates those aggregate hashes. compute_aggregate_hash is not associative, so the same two output directories produce a different top-level hash when scanned via one DVC pointer than when scanned directly or flattened as before, which breaks content-based deduplication for multi-output DVC models. The single-directory shortcut above preserves direct-scan parity only for len(target_files) == 1; the multi-directory path should still flatten the nested file/package hashes instead of hashing nested aggregates.

Useful? React with 👍 / 👎.

Comment thread modelaudit/core.py
Comment on lines +6023 to +6024
onnx_external_data_bytes_scanned += external_data_size
top_level_hashed_bytes += external_data_size

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid double-counting self-referential ONNX sidecars

When an ONNX external_data location resolves back to the primary .onnx file itself, the primary file has already been added to top_level_hashed_bytes before this loop, but these lines add the same file's size again as if it were a separate sidecar. That makes direct scans of self-referential ONNX models report doubled scanned bytes and can incorrectly trip max_total_size for a single physical file whose real size is within the limit; skip the external-data accounting when its identity matches the primary model identity.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant