Skip to content

Qwen-Image diffusers PTQ: FP8 / NVFP4 / NVFP4-SVDQuant HF checkpoints#1706

Open
jingyu-ml wants to merge 16 commits into
mainfrom
feature/qwen-image-svdquant-nvfp4
Open

Qwen-Image diffusers PTQ: FP8 / NVFP4 / NVFP4-SVDQuant HF checkpoints#1706
jingyu-ml wants to merge 16 commits into
mainfrom
feature/qwen-image-svdquant-nvfp4

Conversation

@jingyu-ml

@jingyu-ml jingyu-ml commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Type of change: New feature

Adds Qwen-Image (Qwen/Qwen-Image, QwenImageTransformer2DModel) to the diffusers quantization example and exports HuggingFace checkpoints in three precisions — FP8, NVFP4, and NVFP4 + SVDQuant — through the unified HF export.

  • Registers --model qwen-image (lazy diffusers import; no trust_remote_code).
  • Transformer-block-range recipe: quantizes only the linears under transformer_blocks, keeping the first 2 / last 2 blocks (and everything outside transformer_blocks) in original precision. Applied before calibration so SVDQuant never mutates the excluded blocks. Expressed with the top-level enable QuantizerCfgEntry field (disable-all → re-enable transformer_blocks → disable first/last-N).
  • SVDQuant export (AWQ-style): promotes quantizer-owned tensors to clean module-level safetensors keys at export time — weight_quantizer.svdquant_lora_a/b → <module>.svdquant_lora_a/b and input_quantizer._pre_quant_scale → <module>.pre_quant_scale — with a documented NVFP4_SVD quantization_config (group_size, has_zero_point: false, pre_quant_scale: true, lora_rank). Core SVDQuant quantization code (modelopt/torch/quantization) is unchanged.
  • Shared export-path fixes (validated across SDXL / Flux / Wan2.2): lazy onnx_graphsurgeon import (only needed for --onnx-dir); single-file save for large transformers (the layerwise-metadata post-processing does not support sharded safetensors); and hide_quantizers_from_state_dict now strips quantizer state from all modules so norm-layer input quantizers no longer leak input_quantizer._amax.

Usage

python examples/diffusers/quantization/quantize.py \
    --model qwen-image --override-model-path <Qwen-Image> --model-dtype BFloat16 \
    --format fp4 --quant-algo svdquant --lowrank 32 \
    --calib-size 64 --n-steps 20 \
    --hf-ckpt-dir <out> --sanity-image-path <out>/sanity.png
# FP8:   --format fp8 --quant-algo max
# NVFP4: --format fp4 --quant-algo max

Testing

  • Focused unit + example tests pass on GB200 (sm_100): block-range recipe, NVFP4_SVD config schema, SVDQuant forward/fold (LoRA stays on weight_quantizer), Qwen dummy-input / strict-QKV-fusion / promotion, pipeline loading, and the diffusers HF-export test for Qwen FP8 / NVFP4 / SVDQuant.
  • Full tests/examples/diffusers/test_export_diffusers_hf_ckpt.py is green (SDXL, Flux, Qwen, Wan2.2) — confirms the shared export changes do not regress other models.
  • End-to-end on the real Qwen/Qwen-Image (~20B): all three formats export valid HF checkpoints — only transformer_blocks 2..57 quantized, nothing outside, no quantizer/_amax leak, correct weight_scale(_2)/input_scale, promoted SVDQuant keys (rank-consistent shapes), and the expected quantization_config — plus a quantized-inference sanity image.

Before your PR is "Ready for review"

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: ❌
  • Did you get Claude approval on this PR?: ❌

Additional Information

All changes are confined to the diffusers example (examples/diffusers/quantization) plus the shared export path (modelopt/torch/export); the core quantization library is untouched.

Follow-up (next step): fused-QKV SVDQuant for sglang / Nunchaku

This export keeps attention q/k/v (and add_q/k/v_proj) as separate projections — the diffusers-native layout. That matches sglang's bf16 / FP8 / plain-NVFP4 paths (which also keep QKV separate) and ModelOpt/TRT-LLM consumers, so those load 1:1.

sglang's NVFP4-SVDQuant (Nunchaku) path, however, builds a fused to_qkv with a single fused rank-r LoRA in Nunchaku-native format (proj_down/proj_up, smooth_factor, wscales/wtscale). Our per-projection tensors (svdquant_lora_a/b + pre_quant_scale; three independent rank-r decompositions) are not directly loadable there — and cannot be fused at load time, because the fp16 weight residual needed to derive a single fused rank-r is not preserved after export.

Planned next step: an opt-in fused-QKV SVDQuant export mode that fuses q/k/v before SVDQuant calibration (yielding one rank-r over the fused weight) and emits a Nunchaku-compatible layout, enabling lower-latency fused-QKV inference in sglang. Tracked as a separate follow-up.

jingyu-ml and others added 12 commits June 11, 2026 16:50
Register Qwen/Qwen-Image as a supported model in the diffusers
quantization example:
- ModelType.QWEN_IMAGE and lazy-imported QwenImagePipeline (so the
  example still imports on older diffusers).
- MODEL_REGISTRY / MODEL_PIPELINE / MODEL_DEFAULTS entries
  (backbone="transformer", text-to-image calibration dataset).
- An actionable ImportError when the installed diffusers lacks Qwen
  classes, instead of an opaque failure.
- filter_func_qwen_image: quantize only transformer_blocks, keeping the
  first two and last two of the 60 blocks (and everything outside
  transformer_blocks) in original precision.

Enables the plain FP8/NVFP4 export path for Qwen-Image. Core SVDQuant
code is unchanged. (Qwen-Image SVDQuant checkpoint work, RLCR round 0 / M1.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
…harness

Implements the Qwen-Image NVFP4/FP8/SVDQuant diffusers quantization feature
(RLCR round 0 / M2-M5), keeping core SVDQuant code unchanged:

M2 (recipe): build_block_range_quant_cfg() emits ordered quant_cfg rules
(disable-all -> enable *.transformer_blocks.* -> disable first/last-N), applied
pre-calibration in Quantizer.get_quant_config so SVDQuant never mutates the
excluded blocks. Driven by a MODEL_DEFAULTS["block_range"] entry for Qwen-Image
(exclude first 2 / last 2; n derived from the model; n>=first+last+1 enforced).

M3 (export): _export_diffusers_checkpoint now promotes quantizer-owned tensors
to clean module-level safetensors keys before hide_quantizers_from_state_dict
(diffusers path only; the transformers path keeps its postprocess_state_dict
rename): input_quantizer._pre_quant_scale -> <module>.pre_quant_scale (AWQ key),
weight_quantizer.svdquant_lora_a/b -> <module>.svdquant_lora_a/b. Adds an
NVFP4_SVD branch to convert_hf_config (modeled on nvfp4_awq: pre_quant_scale +
lora_rank), and process_layer_quant_config now flags SVDQuant with
pre_quant_scale=True. This also resolves the diffusers pre_quant_scale TODO for
AWQ-style exports.

M4 (tests): unit tests for the block-range recipe (first/last-2 exclusion,
n>=6 validation) and the NVFP4_SVD HF config conversion.

M5 (harness): quantize.py --sanity-image-path (in-memory quantized-inference
image, pre-export) + examples/diffusers/quantization/qwen_image_svdquant/
{run_qwen_image_quantization.sh, README.md} (parameterized container/model/
export flow for FP8/NVFP4/SVDQuant).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
… tests

Addresses the round-0 Codex review (RLCR round 1):

Blocking fixes:
- convert_hf_config: NVFP4_SVD config groups now keep `has_zero_point: False`
  (both convert_hf_quant_config_format and _quant_algo_to_group_config); asserted
  in the unit test.
- build_block_range_quant_cfg: minimum is now first+last+2 (>=2 quantized middle
  blocks; n>=6 for the 2+2 Qwen recipe); recipe test rejects 5/4/3-block models.
- quantize.py --sanity-image-path failures are now fatal (re-raise -> non-zero
  exit) so the harness cannot report success without the image; the harness also
  verifies sanity.png + safetensors + config.json exist per format.

Qwen export enablement:
- diffusers_utils.generate_diffusion_dummy_inputs: add a QwenImageTransformer2DModel
  branch (packed latents [B,(H//2)(W//2),C], encoder_hidden_states_mask, img_shapes,
  txt_seq_lens, optional guidance, continuous timestep).
- unified_export_hf._fuse_qkv_linears_diffusion gains strict=; Qwen QKV fusion now
  fails hard instead of silently skipping. Promotion buffers now overwrite on
  re-export. create_pipeline_from gives the same actionable Qwen import error.

Tests:
- New tests/unit/torch/quantization/test_svdquant_forward_fold.py: LoRA stays on
  weight_quantizer, forward includes a nonzero residual, fold_weight folds it and
  drops the buffers (existing test_svdquant_lora_weights left unmodified).

Deferred to Round 2 / cluster: tiny Qwen2_5_VL fixture + full diffusers e2e export
test (needs a Qwen-capable diffusers + GPU); the actual AC-7 checkpoint run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
…minology

Round 2 (addresses round-1 Codex review: the round-1 code had no direct test
coverage). Adds tests/unit/torch/export/test_diffusers_qwen_export.py:
- Qwen dummy inputs: generate_diffusion_dummy_inputs builds the expected keys for
  a real tiny QwenImageTransformer2DModel, and the generated dummy forward runs
  on it (this is what catches any wrong shape/kwarg in the dummy-input builder).
- Strict fusion: _fuse_qkv_linears_diffusion(strict=True) re-raises on a failing
  dummy forward; strict=False does not.
- Structural export: _promote_quantizer_tensors_to_module promotes SVDQuant LoRA
  + pre_quant_scale to clean module keys that survive hide_quantizers_from_state_dict
  (promoted <module>.svdquant_lora_a/b + <module>.pre_quant_scale present;
  weight_quantizer / input_quantizer keys absent), on a calibrated tiny SVDQuant MLP.

Also removes plan/workflow terminology (DEC-5, "pre-calibration") from source and
test comments per the plan code-style note.

Still pending (Round 3 / cluster): the full tiny Qwen pipeline fixture + e2e
subprocess export test (needs diffusers' tokenizer/text-encoder construction and
a GPU) and the AC-7 cluster run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Round 3 (addresses round-2 Codex review):

- Fix the tiny Qwen-Image pipeline fixture (tests/_test_utils/torch/diffusers_models.py):
  build the Qwen2.5-VL text encoder inline from a tiny Qwen2_5_VLConfig (no Hub
  model load; the previous hf-internal-testing/...Qwen2_5_VL id does not exist),
  load the tokenizer from the tiny ...Qwen2VL id diffusers' own fast test uses,
  build the transformer with num_layers=6 (so the corrected first-2/last-2
  block-range recipe, which needs >=6 blocks, is valid) and joint_attention_dim=16
  matching the text encoder hidden_size, and a z_dim=4 VAE. Mirrors diffusers'
  QwenImagePipelineFastTests.get_dummy_components.

- Add Qwen FP8 / NVFP4 / NVFP4-SVDQuant cases to test_export_diffusers_hf_ckpt.py
  using the tiny fixture. The test opens transformer/config.json and the exported
  safetensors and asserts: quant_method=modelopt; no weight_quantizer /
  input_quantizer._amax keys; for SVDQuant, promoted <module>.svdquant_lora_a/b +
  <module>.pre_quant_scale keys, config group pre_quant_scale/has_zero_point/
  lora_rank, and non-empty ignore (excluded blocks); for plain formats, weight_scale.
  GPU/diffusers skip-guarded.

- Drop remaining workflow terminology (Step 4.5, before-calibration) from the
  comments I introduced.

Still cluster-only (no GPU here): executing these tests and the AC-7 harness run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
…ep comments

Round 4 (addresses round-3 Codex review):

- Offline tiny Qwen tokenizer: _build_local_qwen2_tokenizer builds a deterministic
  byte-level Qwen2 tokenizer locally (GPT-2 byte->unicode vocab + Qwen specials,
  empty merges) instead of a Hub load; removes the tokenizer-unavailable skip path.

- Strengthen test_qwen_image_hf_ckpt_export: assert equal module-prefix sets for
  .svdquant_lora_a/.svdquant_lora_b/.pre_quant_scale; promoted linears are a subset
  of weight-scaled linears; only the middle blocks {2,3} of 6 are quantized (first-2/
  last-2 excluded); lora_a=[rank,in]/lora_b=[out,rank] with rank == --lowrank (8);
  NVFP4 weight_scale_2 present; exact config (quant_algo=NVFP4_SVD, lora_rank=8,
  pre_quant_scale=True, has_zero_point=False, non-empty ignore).

- Remove the remaining "Step N:" workflow comments from unified_export_hf.py
  (the round-3 "grep clean" claim was wrong; verified clean across the whole file).

Still cluster-only (no GPU/torch/diffusers here): executing these tests and the
AC-7 harness run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
…port test

Round 5 (addresses round-4 Codex review, which found a regression I introduced):

- The round-4 edit inserted the _module_prefixes/_block_indices helpers between
  @pytest.mark.parametrize("qwen_model", ...) and test_qwen_image_hf_ckpt_export,
  so the decorator was attached to the helper and the test would request an
  undefined qwen_model fixture. Moved the helpers/constants above the decorator so
  it directly decorates the test (verified via ast: the test now carries the
  qwen_model parametrization and the helper is undecorated).

- Tightened SVDQuant assertions: require a_prefixes == b_prefixes == pqs_prefixes
  == weight_scale_prefixes (every quantized linear is promoted, no gaps), and
  assert every quantized prefix is under transformer_blocks (nothing outside is
  quantized), in addition to the {2,3}-only block check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Round 6 (round-5 review found no code blocker; only the queued docstring nit):
the create_tiny_qwen_image_pipeline_dir docstring still said the tokenizer was
fetched from the Hub, but Round 4 switched it to a local offline build
(_build_local_qwen2_tokenizer). Updated the wording to "fully offline".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
…tale docs

Round 7 (addresses round-6 Codex review's two missing-coverage items):

- AC-2.2 SVDQuant immutability test (test_qwen_block_range_recipe.py): builds a
  6-block backbone, snapshots the excluded first/last block linear weights, runs
  SVDQuant via build_block_range_quant_cfg, and asserts the excluded blocks'
  weights are bit-identical (never calibrated) with no LoRA, while the middle
  blocks {2,3} receive LoRA and have their weights modified.
- AC-1 negative-loading tests (new test_qwen_pipeline_loading.py): monkeypatch
  MODEL_PIPELINE[QWEN_IMAGE]=None and assert the actionable ImportError; a fake
  pipeline asserts create_pipeline does not pass trust_remote_code.

Stale-doc cleanups: the resolved pre_quant_scale TODO wording in
unified_export_hf.py; the build_block_range_quant_cfg docstring (first+last+1 ->
+2); the conftest "SKETCH" wording (the fixture is now a working offline build).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
…e-gate

Round 8 (addresses round-7 Codex review, which verified against the diffusers
source that QwenImageTransformer2DModel.forward has no txt_seq_lens parameter):

- _qwen_inputs no longer passes txt_seq_lens (the real forward signature is
  hidden_states, encoder_hidden_states, encoder_hidden_states_mask, timestep,
  img_shapes, guidance, return_dict). Passing txt_seq_lens would have raised an
  unexpected-keyword error and, because Qwen export uses strict QKV fusion,
  hard-failed the export.
- Signature-gate the dummy inputs: filter to the kwargs the installed model's
  forward actually accepts (via inspect.signature), so diffusers-version drift
  cannot hard-fail strict fusion either.
- Update test_diffusers_qwen_export.py: no longer require txt_seq_lens.
- Remove AC- plan terminology from two test docstrings (code-style note).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
…after export

Round 9 (clears the last queued code item from Codex; no code blockers remain):
_promote_quantizer_tensors_to_module left the temporary <module>.svdquant_lora_a/b
+ <module>.pre_quant_scale buffers on the live module after export. Add
_remove_promoted_quantizer_tensors and call it after each quantized diffusers
component is saved, so the live module is unchanged post-export (repeated export /
module reuse stay correct). The quantizer-owned tensors are untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
…dquant)

Validated end-to-end on GB200 against the real Qwen/Qwen-Image: all three
formats export correct HF checkpoints (only transformer_blocks 2..57; nothing
outside), no quantizer-state leak, and the focused tests pass.

- models_utils: build_block_range_quant_cfg now uses the top-level enable
  QuantizerCfgEntry field (a None cfg retains the base preset's params) instead
  of nesting cfg.enable, which the QuantizerAttributeConfig validator
  rejects/mis-applies (the old form left every block quantized).
- quantize.py: import onnx_utils.export lazily (only needed for --onnx-dir;
  avoids a hard onnx_graphsurgeon dependency), and pass max_shard_size so the
  ~20B transformer saves as a single safetensors -- the unified export's
  layerwise-metadata post-processing does not support sharded files.
- diffusers_utils: hide_quantizers_from_state_dict strips quantizer submodules
  from all modules, not only is_quantlinear, so enabled input quantizers on
  norm layers no longer leak input_quantizer._amax into the checkpoint.
- tests: the tiny QwenImageTransformer2DModel fixture signature-gates its
  kwargs (diffusers 0.38 removed pooled_projection_dim from the constructor);
  the recipe test asserts the corrected top-level enable schema.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml jingyu-ml requested review from a team as code owners June 12, 2026 23:18
@copy-pr-bot

copy-pr-bot Bot commented Jun 12, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR extends the diffusers quantization example with Qwen-Image model support, including a selective block-range quantization strategy that excludes transformer edge blocks, SVDQuant low-rank export infrastructure, offline test utilities, and comprehensive validation tests.

Changes

Qwen-Image quantization harness

Layer / File(s) Summary
Model registration and quantization filtering
examples/diffusers/quantization/models_utils.py, examples/diffusers/quantization/utils.py
Adds ModelType.QWEN_IMAGE, wires it to QwenImagePipeline, registers default block-range quantization config excluding first/last transformer blocks, and implements filter_func_qwen_image to enforce this pattern during quantizer application.
Block-range quantization recipe
examples/diffusers/quantization/models_utils.py
Implements build_block_range_quant_cfg to generate ordered quantization rules: disable all quantizers globally, re-enable under specified block module, then disable specific first/last blocks, with validation that minimum middle blocks remain quantized.
Quantization pipeline integration
examples/diffusers/quantization/quantize.py
Integrates block-range recipe into Quantizer.get_quant_config, adds --sanity-image-path CLI option for post-quantization validation, sets 200GB default shard size for transformer export, and lazy-imports optional ONNX dependencies.
Pipeline loading and error detection
examples/diffusers/quantization/pipeline_manager.py
Raises targeted ImportError for missing QwenImagePipeline in both create_pipeline_from and create_pipeline paths, with guidance to upgrade diffusers.
SVDQuant algorithm and config conversion
modelopt/torch/export/convert_hf_config.py, modelopt/torch/export/quant_utils.py
Adds NVFP4_SVD support to HuggingFace config conversion with pre_quant_scale enabled, has_zero_point disabled, and optional lora_rank injection from source config.
Export utilities for Qwen and SVDQuant
modelopt/torch/export/diffusers_utils.py, modelopt/torch/export/unified_export_hf.py
Adds Qwen-specific dummy input builder with signature-based kwarg filtering, implements quantizer tensor promotion/removal to preserve SVDQuant LoRA buffers through Diffusers' state_dict hiding, broadens quantizer hiding to all modules, and adds strict QKV fusion mode for component-specific error handling.
Offline test fixture and utilities
tests/_test_utils/torch/diffusers_models.py, tests/examples/diffusers/conftest.py
Builds offline Qwen2 tokenizer from synthetic vocab.json/merges.txt, rewrites create_tiny_qwen_image_pipeline_dir to construct text encoder and VAE from minimal configs without Hub access, and filters transformer constructor kwargs to avoid signature mismatches.
Checkpoint export and promotion tests
tests/examples/diffusers/test_export_diffusers_hf_ckpt.py, tests/unit/torch/export/test_export_diffusers.py
Tests SVDQuant tensor promotion, verifies exported checkpoints contain only quantized transformer blocks with expected LoRA/scale tensors, asserts no quantizer state leakage, and validates config field consistency.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

cherry-pick-0.45.0

Suggested reviewers

  • realAsma
  • cjluo-nv
  • sugunav14
  • meenchen
🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 51.39% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Qwen-Image diffusers PTQ: FP8 / NVFP4 / NVFP4-SVDQuant HF checkpoints' accurately captures the main change: adding Qwen-Image model support to diffusers quantization with support for FP8, NVFP4, and SVDQuant export formats.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns ✅ Passed PR diff contains no matches for torch.load weights_only=False, numpy.load allow_pickle=True, trust_remote_code=True, eval/exec, or '# nosec' in the changed files.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/qwen-image-svdquant-nvfp4

Comment @coderabbitai help to get the list of available commands and usage tips.

@jingyu-ml jingyu-ml marked this pull request as draft June 12, 2026 23:19
run_qwen_image_quantization.sh and its README are cluster-specific
experiment/operator scripts (hard-coded /lustre paths) that do not belong in
the upstream diffusers example. The feature itself (model registration,
block-range recipe, FP8/NVFP4/SVDQuant export) is covered by the committed
tests. The scripts are kept locally outside the repo.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@github-actions

github-actions Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor
PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1706/

Built to branch gh-pages at 2026-06-14 00:11 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@jingyu-ml jingyu-ml changed the title Feature/qwen image svdquant nvfp4 Qwen-Image diffusers PTQ: FP8 / NVFP4 / NVFP4-SVDQuant HF checkpoints Jun 12, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
modelopt/torch/export/unified_export_hf.py (1)

1174-1221: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Always clean up promoted buffers with try/finally.

On Line 1174, promoted export buffers are added, but cleanup on Line 1219-1221 runs only on the success path. Any exception in save/postprocess/config update leaves the live component mutated (pre_quant_scale / svdquant_lora_* buffers lingering).

Proposed fix
-            _promote_quantizer_tensors_to_module(component)
-
-            # Build quantization config
-            quant_config = get_quant_config(component, is_modelopt_qlora=False)
-            if quant_config:
-                quantization_details = quant_config.get("quantization", {})
-                # Record the SVDQuant low-rank size so consumers know the LoRA shape.
-                if quantization_details.get("quant_algo") == "NVFP4_SVD":
-                    svdquant_rank = _detect_svdquant_rank(component)
-                    if svdquant_rank is not None:
-                        quantization_details["lora_rank"] = svdquant_rank
-            hf_quant_config = convert_hf_quant_config_format(quant_config) if quant_config else None
-
-            # Save the component
-            # - diffusers ModelMixin.save_pretrained does NOT accept state_dict parameter
-            # - for non-diffusers modules (e.g., LTX-2 transformer), fall back to torch.save
-            if hasattr(component, "save_pretrained"):
-                with hide_quantizers_from_state_dict(component):
-                    component.save_pretrained(component_export_dir, max_shard_size=max_shard_size)
-            else:
-                with hide_quantizers_from_state_dict(component):
-                    _save_component_state_dict_safetensors(component, component_export_dir)
-
-            # Post-process — merge, metadata, padding, swizzle
-            _postprocess_safetensors(
-                component_export_dir,
-                pipe,
-                hf_quant_config=hf_quant_config,
-                **kwargs,
-            )
-
-            # Update config.json with quantization info
-            if hf_quant_config is not None:
-                config_path = component_export_dir / "config.json"
-                if config_path.exists():
-                    with open(config_path) as file:
-                        config_data = json.load(file)
-                    config_data["quantization_config"] = hf_quant_config
-                    with open(config_path, "w") as file:
-                        json.dump(config_data, file, indent=4)
-
-            # Drop the temporary promoted export buffers so the live module is
-            # unchanged after export (supports repeated export / module reuse).
-            _remove_promoted_quantizer_tensors(component)
+            _promote_quantizer_tensors_to_module(component)
+            try:
+                # Build quantization config
+                quant_config = get_quant_config(component, is_modelopt_qlora=False)
+                if quant_config:
+                    quantization_details = quant_config.get("quantization", {})
+                    if quantization_details.get("quant_algo") == "NVFP4_SVD":
+                        svdquant_rank = _detect_svdquant_rank(component)
+                        if svdquant_rank is not None:
+                            quantization_details["lora_rank"] = svdquant_rank
+                hf_quant_config = convert_hf_quant_config_format(quant_config) if quant_config else None
+
+                if hasattr(component, "save_pretrained"):
+                    with hide_quantizers_from_state_dict(component):
+                        component.save_pretrained(component_export_dir, max_shard_size=max_shard_size)
+                else:
+                    with hide_quantizers_from_state_dict(component):
+                        _save_component_state_dict_safetensors(component, component_export_dir)
+
+                _postprocess_safetensors(
+                    component_export_dir,
+                    pipe,
+                    hf_quant_config=hf_quant_config,
+                    **kwargs,
+                )
+
+                if hf_quant_config is not None:
+                    config_path = component_export_dir / "config.json"
+                    if config_path.exists():
+                        with open(config_path) as file:
+                            config_data = json.load(file)
+                        config_data["quantization_config"] = hf_quant_config
+                        with open(config_path, "w") as file:
+                            json.dump(config_data, file, indent=4)
+            finally:
+                _remove_promoted_quantizer_tensors(component)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/export/unified_export_hf.py` around lines 1174 - 1221, You
promote quantizer-owned tensors with _promote_quantizer_tensors_to_module but
only call _remove_promoted_quantizer_tensors on the success path, so exceptions
during save/postprocess/config update leave the module mutated; wrap the work
that occurs after promotion (the save path using hide_quantizers_from_state_dict
+ component.save_pretrained or _save_component_state_dict_safetensors,
_postprocess_safetensors, and the config.json update that uses hf_quant_config)
in a try/finally and call _remove_promoted_quantizer_tensors(component) in the
finally so cleanup always runs; preserve and re-raise any exception after
cleanup to avoid swallowing errors.
examples/diffusers/quantization/quantize.py (1)

111-121: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Make the Qwen block-range mask backbone-aware and use a single source of truth.

get_quant_config() always injects MODEL_DEFAULTS[QWEN_IMAGE]["block_range"], and quantize_model() always follows with get_model_filter_func(). For Qwen-Image that creates two concrete failure modes: --backbone transformer vae will raise when the VAE path hits build_block_range_quant_cfg() with no transformer_blocks, and any local override checkpoint whose transformer depth is not exactly 60 will calibrate with one exclusion mask but be post-disabled with the hard-coded 60-block mask from examples/diffusers/quantization/utils.py. Please gate the recipe/filter to the transformer backbone and derive both from the loaded backbone instead of keeping two independent masks.

Also applies to: 171-191, 223-233, 696-709

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/diffusers/quantization/quantize.py` around lines 111 - 121,
get_quant_config currently always injects
MODEL_DEFAULTS[QWEN_IMAGE]["block_range"] and
quantize_model/get_model_filter_func apply a separate hard-coded 60-block mask,
causing mismatch when backbone != transformer or transformer depth != 60; fix by
making the Qwen block-range mask backbone-aware and deriving it from the loaded
backbone (e.g., transformer_blocks, num_layers or backbone.config.*) as the
single source of truth: update get_quant_config to consult the actual backbone
type and depth and compute block_range via
build_block_range_quant_cfg(backbone_depth) instead of using
MODEL_DEFAULTS[QWEN_IMAGE]["block_range"], and update quantize_model and
get_model_filter_func to use that same computed mask (remove hard-coded masks in
examples/diffusers/quantization/utils.py) so all three locations
(get_quant_config, quantize_model, get_model_filter_func /
build_block_range_quant_cfg) reference the same backbone-derived value.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/diffusers/quantization/quantize.py`:
- Around line 581-589: The CLI currently accepts --sanity-image-path
unconditionally and assumes generated outputs have images, causing late failures
for video/non-image pipelines; update argument validation in quantize.py to
reject --sanity-image-path early when the selected pipeline type is not an image
pipeline: after parsing args (or inside the existing validation function / main
pipeline selection flow), detect the pipeline kind via the pipeline ID or class
name used for inference (the same symbol(s) that decide which pipeline to
instantiate) and raise an error or exit if --sanity-image-path is set but the
pipeline is not one of the known image pipelines (e.g., StableDiffusion/Any
Image* pipelines); apply the same guard for the second occurrence of this block
noted around the other lines so non-image pipelines fail at argument validation
time rather than after a full run.

In
`@examples/diffusers/quantization/qwen_image_svdquant/run_qwen_image_quantization.sh`:
- Around line 39-40: The script claims DRY_RUN previews commands but still
performs side effects and hard-fails on missing tokens; update logic so when
DRY_RUN is set (check DRY_RUN or use a helper dry_run() wrapper) you skip/avoid
any real file checks and mutations: only echo planned actions instead of
performing them, skip the HF_TOKEN_FILE existence/readability checks (do not
exit with error) and skip creating OUTPUT_DIR (do not run mkdir -p) and any file
writes; specifically wrap or conditionalize the HF_TOKEN_FILE checks (the
HF_TOKEN_FILE variable handling) and the mkdir -p or other filesystem operations
that create ${OUTPUT_DIR} so they only execute when DRY_RUN is not set, and
ensure any commands that would modify disk are printed when DRY_RUN=1 rather
than executed.

In `@modelopt/torch/export/unified_export_hf.py`:
- Around line 1024-1035: The code currently returns the first observed SVDQuant
rank via _detect_svdquant_rank by inspecting weight_quantizer.svdquant_lora_a,
which can hide inconsistencies across modules; update the logic to scan all
modules' weight_quantizer.svdquant_lora_a values, collect all unique ranks, and:
if none found return None, if exactly one unique rank use that, otherwise raise
or log an explicit error and refuse to write a single lora_rank metadata value.
Apply this validation before serializing the lora_rank metadata (where lora_rank
is written) so you never serialize an incorrect single rank when multiple
different ranks exist. Ensure you reference the same attributes
(weight_quantizer, svdquant_lora_a) and the _detect_svdquant_rank helper (or
replace it with a function that returns the set/validates) so callers can act on
the validation result.

In `@tests/_test_utils/torch/diffusers_models.py`:
- Line 296: Move the deferred "import inspect" (and any other imports added
inside tests) to the module/top-level in
tests/_test_utils/torch/diffusers_models.py (and the other referenced test
files: tests/examples/diffusers/test_qwen_block_range_recipe.py,
tests/examples/diffusers/test_export_diffusers_hf_ckpt.py,
tests/unit/torch/export/test_diffusers_qwen_export.py) so imports are
module-level by default; if an import truly must be deferred (circular or
optional dependency), keep it but add a one-line comment above the deferred
import explaining the specific reason and link to the offending symbol (e.g.,
the "import inspect" line) so reviewers can verify the justification.

---

Outside diff comments:
In `@examples/diffusers/quantization/quantize.py`:
- Around line 111-121: get_quant_config currently always injects
MODEL_DEFAULTS[QWEN_IMAGE]["block_range"] and
quantize_model/get_model_filter_func apply a separate hard-coded 60-block mask,
causing mismatch when backbone != transformer or transformer depth != 60; fix by
making the Qwen block-range mask backbone-aware and deriving it from the loaded
backbone (e.g., transformer_blocks, num_layers or backbone.config.*) as the
single source of truth: update get_quant_config to consult the actual backbone
type and depth and compute block_range via
build_block_range_quant_cfg(backbone_depth) instead of using
MODEL_DEFAULTS[QWEN_IMAGE]["block_range"], and update quantize_model and
get_model_filter_func to use that same computed mask (remove hard-coded masks in
examples/diffusers/quantization/utils.py) so all three locations
(get_quant_config, quantize_model, get_model_filter_func /
build_block_range_quant_cfg) reference the same backbone-derived value.

In `@modelopt/torch/export/unified_export_hf.py`:
- Around line 1174-1221: You promote quantizer-owned tensors with
_promote_quantizer_tensors_to_module but only call
_remove_promoted_quantizer_tensors on the success path, so exceptions during
save/postprocess/config update leave the module mutated; wrap the work that
occurs after promotion (the save path using hide_quantizers_from_state_dict +
component.save_pretrained or _save_component_state_dict_safetensors,
_postprocess_safetensors, and the config.json update that uses hf_quant_config)
in a try/finally and call _remove_promoted_quantizer_tensors(component) in the
finally so cleanup always runs; preserve and re-raise any exception after
cleanup to avoid swallowing errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c42d794b-9dd7-41d7-ad4c-25c69901c226

📥 Commits

Reviewing files that changed from the base of the PR and between d26c8af and c2250cb.

📒 Files selected for processing (18)
  • examples/diffusers/quantization/models_utils.py
  • examples/diffusers/quantization/pipeline_manager.py
  • examples/diffusers/quantization/quantize.py
  • examples/diffusers/quantization/qwen_image_svdquant/README.md
  • examples/diffusers/quantization/qwen_image_svdquant/run_qwen_image_quantization.sh
  • examples/diffusers/quantization/utils.py
  • modelopt/torch/export/convert_hf_config.py
  • modelopt/torch/export/diffusers_utils.py
  • modelopt/torch/export/quant_utils.py
  • modelopt/torch/export/unified_export_hf.py
  • tests/_test_utils/torch/diffusers_models.py
  • tests/examples/diffusers/conftest.py
  • tests/examples/diffusers/test_export_diffusers_hf_ckpt.py
  • tests/examples/diffusers/test_qwen_block_range_recipe.py
  • tests/examples/diffusers/test_qwen_pipeline_loading.py
  • tests/unit/torch/export/test_convert_hf_config_svdquant.py
  • tests/unit/torch/export/test_diffusers_qwen_export.py
  • tests/unit/torch/quantization/test_svdquant_forward_fold.py

Comment on lines +581 to +589
export_group.add_argument(
"--sanity-image-path",
type=str,
default=None,
help="If set, generate one image from the in-memory quantized pipeline (after "
"quantization, before the weights are packed for export) and save it here. This is "
"a quick functional sanity check of quantized inference; it does NOT reload the "
"exported checkpoint.",
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Reject --sanity-image-path for non-image pipelines at argument validation time.

This block assumes every supported model returns result.images[0], but the same CLI also supports video pipelines (LTX_*, WAN*). Today those runs will burn a full inference pass and then fail late on the save step instead of being rejected at the interface boundary.

Suggested guard
         pipeline_manager.print_quant_summary()

+        if args.sanity_image_path and model_type in {
+            ModelType.LTX_VIDEO_DEV,
+            ModelType.LTX2,
+            ModelType.WAN22_T2V_14b,
+            ModelType.WAN22_T2V_5b,
+        }:
+            parser.error("--sanity-image-path is only supported for image pipelines.")
+
         # Optional functional sanity check: generate one image from the in-memory
         # quantized pipeline. This runs BEFORE export (while weights are still
         # fake-quantized and runnable, not yet packed) and does not reload the

Also applies to: 729-750

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/diffusers/quantization/quantize.py` around lines 581 - 589, The CLI
currently accepts --sanity-image-path unconditionally and assumes generated
outputs have images, causing late failures for video/non-image pipelines; update
argument validation in quantize.py to reject --sanity-image-path early when the
selected pipeline type is not an image pipeline: after parsing args (or inside
the existing validation function / main pipeline selection flow), detect the
pipeline kind via the pipeline ID or class name used for inference (the same
symbol(s) that decide which pipeline to instantiate) and raise an error or exit
if --sanity-image-path is set but the pipeline is not one of the known image
pipelines (e.g., StableDiffusion/Any Image* pipelines); apply the same guard for
the second occurrence of this block noted around the other lines so non-image
pipelines fail at argument validation time rather than after a full run.

Comment on lines +1024 to +1035
def _detect_svdquant_rank(component: nn.Module) -> int | None:
"""Return the SVDQuant low-rank dimension from the first SVDQuant linear, if any.

``svdquant_lora_a`` has shape ``(rank, in_features)``, so its first dimension
is the low-rank size.
"""
for _, sub_module in component.named_modules():
weight_quantizer = getattr(sub_module, "weight_quantizer", None)
lora_a = getattr(weight_quantizer, "svdquant_lora_a", None)
if lora_a is not None:
return int(lora_a.shape[0])
return None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Validate SVDQuant rank consistency before writing lora_rank metadata.

On Line 1024, _detect_svdquant_rank() returns the first observed rank. If different quantized modules carry different svdquant_lora_a ranks, Line 1185-1188 will serialize a single incorrect lora_rank, which can misrepresent the exported checkpoint contract.

Proposed fix
 def _detect_svdquant_rank(component: nn.Module) -> int | None:
@@
-    for _, sub_module in component.named_modules():
+    ranks: set[int] = set()
+    for _, sub_module in component.named_modules():
         weight_quantizer = getattr(sub_module, "weight_quantizer", None)
         lora_a = getattr(weight_quantizer, "svdquant_lora_a", None)
         if lora_a is not None:
-            return int(lora_a.shape[0])
-    return None
+            ranks.add(int(lora_a.shape[0]))
+    if not ranks:
+        return None
+    if len(ranks) != 1:
+        raise ValueError(f"Inconsistent SVDQuant ranks detected across modules: {sorted(ranks)}")
+    return next(iter(ranks))

Also applies to: 1185-1188

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/export/unified_export_hf.py` around lines 1024 - 1035, The
code currently returns the first observed SVDQuant rank via
_detect_svdquant_rank by inspecting weight_quantizer.svdquant_lora_a, which can
hide inconsistencies across modules; update the logic to scan all modules'
weight_quantizer.svdquant_lora_a values, collect all unique ranks, and: if none
found return None, if exactly one unique rank use that, otherwise raise or log
an explicit error and refuse to write a single lora_rank metadata value. Apply
this validation before serializing the lora_rank metadata (where lora_rank is
written) so you never serialize an incorrect single rank when multiple different
ranks exist. Ensure you reference the same attributes (weight_quantizer,
svdquant_lora_a) and the _detect_svdquant_rank helper (or replace it with a
function that returns the set/validates) so callers can act on the validation
result.

Comment thread tests/_test_utils/torch/diffusers_models.py
@jingyu-ml jingyu-ml marked this pull request as ready for review June 12, 2026 23:30
jingyu-ml and others added 2 commits June 12, 2026 16:32
Remove the standalone Qwen test files. The fp8/nvfp4/svdquant cases in
test_export_diffusers_hf_ckpt.py already cover the block-range recipe
(only transformer_blocks 2..57 quantized), the promoted SVDQuant keys +
pre_quant_scale, the NVFP4_SVD quantization_config, and the no-leak check
-- matching how SDXL/Flux/Wan are tested in the same file. Core SVDQuant
forward/fold is unchanged and remains covered by existing upstream tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
…motion

Covers svdquant calibration -> _promote_quantizer_tensors_to_module ->
clean module-level keys (svdquant_lora_a/b, pre_quant_scale) with the
quantizers hidden, plus the post-export cleanup. Runs on CPU in <1s
(INT8_SMOOTHQUANT + svdquant on a tiny linear stack). The full NVFP4
end-to-end check remains test_qwen_image_hf_ckpt_export[qwen_nvfp4_svdquant];
svdquant calibration is already covered by test_calib.py.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unit/torch/export/test_export_diffusers.py`:
- Around line 132-137: Move the local imports into the module-level import
section: take the symbols "copy", "torch.nn as nn", "modelopt.torch.quantization
as mtq", and "hide_quantizers_from_state_dict" and add them with the other
top-of-file imports (after the existing imports around line ~32), then remove
the in-function imports currently present in the test body; this ensures the
imports are executed at collection time and preserves the same symbol names used
in the test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0a915fc4-6111-479e-a3c9-18d6c9db6bd4

📥 Commits

Reviewing files that changed from the base of the PR and between c2250cb and 9b472b2.

📒 Files selected for processing (1)
  • tests/unit/torch/export/test_export_diffusers.py

Comment on lines +132 to +137
import copy

import torch.nn as nn

import modelopt.torch.quantization as mtq
from modelopt.torch.export.diffusers_utils import hide_quantizers_from_state_dict

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Move imports to the top of the file.

Per test coding guidelines, imports inside functions or test methods require explicit justification (e.g., circular imports or optional dependencies like TensorRT-LLM/Megatron-Core). None of these imports (copy, torch.nn, modelopt.torch.quantization, hide_quantizers_from_state_dict) are optional dependencies or resolve circular imports. Moving them to the top ensures import errors surface at collection time instead of mid-test.

📦 Suggested fix

Move these imports to the top of the file with the other imports (after line 32):

 from modelopt.torch.export.convert_hf_config import convert_hf_quant_config_format
 from modelopt.torch.export.diffusers_utils import generate_diffusion_dummy_inputs
 from modelopt.torch.export.unified_export_hf import export_hf_checkpoint
+import copy
+import torch.nn as nn
+import modelopt.torch.quantization as mtq
+from modelopt.torch.export.diffusers_utils import hide_quantizers_from_state_dict

Then remove the in-function imports (lines 132-137).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/torch/export/test_export_diffusers.py` around lines 132 - 137,
Move the local imports into the module-level import section: take the symbols
"copy", "torch.nn as nn", "modelopt.torch.quantization as mtq", and
"hide_quantizers_from_state_dict" and add them with the other top-of-file
imports (after the existing imports around line ~32), then remove the
in-function imports currently present in the test body; this ensures the imports
are executed at collection time and preserves the same symbol names used in the
test.

Source: Coding guidelines

@codecov

codecov Bot commented Jun 12, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 51.35135% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.73%. Comparing base (9f37fe1) to head (4dff6d5).

Files with missing lines Patch % Lines
modelopt/torch/export/diffusers_utils.py 40.74% 16 Missing ⚠️
modelopt/torch/export/convert_hf_config.py 0.00% 10 Missing ⚠️
modelopt/torch/export/unified_export_hf.py 72.97% 10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1706      +/-   ##
==========================================
- Coverage   77.12%   67.73%   -9.40%     
==========================================
  Files         511      511              
  Lines       56236    56300      +64     
==========================================
- Hits        43370    38132    -5238     
- Misses      12866    18168    +5302     
Flag Coverage Δ
unit 54.39% <51.35%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant