fix(quantization): detect fused MoE experts without act_fn (MiniMax-M3) by Edwardf0t1 · Pull Request #1711 · NVIDIA/Model-Optimizer

Edwardf0t1 · 2026-06-14T02:26:19Z

What does this PR do?

Type of change: Bug fix

Fused MoE expert auto-detection (register_fused_experts_on_the_fly → _is_fused_experts_module) required every fused-expert container to expose an act_fn attribute. MiniMaxM3VLExperts (transformers 5.12.0) applies a custom GPT-OSS-style gated activation (_apply_gate, swiglu with clamp/alpha) between its two F.linear calls instead of exposing act_fn, so it failed detection and was never wrapped as _QuantFusedExperts. Consequences:

- Routed experts stayed unquantized: an experts-only recipe (*mlp.experts*) matched nothing, so only KV-cache quantization was applied.
- HF export raised NotImplementedError: MoE model with experts type 'MiniMaxM3VLExperts' is not supported in export.

_QuantFusedExperts is activation-agnostic — it only intercepts the two F.linear calls (gate_up then down, in strict alternation) and never touches act_fn. So the act_fn requirement was unnecessary. This PR drops it (keeping the num_experts + 3-D gate_up_proj/down_proj checks), which enables NVFP4/FP8 PTQ and export for MiniMax-M2 / MiniMax-M3.
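To illustrate why the activation is irrelevant to the wrapper, here is a minimal sketch of "intercept two linear calls in strict alternation". It is not the ModelOpt implementation: `linear`, `AlternatingLinearHook`, and `expert_forward` are hypothetical names, and plain Python lists stand in for tensors so the example runs without PyTorch.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[Vector]  # rows are output features, as in F.linear(x, W)


def linear(x: Vector, weight: Matrix) -> Vector:
    """Plain-Python stand-in for a bias-free F.linear: y = W @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


class AlternatingLinearHook:
    """Hypothetical hook that labels calls as gate_up, down, gate_up, down, ..."""

    def __init__(self, inner: Callable[[Vector, Matrix], Vector]) -> None:
        self.inner = inner
        self.calls: List[str] = []
        self._next_is_gate_up = True

    def __call__(self, x: Vector, weight: Matrix) -> Vector:
        role = "gate_up" if self._next_is_gate_up else "down"
        self._next_is_gate_up = not self._next_is_gate_up
        self.calls.append(role)
        # A real wrapper would apply the quantizers for `role` here.
        return self.inner(x, weight)


def expert_forward(x: Vector, gate_up_w: Matrix, down_w: Matrix, linear_fn) -> Vector:
    """One expert with an inline gated activation and no act_fn attribute."""
    hidden = linear_fn(x, gate_up_w)
    half = len(hidden) // 2
    gate, up = hidden[:half], hidden[half:]
    # Toy clamped gate; it only stands in for the model's own gating logic.
    activated = [min(max(g, 0.0), 7.0) * u for g, u in zip(gate, up)]
    return linear_fn(activated, down_w)


hook = AlternatingLinearHook(linear)
gate_up_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]  # 4 x 2
down_w = [[1.0, 1.0], [1.0, -1.0]]  # 2 x 2
out = expert_forward([2.0, 3.0], gate_up_w, down_w, hook)
print(hook.calls, out)
```

The hook never inspects what happens between the two calls, so any activation placed there, whether exposed as an attribute or written inline, leaves the interception unchanged.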

Usage

# MiniMax-M3 experts-only NVFP4 + FP8 KV now quantizes and exports:
python examples/llm_ptq/hf_ptq.py \
    --pyt_ckpt_path MiniMaxAI/MiniMax-M3 \
    --recipe general/ptq/nvfp4_experts_only_mse-kv_fp8_cast \
    --calib_size 512 --export_path ./m3-nvfp4

Testing

- Updated tests/unit/torch/quantization/plugins/test_fused_experts.py: the previous test_module_missing_act_fn_not_detected (which asserted the old, now-incorrect behavior) is replaced by test_module_missing_act_fn_still_detected, which asserts that a fused-expert module without act_fn is detected. Negative cases (2-D gate_up, plain nn.Linear) are still rejected.
- Verified end-to-end on MiniMaxAI/MiniMax-M3 (~428B-A23B, transformers 5.12.0, 8×B300): detection logs "Detected fused MoE experts ... of type MiniMaxM3VLExperts"; all 57 MoE layers' experts quantize to NVFP4 (21,888 expert weights with scales; 0 on attention/shared-experts/vision tower); the KV cache quantizes to FP8; and the HF checkpoint exports successfully (854 GB → 260 GB).
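The detection cases listed above can be pictured with a small sketch. This is not the code from huggingface.py or the actual unit test: `looks_like_fused_experts`, `FakeParam`, and the three sample classes are hypothetical, and `FakeParam` replaces real tensors so the example runs without PyTorch. It assumes the retained checks amount to "has num_experts, and gate_up_proj and down_proj are both 3-D".

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FakeParam:
    """Stand-in for a tensor parameter; only its shape matters here."""

    shape: Tuple[int, ...]

    @property
    def ndim(self) -> int:
        return len(self.shape)


def looks_like_fused_experts(module: object) -> bool:
    """Hypothetical predicate: num_experts plus 3-D projections, no act_fn check."""
    if not hasattr(module, "num_experts"):
        return False
    for name in ("gate_up_proj", "down_proj"):
        param = getattr(module, name, None)
        if getattr(param, "ndim", None) != 3:
            return False
    return True


class ExpertsWithoutActFn:
    """Fused experts with an inline activation, so no act_fn attribute."""

    def __init__(self) -> None:
        self.num_experts = 4
        self.gate_up_proj = FakeParam((4, 16, 8))
        self.down_proj = FakeParam((4, 8, 8))


class ExpertsWith2DGateUp:
    """Negative case: gate_up_proj is 2-D, so it is not a fused container."""

    def __init__(self) -> None:
        self.num_experts = 4
        self.gate_up_proj = FakeParam((16, 8))
        self.down_proj = FakeParam((4, 8, 8))


class PlainLinear:
    """Negative case: an ordinary linear layer with a single 2-D weight."""

    def __init__(self) -> None:
        self.weight = FakeParam((8, 8))


for cls in (ExpertsWithoutActFn, ExpertsWith2DGateUp, PlainLinear):
    print(cls.__name__, looks_like_fused_experts(cls()))
```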

Before your PR is "Ready for review"

Is this change backward compatible?: ✅
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
Did you write any new necessary tests?: ✅ (updated the inverted detection test)
Did you update Changelog?: ✅ (0.46 Bug Fixes)
Did you get Claude approval on this PR?: ❌ (pending)

Additional Information

conversion.py::_normalize_fused_experts_quantizer_name already maps the per-expert gate_up_proj_weight_quantizers.N names to the singular *weight_quantizer form, so existing stock configs/recipes match the newly-detected experts with no recipe changes.
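As a rough illustration of why no recipe changes are needed, the sketch below collapses a per-expert quantizer name to a singular form before wildcard matching. It is not the body of `_normalize_fused_experts_quantizer_name`: the function `normalize_quantizer_name`, the sample module path, the exact normalized suffix, and the wildcard pattern are all assumptions made for the example.

```python
import re
from fnmatch import fnmatchcase

# Assumed naming scheme: a trailing ".N" index on a plural quantizer list.
_PER_EXPERT_SUFFIX = re.compile(r"_weight_quantizers\.\d+$")


def normalize_quantizer_name(name: str) -> str:
    """Hypothetical helper: map '..._weight_quantizers.N' to '..._weight_quantizer'."""
    return _PER_EXPERT_SUFFIX.sub("_weight_quantizer", name)


# Illustrative module path, not taken from a real checkpoint.
raw_name = "model.layers.3.mlp.experts.gate_up_proj_weight_quantizers.17"
normalized = normalize_quantizer_name(raw_name)

# Illustrative wildcard in the style of an experts-only recipe entry.
pattern = "*mlp.experts*weight_quantizer"

print(normalized)
print(fnmatchcase(raw_name, pattern), fnmatchcase(normalized, pattern))
```

Under these assumptions the raw per-expert name does not match the pattern, while the normalized name does, which is the effect the paragraph above describes.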

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

Bug Fixes
- Fixed MoE expert module detection to support additional configurations, enabling NVFP4/FP8 quantization and HuggingFace export for expert module variants that previously failed to export.

register_fused_experts_on_the_fly skipped fused-expert modules lacking an act_fn attribute. MiniMaxM3VLExperts (transformers 5.12.0) uses a custom GPT-OSS-style gated activation between its two F.linear calls instead of an act_fn attribute, so it was never wrapped as _QuantFusedExperts: routed experts stayed unquantized (an experts-only recipe matched nothing) and HF export failed with NotImplementedError.

_QuantFusedExperts is activation-agnostic (it only intercepts the two F.linear calls, gate_up then down), so act_fn is irrelevant to quantization, calibration, and export. Drop the requirement from _is_fused_experts_module. Enables NVFP4/FP8 PTQ + export for MiniMax-M2 / MiniMax-M3.

Verified end-to-end: experts-only NVFP4 + FP8 KV PTQ of MiniMaxAI/MiniMax-M3 detects MiniMaxM3VLExperts, quantizes all 57 MoE layers, and exports a valid HF checkpoint.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

coderabbitai · 2026-06-14T02:26:31Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7eb44013-3148-4fdf-bd26-9207d4ed7c4e

📥 Commits

Reviewing files that changed from the base of the PR and between 9f37fe1 and b4c9ce6.

📒 Files selected for processing (3)

CHANGELOG.rst
modelopt/torch/quantization/plugins/huggingface.py
tests/unit/torch/quantization/plugins/test_fused_experts.py

📝 Walkthrough

_is_fused_experts_module in huggingface.py is updated to drop the act_fn attribute requirement, so modules exposing only gate_up_proj, down_proj, and num_experts now qualify as fused experts. The unit test is updated to assert True for that case, and a changelog entry records the fix.

Changes

Fused expert detection fix

| Layer / File(s) | Summary |
|---|---|
| `_is_fused_experts_module` predicate, test, and changelog: `modelopt/torch/quantization/plugins/huggingface.py`, `tests/unit/torch/quantization/plugins/test_fused_experts.py`, `CHANGELOG.rst` | Removes `act_fn` from the boolean checks in `_is_fused_experts_module` and updates its docstring; replaces the test case that expected `False` when `act_fn` is absent with one asserting `True` using a synthetic module of only the required attributes; adds a changelog bug-fix entry. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Suggested reviewers

realAsma
sychen52
meenchen

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly identifies the main fix: removing act_fn requirement for fused MoE expert detection, with the specific affected model (MiniMax-M3) noted. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | PR contains no security anti-patterns: no unsafe torch.load/numpy.load, no hardcoded trust_remote_code=True in code, no eval/exec on external input, no `#nosec` comments, no new unsafe dependencies. |


github-actions · 2026-06-14T02:30:44Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1711/
Built to branch `gh-pages` at 2026-06-14 02:30 UTC. Preview will be ready when the GitHub Pages deployment is complete.

codecov · 2026-06-14T02:35:02Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.54%. Comparing base (9f37fe1) to head (b4c9ce6).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1711      +/-   ##
==========================================
- Coverage   77.12%   76.54%   -0.59%     
==========================================
  Files         511      511              
  Lines       56236    56236              
==========================================
- Hits        43374    43045     -329     
- Misses      12862    13191     +329

| Flag | Coverage Δ |
|---|---|
| examples | `41.83% <100.00%> (-0.13%)` ⬇️ |
| gpu | `57.75% <100.00%> (-0.63%)` ⬇️ |
| regression | `14.69% <0.00%> (+0.06%)` ⬆️ |
| unit | `54.40% <100.00%> (+<0.01%)` ⬆️ |


Edwardf0t1 requested a review from a team as a code owner June 14, 2026 02:26

Edwardf0t1 requested a review from Fridah-nv June 14, 2026 02:26

coderabbitai Bot approved these changes Jun 14, 2026

Edwardf0t1 wants to merge 1 commit into main from fix/fused-experts-no-act-fn
