Support Mamba direct conv1d quant export #1716
Megatron now stores Mamba conv1d as direct conv1d_weight and conv1d_bias parameters instead of an nn.Conv1d child. ModelOpt export still needs to emit HF conv1d.weight and conv1d.bias keys, and quantization needs the direct weight to participate in the normal weight-only calibration path.

The quantizer follows the existing per-parameter naming contract, so conv1d_weight pairs with conv1d_weight_weight_quantizer. The export adapter exposes that internal pair as a standard weight/weight_quantizer view for the existing name remapping and quantized export logic.

Constraint: New Megatron Mamba no longer exposes layer.mixer.conv1d.
Rejected: Add a Mamba-specific calibration iterator | standard per-weight quantizer naming keeps generic discovery working.
Rejected: Change shared weight discovery helpers | unnecessary once the quantizer follows existing naming convention.
Confidence: high
Scope-risk: moderate
Tested: python -m compileall on touched files
Tested: git diff --check
Tested: uvx ruff@0.12.11 check on touched files
Tested: uvx ruff@0.12.11 format --check on touched files
Tested: Slurm smoke 215464 passed in container, including export remap, generic weight_attr_names discovery, max_calibrate amax, and quant/export context invocation
Not-tested: Full end-to-end HF artifact export/load for a real Nano3 checkpoint
Signed-off-by: Meng Xin <mxin@nvidia.com>
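The per-parameter naming contract this commit relies on can be sketched in plain Python. This is a hypothetical illustration, not ModelOpt's code: `weight_quantizer_name`, `FakeMambaMixer`, and `discover_weight_attrs` are invented names, and the dict-based module is a stand-in for a real `nn.Module`. It shows why no Mamba-specific calibration iterator is needed: once the quantizer for `conv1d_weight` is named `conv1d_weight_weight_quantizer`, a generic "weight attribute plus paired quantizer" scan finds it.

```python
from dataclasses import dataclass, field


def weight_quantizer_name(weight_name: str) -> str:
    """Hypothetical helper: name of the quantizer paired with a weight attribute.

    The plain ``weight`` keeps the short ``weight_quantizer`` name; any other
    weight-like attribute uses its own name as a prefix.
    """
    prefix = "" if weight_name == "weight" else f"{weight_name}_"
    return f"{prefix}weight_quantizer"


@dataclass
class FakeMambaMixer:
    """Stand-in for a Megatron Mamba mixer holding direct conv1d parameters."""

    params: dict = field(default_factory=dict)
    quantizers: dict = field(default_factory=dict)


def discover_weight_attrs(module) -> list:
    """Generic discovery (no Mamba special case): weight attrs that have a paired quantizer."""
    return [
        name
        for name in module.params
        if name.endswith("weight") and weight_quantizer_name(name) in module.quantizers
    ]


# Usage: only conv1d_weight is discovered; conv1d_bias is not a weight attribute.
mixer = FakeMambaMixer(
    params={"conv1d_weight": [[0.1, -0.2, 0.3, 0.4]], "conv1d_bias": [0.0]},
    quantizers={"conv1d_weight_weight_quantizer": "quantizer"},
)
found = discover_weight_attrs(mixer)
```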
Codecov Report
❌ Patch coverage is …

@@             Coverage Diff             @@
##             main    #1716      +/-   ##
==========================================
- Coverage   77.12%   77.05%   -0.08%
==========================================
  Files         511      511
  Lines       56236    56297      +61
==========================================
+ Hits        43374    43380       +6
- Misses      12862    12917      +55
Flags with carried forward coverage won't be shown.
New Megatron stores Mamba conv1d as direct parameters. ModelOpt adds the direct Conv1d weight quantizer as a dynamic-module attribute, so MCore's native Mamba sharded-state path did not reliably preserve the scalar amax that NVFP4 export needs after a checkpoint reload.

The Mamba quant module now saves the direct Conv1d quantizer amax in distributed checkpoints and recomputes the weight max from conv1d_weight when a reload has dropped the calibrator statistics. The Megatron exporter also keeps the direct Mamba Conv1d weight in its original (unquantized) dtype when its kernel dimension cannot be packed by the configured block quantization format.

Constraint: Nano3 uses dynamic block-scale NVFP4; export still requires a calibrated scalar amax.
Rejected: Let export tolerate missing amax | a missing amax means checkpoint save/restore dropped quantizer state.
Rejected: Pack direct Conv1d NVFP4 with kernel size 4 and block size 16 | the packed block format requires the kernel dimension to be divisible by the block size.
Confidence: medium
Scope-risk: moderate
Tested: py_compile on touched files; ruff check/format on touched files; file-scoped pre-commit hooks; Nano3 PTQ/train/export smoke job 215752.
Not-tested: full ModelOpt GPU test suite.
Signed-off-by: Meng Xin <mxin@nvidia.com>
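The two decisions in this commit (restore or recompute the scalar amax, and fall back to the original dtype when blocks cannot be packed) can be sketched as follows. This is a hypothetical, framework-free illustration: `scalar_amax`, `can_pack_blocks`, and `conv1d_export_mode` are invented names, nested lists stand in for tensors, and amax is assumed to mean the maximum absolute weight value. It also assumes, as the commit message suggests, that the packed dimension is the kernel dimension, which is why kernel size 4 with block size 16 cannot be packed.

```python
def scalar_amax(conv1d_weight, saved_amax=None):
    """Return the quantizer's scalar amax, recomputing it if reload dropped it.

    conv1d_weight is a nested list (stand-in for a tensor); amax is max |w|.
    """
    if saved_amax is not None:
        # Normal path: the amax was saved in the distributed checkpoint.
        return saved_amax
    # Fallback: calibrator statistics were lost, so recompute from the weight.
    return max(abs(w) for row in conv1d_weight for w in row)


def can_pack_blocks(kernel_size: int, block_size: int) -> bool:
    """Block-quantized packing needs the packed dim to be a multiple of the block size."""
    return kernel_size % block_size == 0


def conv1d_export_mode(kernel_size: int, block_size: int) -> str:
    """Quantize when blocks pack cleanly; otherwise keep the original dtype."""
    return "quantized" if can_pack_blocks(kernel_size, block_size) else "original_dtype"


# Usage: a Mamba-style kernel of size 4 against an NVFP4-style block of 16.
amax = scalar_amax([[0.5, -2.0, 1.0, 0.25]])
mode = conv1d_export_mode(kernel_size=4, block_size=16)
```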
What does this PR do?
Type of change: bug fix
Megatron now stores Mamba conv1d as direct conv1d_weight and conv1d_bias parameters instead of an nn.Conv1d child. ModelOpt export still needs to emit HF conv1d.weight and conv1d.bias keys, and quantization needs the direct weight to participate in the normal weight-only calibration path.
The quantizer follows the existing per-parameter naming contract, so conv1d_weight pairs with conv1d_weight_weight_quantizer. The export adapter exposes that internal pair as a standard weight/weight_quantizer view for the existing name remapping and quantized export logic.
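The export adapter described above can be sketched as a thin view object. This is a hypothetical illustration rather than ModelOpt's exporter: `DirectConv1dView` and `export_conv1d` are invented names, `SimpleNamespace` stands in for a Megatron mixer, and the `backbone.layers.0.mixer` prefix is only an illustrative HF key prefix. The point is that code written against `module.weight` / `module.bias` / `module.weight_quantizer` can consume direct `conv1d_weight` / `conv1d_bias` parameters unchanged, and the exported keys still come out as `conv1d.weight` and `conv1d.bias`.

```python
from types import SimpleNamespace


class DirectConv1dView:
    """Hypothetical adapter: present direct conv1d params as a Conv1d-like module."""

    def __init__(self, mixer):
        self._mixer = mixer

    @property
    def weight(self):
        return self._mixer.conv1d_weight

    @property
    def bias(self):
        return getattr(self._mixer, "conv1d_bias", None)

    @property
    def weight_quantizer(self):
        # Per-parameter naming contract: conv1d_weight -> conv1d_weight_weight_quantizer.
        return getattr(self._mixer, "conv1d_weight_weight_quantizer", None)


def export_conv1d(mixer, hf_prefix: str) -> dict:
    """Emit HF-style conv1d keys through the view (sketch only, no quantized packing)."""
    view = DirectConv1dView(mixer)
    state = {f"{hf_prefix}.conv1d.weight": view.weight}
    if view.bias is not None:
        state[f"{hf_prefix}.conv1d.bias"] = view.bias
    return state


# Usage with a stand-in mixer that has direct parameters and a paired quantizer.
mixer = SimpleNamespace(
    conv1d_weight=[[0.1, 0.2, 0.3, 0.4]],
    conv1d_bias=[0.0],
    conv1d_weight_weight_quantizer="quantizer",
)
state = export_conv1d(mixer, "backbone.layers.0.mixer")
```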
Usage
Testing
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).
Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).
CONTRIBUTING.md: ✅ / ❌ / N/A
Additional Information