Upgrade Dev containers for CICD to latest #891
Conversation
Important: Review skipped. Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI.
📝 Walkthrough

CI workflows, test matrices, and installation docs updated for newer PyTorch/CUDA/tooling (container tags moved to 26.01, CUDA 13 variants added, Torch 2.10 added). Tests guarded by Megatron/TE checks were mostly removed or relaxed, and a new FSDP2 post-backward patch was added to the quantization trainer.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
.github/workflows/gpu_tests.yml (1)
86-99: ⚠️ Potential issue | 🔴 Critical

`gpu-tests-non-pr` still references `cuda12` tox environments that were removed from `tox.ini`. The PR job matrix (lines 66-68) was updated to `cuda13`, but the non-PR job matrix (lines 92-94) still uses `py312-cuda12-gpu` and `py312-cuda12-gpu-megatron`. Since `tox.ini` no longer defines `cuda12` environments (replaced by `cuda13`), nightly and on-demand runs will fail with an unknown tox environment error.

🐛 Proposed fix

```diff
       matrix:
         include:
-          - example: py312-cuda12-gpu
+          - example: py312-cuda13-gpu
             timeout: 90
-          - example: py312-cuda12-gpu-megatron
+          - example: py312-cuda13-gpu-megatron
             timeout: 120
```

.github/workflows/unit_tests.yml (1)
79-86: ⚠️ Potential issue | 🔴 Critical

The `multi-torch` matrix includes `torch26`, which no longer exists in `tox.ini`. As noted in the `tox.ini` review, `torch26` was dropped from the tox env pattern. This matrix entry will produce `py312-torch26-tf_latest-unit`, which tox won't recognize. Either add `torch26` back to the tox env pattern or remove `26` from this matrix (and consider adding `210` if full coverage of the new version is desired).

🐛 Suggested fix (if torch26 is intentionally dropped)

```diff
       matrix:
-        torch: [26, 27, 28, 29]
+        torch: [27, 28, 29, 210]
```
🤖 Fix all issues with AI agents
In `@tox.ini`:
- Around lines 14-21: The tox env pattern
  `[testenv:{py310,py311,py312}-torch{27,28,29,210}-tf_{min,latest}-unit]` no longer
  includes torch26, but there is still a deps mapping `torch26: torchvision~=0.21.0`
  and the CI matrix (multi-torch) still lists 26. Fix by either (A) restoring 26 into
  the env factor (change `torch{27,28,29,210}` to `torch{26,27,28,29,210}`) so tox,
  the deps, and the workflow matrix align, or (B) removing the dead deps line
  `torch26: torchvision~=0.21.0` and removing the `26` entry from the multi-torch
  matrix in unit_tests.yml so the workflow no longer references an unsupported tox
  env. Ensure the tox env pattern, the deps block, and the workflow matrix stay
  consistent.
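A minimal sketch of option (A), showing only the one line that would change (the pattern is copied from the prompt above; the rest of `tox.ini` is assumed unchanged):

```diff
-[testenv:{py310,py311,py312}-torch{27,28,29,210}-tf_{min,latest}-unit]
+[testenv:{py310,py311,py312}-torch{26,27,28,29,210}-tf_{min,latest}-unit]
```

With that single change, the existing `torch26: torchvision~=0.21.0` deps line and the `26` entry in the `multi-torch` matrix both remain valid.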
🧹 Nitpick comments (1)
docs/source/getting_started/_installation_for_Linux.rst (1)
129-131: Consider splitting the long instruction for readability. The single bullet on line 131 packs multiple commands and conditional logic into one line, which can be hard to scan in rendered docs.
📝 Suggested restructure
```diff
-**CUDA specific dependencies**
-
-* By default, ``cupy-cuda12x`` is installed for INT4 ONNX quantization. If you have CUDA 13, you need to run ``pip uninstall -y cupy-cuda12x`` and ``pip install cupy-cuda13x`` after installing ``nvidia-modelopt[onnx]``.
+**CUDA specific dependencies**
+
+* By default, ``cupy-cuda12x`` is installed for INT4 ONNX quantization.
+* If you have CUDA 13, replace it after installing ``nvidia-modelopt[onnx]``:
+
+  .. code-block:: shell
+
+     pip uninstall -y cupy-cuda12x
+     pip install cupy-cuda13x
```
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
b7e3883 to f0bd99c
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #891      +/-   ##
==========================================
- Coverage   73.74%   73.54%   -0.21%
==========================================
  Files         199      205       +6
  Lines       21163    22000     +837
==========================================
+ Hits        15606    16179     +573
- Misses       5557     5821     +264
```

☔ View full report in Codecov by Sentry.
59eb8ff to 86e9821
@realAsma please help review this cursor-generated fix for passing the llm_qat test on PyTorch 2.10 (pytorch:26.01-py3 container).
Failing test without this fix: https://github.com/NVIDIA/Model-Optimizer/actions/runs/22007571666/job/63595325923?pr=891
[rank0]: RuntimeError: attempting to assign a gradient with dtype 'c10::BFloat16' to a tensor with grad_dtype 'Float'. The gradient must match the tensor's grad_dtype (defaults to the tensor's dtype). You can set the tensor's grad_dtype attribute with a specific dtype, or None to allow any dtype. Set grad_dtype with caution. Diverging the dtypes of a tensor and its gradient may break downstream systems that assume they match.
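For context, the fix under review (the FSDP2 post-backward patch discussed later in this thread) works around this by relaxing the new per-tensor `grad_dtype` check and casting the gradient back to the parameter dtype. A minimal, standalone sketch of that idea, independent of Model Optimizer and FSDP2, assuming the `grad_dtype` attribute quoted in the traceback exists only on torch >= 2.10:

```python
import torch

param = torch.nn.Parameter(torch.zeros(4, dtype=torch.float32))
bf16_grad = torch.ones(4, dtype=torch.bfloat16)

if hasattr(param, "grad_dtype"):
    # torch >= 2.10 (per the traceback above): relax the strict check, assign the
    # bf16 gradient, then cast back so the optimizer still sees an fp32 gradient.
    param.grad_dtype = None
    param.grad = bf16_grad
    param.grad = param.grad.to(param.dtype)
else:
    # Older torch already requires matching dtypes, so cast up front.
    param.grad = bf16_grad.to(param.dtype)

print(param.grad.dtype)  # torch.float32 either way
```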
This error does not seem to be related to Model Optimizer. We should probably raise this with PyTorch or Hugging Face. Can we add a note that this is a temporary fix and should be removed in the future?
Thanks Asma for the review. Can you please raise an issue with torch / transformers with the required details? I will add the issue link here for tracking.
```python
from packaging import version
...
def skip_if_no_tensorrt():
```
@gcunhase @ajrasane please help review this cursor-generated check for skipping tests/gpu/onnx/test_plugin.py on the CUDA 13 container with our current ORT-GPU installation.
Failure without this skip check: https://github.com/NVIDIA/Model-Optimizer/actions/runs/22007571673/job/63595332370?pr=891#step:7:364
```
*************** EP Error ***************
EP Error /onnxruntime_src/onnxruntime/python/onnxruntime_pybind_state.cc:505 void onnxruntime::python::RegisterTensorRTPluginsAsCustomOps(PySessionOptions&, const onnxruntime::ProviderOptions&) Please install TensorRT libraries as mentioned in the GPU requirements page, make sure they're in the PATH or LD_LIBRARY_PATH, and that your GPU is supported.
when using [('TensorrtExecutionProvider', {'trt_max_workspace_size': 85899345920}), ('CUDAExecutionProvider', {'device_id': 0, 'arena_extend_strategy': 'kSameAsRequested'}), ('CPUExecutionProvider', {'arena_extend_strategy': 'kSameAsRequested'})]
Falling back to ['CPUExecutionProvider'] and retrying.
****************************************
----------------------------- Captured stderr call -----------------------------
2026-02-14 01:01:06.659874993 [E:onnxruntime:Default, provider_bridge_ort.cc:2167 TryGetProviderInfo_TensorRT] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1778 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_tensorrt.so with error: libcublas.so.12: cannot open shared object file: No such file or directory
```

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
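Regarding the skip check requested above: a hypothetical sketch of what such a guard could look like. The helper name comes from the diff context; everything else is an assumption. It only probes ONNX Runtime's provider list, so the actual check in this PR may go further (for example, verifying that the TensorRT provider library actually loads, since the log shows the provider is listed yet fails on a missing `libcublas.so.12`):

```python
import pytest


def skip_if_no_tensorrt():
    """Hypothetical sketch: skip when ONNX Runtime cannot offer the TensorRT EP."""
    ort = pytest.importorskip("onnxruntime")
    if "TensorrtExecutionProvider" not in ort.get_available_providers():
        pytest.skip("TensorrtExecutionProvider not available in this onnxruntime build")
```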
86e9821 to dea2eeb
@coderabbitai full review
✅ Actions performed: Full review triggered.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/_test_utils/import_helper.py (1)
100-102: ⚠️ Potential issue | 🟡 Minor

Pre-existing bug: skip message is inverted.
The condition on line 100 skips when the installed version is greater than 1.18, but the message says "is less than required". This is a pre-existing issue unrelated to this PR.
Proposed fix
```diff
     if version.parse(installed_version) > version.parse(required_version):
         pytest.skip(
-            f"{package_name} version {installed_version} is less than required {required_version}"
+            f"{package_name} version {installed_version} is greater than {required_version}"
         )
```
🧹 Nitpick comments (1)
modelopt/torch/quantization/plugins/transformers_trainer.py (1)
387-387: Placement note: `_patch_fsdp2_post_backward()` is called unconditionally.

This patches the class-level method regardless of whether FSDP2 is actually in use. The `ImportError` guard and double-patch sentinel make this safe, but calling it only when `self.accelerator.is_fsdp2` would be more precise. Not a blocker since the guards handle it correctly.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/_test_utils/import_helper.py (1)
100-102: ⚠️ Potential issue | 🟡 Minor

Pre-existing bug: skip message says "less than" but condition checks "greater than".

Line 100 skips when `installed_version > required_version`, but the message on line 102 says the version "is less than required". This is a pre-existing issue unrelated to this PR but worth noting.

Proposed fix

```diff
     if version.parse(installed_version) > version.parse(required_version):
         pytest.skip(
-            f"{package_name} version {installed_version} is less than required {required_version}"
+            f"{package_name} version {installed_version} is above the maximum {required_version}"
         )
```
🧹 Nitpick comments (5)
docs/source/getting_started/_installation_for_Linux.rst (1)
129-131: Consider mentioning `[all]` in addition to `[onnx]` for the cupy swap instruction.

The cupy swap applies regardless of whether the user installed `nvidia-modelopt[onnx]` or `nvidia-modelopt[all]`. Mentioning only `[onnx]` may cause users who installed `[all]` to miss this step.

.github/workflows/example_tests.yml (1)
68-73: The explicit `docker_image: "26.01"` for `speculative_decoding` is currently redundant with the default.

Since the fallback `|| '26.01'` already covers it, the explicit value is unnecessary. However, keeping it is fine if the intent is to make it easy to override per-example in the future. The comment on line 59 hints at this being intentional.

tox.ini (2)
62-73: Consider pinning the `fast-hadamard-transform` dependency to a specific commit or tag.

`git+https://github.com/Dao-AILab/fast-hadamard-transform.git` installs from the default branch head, meaning CI can break at any time due to upstream changes. Pinning to a known-good commit (e.g., `...transform.git@<commit-sha>`) would make GPU test runs reproducible.

The `cupy-cuda12x` → `cupy-cuda13x` swap on lines 69–70 is consistent with the CUDA 13 migration.
75-83: Same unpinned-git concern applies to `mamba`; otherwise LGTM.

Line 79 installs `mamba` from the default branch of `state-spaces/mamba` without a commit pin (same reproducibility note as `fast-hadamard-transform` above). The rest of the Megatron GPU environment looks correct.

modelopt/torch/quantization/plugins/transformers_trainer.py (1)
104-147: Well-documented and correct implementation; optional refinements available.

The idempotency guard, import safety, and clear documentation of this as a temporary workaround are sound.
Two optional observations:
1. `grad_dtype` is cleared but never restored. After `_modelopt_original_post_backward()` runs, `grad_dtype` remains `None`. While it's re-set to `None` on every subsequent `post_backward` call, the codebase elsewhere uses save-restore patterns for similar monkey-patching (see `utils.py` lines 563–575). If future FSDP2 code paths inspect `grad_dtype` between backward passes, this could cause unexpected behavior. Consider saving and restoring it:

♻️ Optional: restore grad_dtype after the original call

```diff
 @torch.no_grad()
 def _patched_post_backward(self):
     # Allow bf16 gradients to be assigned to fp32 parameters
+    saved_grad_dtypes = {}
     for fsdp_param in self.fsdp_params:
         with contextlib.suppress(AttributeError):
+            saved_grad_dtypes[fsdp_param] = fsdp_param.sharded_param.grad_dtype
             fsdp_param.sharded_param.grad_dtype = None
     self._modelopt_original_post_backward()
     # Cast gradients to parameter dtype so the optimizer sees matching dtypes
     for fsdp_param in self.fsdp_params:
         sp = fsdp_param.sharded_param
         if sp.grad is not None and sp.grad.dtype != sp.dtype:
             sp.grad = sp.grad.to(sp.dtype)
+        if fsdp_param in saved_grad_dtypes:
+            sp.grad_dtype = saved_grad_dtypes[fsdp_param]
```
2. Exception safety. If `_modelopt_original_post_backward()` raises, the gradient casting loop (lines 142–145) is skipped. A `try`/`finally` would ensure cleanup (see the sketch below), though the risk is low for a temporary workaround.
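A sketch of the `try`/`finally` shape suggested in point 2, reusing the names from the diff above; illustrative only, not the code merged in this PR:

```python
import contextlib

import torch


@torch.no_grad()
def _patched_post_backward(self):
    # Allow bf16 gradients to be assigned to fp32 parameters.
    for fsdp_param in self.fsdp_params:
        with contextlib.suppress(AttributeError):
            fsdp_param.sharded_param.grad_dtype = None
    try:
        self._modelopt_original_post_backward()
    finally:
        # Runs even if the original post_backward raises, so the optimizer never
        # sees a gradient whose dtype differs from the parameter dtype.
        for fsdp_param in self.fsdp_params:
            sp = fsdp_param.sharded_param
            if sp.grad is not None and sp.grad.dtype != sp.dtype:
                sp.grad = sp.grad.to(sp.dtype)
```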
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
What does this PR do?
Testing
CI/CD in this PR should pass
Summary by CodeRabbit
New Features
Documentation
Chores