
Fix: restore requires_grad in transformers5 reloading #907

Merged
h-guo18 merged 3 commits into main from haoguo/fix-restore on Feb 19, 2026

Conversation

@h-guo18
Contributor

@h-guo18 h-guo18 commented Feb 18, 2026

What does this PR do?

Type of change: Bug fix

Overview:

Patch transformers 5.x parameter loading to preserve original requires_grad settings.

In transformers v5.x, loading a checkpoint forcibly sets parameters' requires_grad,
which unintentionally unfreezes frozen parameters (e.g., the base model in EAGLE training).

This leads to an optimizer initialization error on resume, since the restored optimizer expects more trainable parameters than the checkpoint contains.

This monkey patch restores the original requires_grad after parameters are loaded.

Reference:
https://github.com/huggingface/transformers/blob/v5.0.0.rc1-release/src/transformers/core_model_loading.py#L640
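Conceptually, the patch wraps transformers' set_param_for_module so that the pre-load flag is captured and written back after loading. A minimal sketch of the idea, not the exact implementation in modelopt/torch/speculative/utils.py; the signature of set_param_for_module and the module_obj/param_name names are assumptions based on the review comments further below:

import transformers.core_model_loading as core_model_loading

def patch_transformers5_params_loading():
    """Keep frozen parameters frozen across transformers 5.x checkpoint loading."""
    orig_set_param_for_module = core_model_loading.set_param_for_module

    def patched_set_param_for_module(module_obj, param_name, *args, **kwargs):
        # Remember the flag that transformers is about to overwrite ...
        orig_requires_grad = getattr(module_obj, param_name).requires_grad
        result = orig_set_param_for_module(module_obj, param_name, *args, **kwargs)
        # ... and restore it on the (possibly re-created) parameter.
        getattr(module_obj, param_name).requires_grad = orig_requires_grad
        return result

    core_model_loading.set_param_for_module = patched_set_param_for_module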

Usage

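A minimal usage sketch (the load call and checkpoint path are illustrative; the actual integration in examples/speculative_decoding/main.py applies the patch right after checkpoint detection and before load_vlm_or_llm_with_kwargs, as described in the walkthrough below):

from transformers import AutoModelForCausalLM

from modelopt.torch.speculative.utils import patch_transformers5_params_loading

# Apply the monkey patch once, before any checkpoint is loaded, so that
# transformers 5.x loading cannot flip requires_grad on frozen parameters.
patch_transformers5_params_loading()

model = AutoModelForCausalLM.from_pretrained("path/to/eagle/checkpoint")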

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

  • Bug Fixes
    • Fixed model parameter loading in speculative decoding to properly preserve gradient requirements for each parameter when using HuggingFace Transformers 5.x, ensuring correct behavior during checkpoint resumption and model initialization.

@copy-pr-bot

copy-pr-bot bot commented Feb 18, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@h-guo18 h-guo18 changed the title from "fix: restore requires_grad in transforemrs5 reloading" to "Fix: restore requires_grad in transforemrs5 reloading" on Feb 18, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 18, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

A new utility function that monkey-patches HuggingFace Transformers 5.x to preserve parameter requires_grad states during model loading has been added, and the speculative decoding example integrates this patch before loading models from checkpoints.

Changes

Transformers Parameter Loading Patch (modelopt/torch/speculative/utils.py):
Added patch_transformers5_params_loading function that monkey-patches core_model_loading.set_param_for_module to record and restore each parameter's original requires_grad state during HuggingFace Transformers 5.x model loading.

Speculative Decoding Integration (examples/speculative_decoding/main.py):
Imported and invoked patch_transformers5_params_loading immediately after checkpoint detection, before calling load_vlm_or_llm_with_kwargs.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage ⚠️ Warning: Docstring coverage is 75.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
Title Check ✅ Passed: The title clearly and specifically describes the main change: fixing requires_grad restoration in transformers5 parameter reloading, which is the core purpose of this PR.


@h-guo18 h-guo18 changed the title from "Fix: restore requires_grad in transforemrs5 reloading" to "Fix: restore requires_grad in transformers5 reloading" on Feb 18, 2026
@h-guo18 h-guo18 marked this pull request as ready for review February 18, 2026 23:17
@h-guo18 h-guo18 requested a review from a team as a code owner February 18, 2026 23:17
@h-guo18 h-guo18 requested a review from yeyu-nvidia February 18, 2026 23:17
@codecov

codecov bot commented Feb 18, 2026

Codecov Report

❌ Patch coverage is 10.52632% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.47%. Comparing base (3dd52bf) to head (6b5d205).
⚠️ Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
modelopt/torch/speculative/utils.py | 10.52% | 17 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #907      +/-   ##
==========================================
- Coverage   73.52%   73.47%   -0.06%     
==========================================
  Files         205      205              
  Lines       22013    22032      +19     
==========================================
+ Hits        16185    16187       +2     
- Misses       5828     5845      +17     


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/speculative/utils.py`:
- Around line 490-525: The function patch_transformers5_params_loading
unconditionally reassigns core_model_loading.set_param_for_module, causing
double-wrapping on repeated calls; fix it by adding an idempotency guard: before
patching, check if core_model_loading.set_param_for_module already has a
sentinel attribute (e.g., _patched_by_modelopt) or if a module-level sentinel is
set, and if so return early; when creating patched_set_param_for_module, capture
the original only once (orig_set_param_for_module =
core_model_loading.set_param_for_module) and attach a sentinel flag
(setattr(patched_set_param_for_module, "_patched_by_modelopt", True)) to the new
function before assigning it back to core_model_loading.set_param_for_module so
subsequent calls detect the sentinel and avoid re-wrapping.
- Around line 510-523: In patched_set_param_for_module, guard against
AttributeError when the target attribute may be None by using
getattr(module_obj, param_name, None) to fetch the attribute into a local (e.g.,
attr) and only read attr.requires_grad if attr is not None and has the
attribute; store orig_requires_grad as None if attr is None. After calling
orig_set_param_for_module, fetch the attribute again (or reuse the local) and
only restore requires_grad when the attribute is not None (and
orig_requires_grad is not None) to avoid touching None (reference symbols:
patched_set_param_for_module, orig_set_param_for_module, module_obj,
param_name).
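Taken together, the two suggestions above amount to something like the following sketch. The signature of set_param_for_module is still an assumption, and _patched_by_modelopt mirrors the sentinel name proposed in the first comment:

import transformers.core_model_loading as core_model_loading

def patch_transformers5_params_loading():
    # Idempotency guard: if the sentinel is present, we already wrapped the function.
    if getattr(core_model_loading.set_param_for_module, "_patched_by_modelopt", False):
        return

    orig_set_param_for_module = core_model_loading.set_param_for_module

    def patched_set_param_for_module(module_obj, param_name, *args, **kwargs):
        # Fetch defensively: the attribute may be missing or None before loading.
        attr = getattr(module_obj, param_name, None)
        orig_requires_grad = getattr(attr, "requires_grad", None)

        result = orig_set_param_for_module(module_obj, param_name, *args, **kwargs)

        # Restore only when both the attribute and the recorded flag exist.
        attr = getattr(module_obj, param_name, None)
        if attr is not None and orig_requires_grad is not None:
            attr.requires_grad = orig_requires_grad
        return result

    # Sentinel so repeated calls detect the wrapper and skip re-wrapping.
    patched_set_param_for_module._patched_by_modelopt = True
    core_model_loading.set_param_for_module = patched_set_param_for_module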

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
@h-guo18 h-guo18 enabled auto-merge (squash) February 19, 2026 00:14
@h-guo18 h-guo18 merged commit eb99488 into main Feb 19, 2026
37 checks passed
@h-guo18 h-guo18 deleted the haoguo/fix-restore branch February 19, 2026 01:25