Fix: restore requires_grad in transformers5 reloading #907
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Important: Review skipped. Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI.
📝 Walkthrough
A new utility function that monkey-patches HuggingFace Transformers 5.x to preserve parameter requires_grad settings during checkpoint loading.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ❌ Failed checks (1 warning) | ✅ Passed checks (2 passed)
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #907      +/-   ##
==========================================
- Coverage   73.52%   73.47%   -0.06%
==========================================
  Files         205      205
  Lines       22013    22032      +19
==========================================
+ Hits        16185    16187       +2
- Misses       5828     5845      +17
☔ View full report in Codecov by Sentry.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/speculative/utils.py`:
- Around line 490-525: The function patch_transformers5_params_loading
unconditionally reassigns core_model_loading.set_param_for_module, causing
double-wrapping on repeated calls; fix it by adding an idempotency guard: before
patching, check if core_model_loading.set_param_for_module already has a
sentinel attribute (e.g., _patched_by_modelopt) or if a module-level sentinel is
set, and if so return early; when creating patched_set_param_for_module, capture
the original only once (orig_set_param_for_module =
core_model_loading.set_param_for_module) and attach a sentinel flag
(setattr(patched_set_param_for_module, "_patched_by_modelopt", True)) to the new
function before assigning it back to core_model_loading.set_param_for_module so
subsequent calls detect the sentinel and avoid re-wrapping.
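A minimal sketch of the suggested guard, assuming transformers 5.x exposes core_model_loading.set_param_for_module as the reference link indicates; the wrapper body is left as a pass-through here and detailed in the next comment's sketch:

```python
import transformers.core_model_loading as core_model_loading


def patch_transformers5_params_loading():
    """Idempotently wrap set_param_for_module (sketch of the suggested guard)."""
    # Return early if a previous call already installed the wrapper.
    if getattr(core_model_loading.set_param_for_module, "_patched_by_modelopt", False):
        return

    # Capture the original exactly once, before reassigning the module attribute.
    orig_set_param_for_module = core_model_loading.set_param_for_module

    def patched_set_param_for_module(*args, **kwargs):
        # requires_grad preservation would go here (see the next sketch).
        return orig_set_param_for_module(*args, **kwargs)

    # Sentinel flag so subsequent calls detect the patch and skip re-wrapping.
    patched_set_param_for_module._patched_by_modelopt = True
    core_model_loading.set_param_for_module = patched_set_param_for_module
```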
- Around line 510-523: In patched_set_param_for_module, guard against
AttributeError when the target attribute may be None by using
getattr(module_obj, param_name, None) to fetch the attribute into a local (e.g.,
attr) and only read attr.requires_grad if attr is not None and has the
attribute; store orig_requires_grad as None if attr is None. After calling
orig_set_param_for_module, fetch the attribute again (or reuse the local) and
only restore requires_grad when the attribute is not None (and
orig_requires_grad is not None) to avoid touching None (reference symbols:
patched_set_param_for_module, orig_set_param_for_module, module_obj,
param_name).
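A sketch of the None-safe capture/restore logic the comment describes, written as a wrapper factory; the (module_obj, param_name, ...) argument order is an assumption based on the symbols named above, not the actual transformers signature:

```python
def make_requires_grad_preserving_wrapper(orig_set_param_for_module):
    """Return a set_param_for_module wrapper that preserves requires_grad (sketch)."""

    def patched_set_param_for_module(module_obj, param_name, *args, **kwargs):
        # The attribute may be missing or None before the checkpoint value is set,
        # so fetch it defensively instead of using direct attribute access.
        attr = getattr(module_obj, param_name, None)
        orig_requires_grad = getattr(attr, "requires_grad", None) if attr is not None else None

        result = orig_set_param_for_module(module_obj, param_name, *args, **kwargs)

        # Re-fetch: the original call may have replaced the attribute. Restore
        # requires_grad only when both the attribute and the recorded flag exist.
        attr = getattr(module_obj, param_name, None)
        if attr is not None and orig_requires_grad is not None:
            attr.requires_grad = orig_requires_grad
        return result

    return patched_set_param_for_module
```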
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Force-pushed from ca53557 to 6b5d205
What does this PR do?
Type of change: ?
Overview:
Patch transformers 5.x parameter loading to preserve original requires_grad settings.
In transformers v5.x, loading a checkpoint forcibly sets parameters' requires_grad, which unintentionally unfreezes frozen parameters (e.g., the base model in EAGLE training). This leads to an optimizer initialization error, since the restored optimizer expects more parameters than the checkpoint contains. This monkey-patch restores the original requires_grad after parameters are loaded.
Reference:
https://github.com/huggingface/transformers/blob/v5.0.0.rc1-release/src/transformers/core_model_loading.py#L640
Usage
# Add a code snippet demonstrating how to use this
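The template snippet above is still a placeholder; below is a hedged sketch of how the patch might be applied, assuming it is exported from modelopt/torch/speculative/utils.py (the file named in the review comments) under the name patch_transformers5_params_loading and must be called before the checkpoint is loaded:

```python
from transformers import AutoModelForCausalLM

# Assumed import path, taken from the file under review; the final public
# location/name may differ.
from modelopt.torch.speculative.utils import patch_transformers5_params_loading

# Install the monkey-patch before loading so Transformers 5.x calls the
# wrapped set_param_for_module during checkpoint materialization.
patch_transformers5_params_loading()

# Hypothetical checkpoint path for an EAGLE training setup with a frozen base model.
model = AutoModelForCausalLM.from_pretrained("path/to/eagle_checkpoint")

# Parameters that were frozen before loading keep requires_grad=False,
# so optimizer state restoration sees the expected trainable-parameter set.
```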
Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit