transformers v5 support #1167
Merged
Conversation
…xts.
We also update .gitignore to exclude .env (a commonly used local file exclusion), e.g. to allow collaborators to add their own HF_TOKEN for the test suite.
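A minimal sketch of how a local `.env` file might feed `HF_TOKEN` to the test suite, assuming python-dotenv (the PR itself only adds `.env` to `.gitignore`; the loading mechanism shown here is illustrative):

```python
# Hedged sketch, not the PR's code: assumes python-dotenv is installed.
# Collaborators put `HF_TOKEN=...` in a local, git-ignored .env file.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory, if present
HF_TOKEN = os.environ.get("HF_TOKEN")  # None when no token is configured
```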
Core Fixes:
-----------
transformer_lens/components/abstract_attention.py:
- Replace pattern.to(self.cfg.dtype) with pattern.to(v.dtype) to handle cases
where tensors are upcast to float32 for numerical stability while cfg.dtype
remains float16/bfloat16
- Add explicit device/dtype synchronization for output projection:
* Move weights (W_O) and bias (b_O) to match input device (z.device)
* Ensure z matches weight dtype before final linear operation (illustrated in the sketch after this Core Fixes list)
transformer_lens/model_bridge/bridge.py:
- Replace direct original_model.to() call with move_to_and_update_config()
utility to ensure:
* All bridge components (not just original_model) are moved to target device
* cfg.device and cfg.dtype stay synchronized with actual model state
* Multi-GPU cache tensors remain on correct devices
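As a rough illustration of the dtype/device synchronization described for abstract_attention.py above, here is a simplified sketch (the einsum shapes and function boundaries are illustrative, not the actual component code):

```python
import torch

def attention_output(pattern: torch.Tensor, v: torch.Tensor,
                     W_O: torch.Tensor, b_O: torch.Tensor) -> torch.Tensor:
    """Sketch: pattern is [batch, head, query, key], v is [batch, key, head, d_head],
    W_O is [head, d_head, d_model], b_O is [d_model]."""
    # The softmax is often upcast to float32 for numerical stability even when
    # cfg.dtype is float16/bfloat16, so cast to the value tensor's dtype
    # rather than to cfg.dtype.
    pattern = pattern.to(v.dtype)
    z = torch.einsum("bhqk,bkhd->bqhd", pattern, v)

    # Synchronize the output projection with the activations: move the weights
    # and bias onto z's device, and match z to the weight dtype before the
    # final linear operation.
    W_O = W_O.to(z.device)
    b_O = b_O.to(z.device)
    z = z.to(W_O.dtype)
    return torch.einsum("bqhd,hdm->bqm", z, W_O) + b_O
```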
Test Fixes:
-----------
tests/acceptance/test_hooked_encoder.py:
- Fix test_cuda() to use correct fixture name 'tokens' instead of 'mlm_tokens'
tests/acceptance/test_multi_gpu.py:
- Update test_cache_device() to pass torch.device("cpu") instead of string
"cpu" for proper device type validation
tests/unit/components/test_attention.py:
- Add test_attention_forward_half_precisions() to validate attention works
correctly with bfloat16/float16 dtypes on CUDA devices
tests/unit/factored_matrix/test_multiply_by_scalar.py:
- Add test IDs to parametrize decorators to avoid pytest cache issues when
random numbers appear in test names
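A hedged sketch of the parametrize-ID fix for test_multiply_by_scalar.py: explicit ids keep parameter values out of the generated test names, so pytest's last-failed cache stays stable even when the parametrized values are randomly generated (the scalars and assertion below are illustrative):

```python
import pytest
import torch

from transformer_lens import FactoredMatrix

# Illustrative scalars; in the real tests the values may be random, which is
# exactly why explicit ids are needed to keep test names stable across runs.
scalars = [2.0, 0.5, -3.0]

@pytest.mark.parametrize("scalar", scalars, ids=["times_2", "times_0p5", "times_neg3"])
def test_multiply_by_scalar(scalar):
    A = torch.randn(2, 3)
    B = torch.randn(3, 4)
    fm = FactoredMatrix(A, B)
    assert torch.allclose((fm * scalar).AB, (A @ B) * scalar, atol=1e-5)
```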
Tests Fixed by This Commit:
---------------------------
- tests/acceptance/test_multi_gpu.py::test_cache_device
- tests/acceptance/model_bridge/compatibility/test_legacy_hooked_transformer_coverage.py::TestLegacyHookedTransformerCoverage::test_memory_efficiency[gpt2]
- tests/acceptance/model_bridge/compatibility/test_legacy_hooked_transformer_coverage.py::TestLegacyHookedTransformerCoverage::test_consistent_outputs[gpt2]
- tests/acceptance/test_hooked_transformer.py::test_half_precision[dtype0]
- tests/acceptance/test_hooked_transformer.py::test_half_precision[dtype1]
- tests/unit/components/test_attention.py::test_attention_forward_half_precisions[dtype0]
- tests/unit/components/test_attention.py::test_attention_forward_half_precisions[dtype1]
- tests/unit/model_bridge/compatibility/test_utils.py::TestUtilsWithTransformerBridge::test_device_compatibility[gpt2]
Enhance the to() method to properly handle both device and dtype arguments in all supported PyTorch forms (positional, keyword, combined). Invoke move_to_and_update_config separately for device and dtype so cfg stays updated, while delegating the actual tensor movement to original_model.to() with the original args/kwargs. This ensures TransformerBridge respects standard PyTorch behavior for model.to() calls.
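A rough sketch of the argument handling described above; the helper name _parse_to_args is hypothetical, and move_to_and_update_config refers to the utility mentioned in this PR:

```python
import torch

def _parse_to_args(*args, **kwargs):
    """Hypothetical helper: extract device/dtype from the argument forms that
    torch.nn.Module.to accepts -- .to(device), .to(dtype), .to(device, dtype),
    and the keyword equivalents."""
    device, dtype = None, None
    for arg in args:
        if isinstance(arg, torch.dtype):
            dtype = arg
        elif isinstance(arg, (str, torch.device)):
            device = torch.device(arg)
    if kwargs.get("device") is not None:
        device = torch.device(kwargs["device"])
    if kwargs.get("dtype") is not None:
        dtype = kwargs["dtype"]
    return device, dtype

# Inside TransformerBridge.to(), roughly:
#   device, dtype = _parse_to_args(*args, **kwargs)
#   if device is not None:
#       move_to_and_update_config(self, device)  # keep cfg.device in sync
#   if dtype is not None:
#       move_to_and_update_config(self, dtype)   # keep cfg.dtype in sync
#   return self.original_model.to(*args, **kwargs)  # actual tensor movement
```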
Compatibility for transformers v5 and huggingface_hub v1.3.4 while maintaining backward compatibility with v4.

**Handle API/Behavioral Changes:**
- Handle batch_decode behavior change (wraps tokens for v4/v5 compatibility)
- Add rotary_pct → rope_parameters['partial_rotary_factor'] migration helper
- Fix BOS token handling for tokenizers without BOS (e.g., T5)
- Update MoE router_scores shape expectations for compact top-k format
- Add type casts for tokenizer.decode() return values

**Code Changes:**
- Add get_rotary_pct_from_config() utility for config v4/v5 compatibility (see the sketch below)
- Wrap tokens for batch_decode in HookedTransformer, the bridge, and notebooks
- Add cast(str, ...) for decode() calls in generate() methods
- Update test expectations for the new router_scores shape
- Add BOS token checks before setting add_bos_token=True

**Infrastructure:**
- Add pytest-rerunfailures dependency for flaky network tests (can be removed once hub-related httpx read timeout issues are resolved)
- Update dependencies: transformers 5.0.0, huggingface_hub 1.3.4
- Change the HF cache to use HF_HUB_CACHE (TRANSFORMERS_CACHE was removed in v5)
- Update doctest to use range checks for numerical stability
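A minimal sketch of the kind of v4/v5 config shim described above; the real get_rotary_pct_from_config() in the PR may have a different signature or defaults:

```python
def get_rotary_pct_from_config(hf_config) -> float:
    """Sketch: return the fraction of head dimensions that rotary embeddings
    cover, reading the transformers v5 rope_parameters dict if present and
    falling back to the older v4 rotary_pct attribute. Defaults to 1.0
    (full rotary) if neither is set."""
    rope_parameters = getattr(hf_config, "rope_parameters", None)
    if rope_parameters and "partial_rotary_factor" in rope_parameters:
        return rope_parameters["partial_rotary_factor"]
    return getattr(hf_config, "rotary_pct", 1.0)
```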
…e httpx hub read timeouts affect both local and CI testing
…ould have rerun args
mivanit added a commit to mivanit/attention-motifs that referenced this pull request on Feb 10, 2026:
Only as of TransformerLensOrg/TransformerLens#1167 is HF transformers >5.0.0 supported, but that change is not in the latest beta (3.0.0) release, so we use the dev version. This was originally meant to fix an HF token issue, but that issue remains; still working on it.
Replaces #1164 (which was merged as a standard merge instead of a squash). Original PR: #1164.
Original author: @speediedan