Conversation

@jlarson4 commented Feb 9, 2026

Replaces #1164 (which was merged as a standard merge instead of a squash). Original PR: #1164

Original author: @speediedan

speediedan and others added 18 commits November 16, 2025 12:00
…xts.

We also update .gitignore to exclude .env (a commonly used local file exclusion), e.g. to allow collaborators to add their own HF_TOKEN for the test suite.

Core Fixes:
-----------

transformer_lens/components/abstract_attention.py:
  - Replace pattern.to(self.cfg.dtype) with pattern.to(v.dtype) to handle cases
    where tensors are upcast to float32 for numerical stability while cfg.dtype
    remains float16/bfloat16
  - Add explicit device/dtype synchronization for the output projection:
    * Move weights (W_O) and bias (b_O) to match the input device (z.device)
    * Ensure z matches the weight dtype before the final linear operation
    (both fixes are sketched below)
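A minimal sketch of both fixes, assuming TransformerLens's usual attention shapes (pattern: [batch, head, query, key]; v: [batch, key, head, d_head]); the function name and einsum strings are illustrative, not the exact source:

```python
import torch

def project_output(pattern, v, W_O, b_O):
    # pattern may have been upcast to float32 for numerical stability, so
    # match v's dtype rather than cfg.dtype (which may still be fp16/bf16).
    pattern = pattern.to(v.dtype)
    z = torch.einsum("bhqk,bkhd->bqhd", pattern, v)
    # Synchronize the output projection with the activations' device/dtype:
    W_O = W_O.to(z.device)
    b_O = b_O.to(z.device)
    z = z.to(W_O.dtype)
    return torch.einsum("bqhd,hdm->bqm", z, W_O) + b_O
```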

transformer_lens/model_bridge/bridge.py:
  - Replace the direct original_model.to() call with the move_to_and_update_config()
    utility to ensure:
    * All bridge components (not just original_model) are moved to the target device
    * cfg.device and cfg.dtype stay synchronized with the actual model state
    * Multi-GPU cache tensors remain on the correct devices
    (see the sketch after this list)
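A hedged sketch of the change, assuming the (model, device_or_dtype) signature of move_to_and_update_config from transformer_lens.utilities.devices, that it returns the moved model, and an existing TransformerBridge instance named `bridge`:

```python
from transformer_lens.utilities.devices import move_to_and_update_config

# Before: bridge.original_model.to("cuda:0") moved only the wrapped HF model
# and left cfg.device / cfg.dtype stale.
# After: the utility moves every bridge component and keeps cfg in sync.
bridge = move_to_and_update_config(bridge, "cuda:0")
```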

Test Fixes:
-----------

tests/acceptance/test_hooked_encoder.py:
  - Fix test_cuda() to use the correct fixture name 'tokens' instead of 'mlm_tokens'

tests/acceptance/test_multi_gpu.py:
  - Update test_cache_device() to pass torch.device("cpu") instead of the string
    "cpu" for proper device type validation (illustrated below)

tests/unit/components/test_attention.py:
  - Add test_attention_forward_half_precisions() to validate that attention works
    correctly with bfloat16/float16 dtypes on CUDA devices (sketched below)
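A sketch of the shape such a test might take; torch.nn.MultiheadAttention stands in for the TransformerLens attention component, whose construction is elided here:

```python
import pytest
import torch

@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires CUDA")
@pytest.mark.parametrize("dtype", [torch.bfloat16, torch.float16], ids=["bf16", "fp16"])
def test_attention_forward_half_precisions(dtype):
    # Stand-in module; the real test exercises TransformerLens's attention.
    attn = torch.nn.MultiheadAttention(8, 2, batch_first=True).to("cuda", dtype)
    x = torch.rand(1, 4, 8, device="cuda", dtype=dtype)
    out, _ = attn(x, x, x)
    assert out.dtype == dtype
```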

tests/unit/factored_matrix/test_multiply_by_scalar.py:
  - Add test IDs to parametrize decorators to avoid pytest cache issues when
    random numbers appear in test names (example below)
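An illustrative example (names hypothetical) of how explicit ids keep test names stable across runs even when the parameter values are random:

```python
import pytest
import torch

scalars = [torch.rand(1).item(), torch.rand(1).item()]  # random each session

@pytest.mark.parametrize("scalar", scalars, ids=["scalar_a", "scalar_b"])
def test_multiply_by_scalar(scalar):
    # Without ids=..., pytest embeds the random float in the test name,
    # defeating its last-failed cache between runs.
    assert isinstance(scalar, float)
```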

Tests Fixed by This Commit:
---------------------------
- tests/acceptance/test_multi_gpu.py::test_cache_device
- tests/acceptance/model_bridge/compatibility/test_legacy_hooked_transformer_coverage.py::TestLegacyHookedTransformerCoverage::test_memory_efficiency[gpt2]
- tests/acceptance/model_bridge/compatibility/test_legacy_hooked_transformer_coverage.py::TestLegacyHookedTransformerCoverage::test_consistent_outputs[gpt2]
- tests/acceptance/test_hooked_transformer.py::test_half_precision[dtype0]
- tests/acceptance/test_hooked_transformer.py::test_half_precision[dtype1]
- tests/unit/components/test_attention.py::test_attention_forward_half_precisions[dtype0]
- tests/unit/components/test_attention.py::test_attention_forward_half_precisions[dtype1]
- tests/unit/model_bridge/compatibility/test_utils.py::TestUtilsWithTransformerBridge::test_device_compatibility[gpt2]

Enhance the to() method to properly handle both device and dtype arguments in
all supported PyTorch formats (positional, keyword, combined). Separately
invoke move_to_and_update_config for device/dtype to update cfg, while
delegating the actual tensor movement to original_model.to() with the original
args/kwargs. This ensures TransformerBridge respects standard PyTorch
behavior for model.to() calls.
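A hedged sketch of how such a to() override might look, assuming torch's private _parse_to helper (the one nn.Module.to uses internally) to normalize the calling conventions; this is not the PR's exact code:

```python
import torch
from transformer_lens.utilities.devices import move_to_and_update_config

def to(self, *args, **kwargs):
    # Handles .to(device), .to(dtype), .to(device, dtype), and keyword forms.
    device, dtype, *_ = torch._C._nn._parse_to(*args, **kwargs)
    if device is not None:
        move_to_and_update_config(self, device)  # keeps cfg.device in sync
    if dtype is not None:
        move_to_and_update_config(self, dtype)   # keeps cfg.dtype in sync
    # Delegate the actual tensor movement with the original args/kwargs so
    # standard PyTorch semantics (e.g. non_blocking) are preserved.
    self.original_model.to(*args, **kwargs)
    return self
```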

Add compatibility for transformers v5 and huggingface_hub v1.3.4
while maintaining backward compatibility with v4.

**Handle API/Behavioral Changes:**
- Handle the batch_decode behavior change (wraps tokens for v4/v5 compatibility; sketched after this list)
- Add rotary_pct → rope_parameters['partial_rotary_factor'] migration helper
- Fix BOS token handling for tokenizers without BOS (e.g., T5)
- Update MoE router_scores shape expectations for compact top-k format
- Add type casts for tokenizer.decode() return values
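A hedged sketch of the batch_decode wrapping, assuming the v5 change requires an explicit batch dimension for single sequences; the helper name is hypothetical:

```python
import torch

def safe_batch_decode(tokenizer, tokens, **kwargs):
    # Wrap a single 1-D sequence of token ids as a batch of one so the call
    # behaves identically under transformers v4 and v5.
    if isinstance(tokens, torch.Tensor) and tokens.dim() == 1:
        tokens = tokens.unsqueeze(0)
    return tokenizer.batch_decode(tokens, **kwargs)
```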

**Code Changes:**
- Add get_rotary_pct_from_config() utility for config v4/v5 compatibility (sketched after this list)
- Wrap tokens for batch_decode in HookedTransformer, bridge, and notebooks
- Add cast(str, ...) for decode() calls in generate() methods
- Update test expectations for new router_scores shape
- Add BOS token checks before setting add_bos_token=True
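A hedged sketch of what get_rotary_pct_from_config() might look like, based on the migration described above; the actual helper in the PR may differ:

```python
def get_rotary_pct_from_config(hf_config):
    # transformers v5 moves rotary_pct into rope_parameters.
    rope_parameters = getattr(hf_config, "rope_parameters", None)
    if rope_parameters and "partial_rotary_factor" in rope_parameters:
        return rope_parameters["partial_rotary_factor"]
    # transformers v4 fallback (1.0 = fully rotary, the usual default).
    return getattr(hf_config, "rotary_pct", 1.0)
```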

**Infrastructure:**
- Add the pytest-rerunfailures dependency for flaky network tests (can be removed once hub-related httpx read timeout issues are resolved)
- Update dependencies: transformers 5.0.0, huggingface_hub 1.3.4
- Change the HF cache to use HF_HUB_CACHE (TRANSFORMERS_CACHE was removed in v5; see the snippet after this list)
- Update doctest to use range checks for numerical stability
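Illustrative usage of the two infrastructure changes (the cache path and test name are placeholders):

```python
import os
import pytest

# TRANSFORMERS_CACHE was removed in transformers v5; HF_HUB_CACHE is the
# hub-wide replacement.
os.environ.setdefault("HF_HUB_CACHE", "/tmp/hf_cache")

# pytest-rerunfailures lets flaky network tests retry instead of failing CI:
@pytest.mark.flaky(reruns=3)
def test_loads_model_from_hub():
    ...
```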
…e httpx hub read timeouts affect both local and CI testing
@jlarson4 merged commit f970ba2 into dev-3.x on Feb 9, 2026
15 checks passed
mivanit added a commit to mivanit/attention-motifs that referenced this pull request Feb 10, 2026
HF transformers >5.0.0 is only supported as of TransformerLensOrg/TransformerLens#1167,
but that is not in the latest 3.0.0 beta release, so we use the dev version.

This was originally meant to fix an HF token issue, but that issue remains; still working on it.