Conversation

@jlarson4 commented Feb 9, 2026

Replaces #1164 (which was merged as a standard merge instead of a squash). Original PR: #1164

Original author: @speediedan

speediedan and others added 18 commits November 16, 2025 12:00
…xts.

We also update .gitignore to exclude .env (a commonly used local file exclusion), e.g. to allow collaborators to add their own HF_TOKEN for the test suite.

Core Fixes:
-----------

transformer_lens/components/abstract_attention.py:
  - Replace pattern.to(self.cfg.dtype) with pattern.to(v.dtype) to handle cases
    where tensors are upcast to float32 for numerical stability while cfg.dtype
    remains float16/bfloat16
  - Add explicit device/dtype synchronization for the output projection:
    * Move weights (W_O) and bias (b_O) to match the input device (z.device)
    * Ensure z matches the weight dtype before the final linear operation
    (both fixes are sketched below)
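A minimal sketch of both fixes, assuming TransformerLens's usual attention shapes (pattern: [batch, head, query, key]; v: [batch, key, head, d_head]); the function name and einsum strings are illustrative, not the exact source:

```python
import torch

def project_output(pattern, v, W_O, b_O):
    # pattern may have been upcast to float32 for numerical stability, so
    # match v's dtype rather than cfg.dtype (which may still be fp16/bf16).
    pattern = pattern.to(v.dtype)
    z = torch.einsum("bhqk,bkhd->bqhd", pattern, v)
    # Synchronize the output projection with the activations' device/dtype:
    W_O = W_O.to(z.device)
    b_O = b_O.to(z.device)
    z = z.to(W_O.dtype)
    return torch.einsum("bqhd,hdm->bqm", z, W_O) + b_O
```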

transformer_lens/model_bridge/bridge.py:
  - Replace the direct original_model.to() call with the move_to_and_update_config()
    utility to ensure:
    * All bridge components (not just original_model) are moved to the target device
    * cfg.device and cfg.dtype stay synchronized with the actual model state
    * Multi-GPU cache tensors remain on the correct devices
    (see the sketch after this list)
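A hedged sketch of the change, assuming the (model, device_or_dtype) signature of move_to_and_update_config from transformer_lens.utilities.devices, that it returns the moved model, and an existing TransformerBridge instance named `bridge`:

```python
from transformer_lens.utilities.devices import move_to_and_update_config

# Before: bridge.original_model.to("cuda:0") moved only the wrapped HF model
# and left cfg.device / cfg.dtype stale.
# After: the utility moves every bridge component and keeps cfg in sync.
bridge = move_to_and_update_config(bridge, "cuda:0")
```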

Test Fixes:
-----------

tests/acceptance/test_hooked_encoder.py:
  - Fix test_cuda() to use the correct fixture name 'tokens' instead of 'mlm_tokens'

tests/acceptance/test_multi_gpu.py:
  - Update test_cache_device() to pass torch.device("cpu") instead of the string
    "cpu" for proper device type validation (illustrated below)

tests/unit/components/test_attention.py:
  - Add test_attention_forward_half_precisions() to validate that attention works
    correctly with bfloat16/float16 dtypes on CUDA devices (sketched below)
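A sketch of the shape such a test might take; torch.nn.MultiheadAttention stands in for the TransformerLens attention component, whose construction is elided here:

```python
import pytest
import torch

@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires CUDA")
@pytest.mark.parametrize("dtype", [torch.bfloat16, torch.float16], ids=["bf16", "fp16"])
def test_attention_forward_half_precisions(dtype):
    # Stand-in module; the real test exercises TransformerLens's attention.
    attn = torch.nn.MultiheadAttention(8, 2, batch_first=True).to("cuda", dtype)
    x = torch.rand(1, 4, 8, device="cuda", dtype=dtype)
    out, _ = attn(x, x, x)
    assert out.dtype == dtype
```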

tests/unit/factored_matrix/test_multiply_by_scalar.py:
  - Add test IDs to parametrize decorators to avoid pytest cache issues when
    random numbers appear in test names (example below)
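An illustrative example (names hypothetical) of how explicit ids keep test names stable across runs even when the parameter values are random:

```python
import pytest
import torch

scalars = [torch.rand(1).item(), torch.rand(1).item()]  # random each session

@pytest.mark.parametrize("scalar", scalars, ids=["scalar_a", "scalar_b"])
def test_multiply_by_scalar(scalar):
    # Without ids=..., pytest embeds the random float in the test name,
    # defeating its last-failed cache between runs.
    assert isinstance(scalar, float)
```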

Tests Fixed by This Commit:
---------------------------
- tests/acceptance/test_multi_gpu.py::test_cache_device
- tests/acceptance/model_bridge/compatibility/test_legacy_hooked_transformer_coverage.py::TestLegacyHookedTransformerCoverage::test_memory_efficiency[gpt2]
- tests/acceptance/model_bridge/compatibility/test_legacy_hooked_transformer_coverage.py::TestLegacyHookedTransformerCoverage::test_consistent_outputs[gpt2]
- tests/acceptance/test_hooked_transformer.py::test_half_precision[dtype0]
- tests/acceptance/test_hooked_transformer.py::test_half_precision[dtype1]
- tests/unit/components/test_attention.py::test_attention_forward_half_precisions[dtype0]
- tests/unit/components/test_attention.py::test_attention_forward_half_precisions[dtype1]
- tests/unit/model_bridge/compatibility/test_utils.py::TestUtilsWithTransformerBridge::test_device_compatibility[gpt2]

Enhance the to() method to properly handle both device and dtype arguments in
all supported PyTorch formats (positional, keyword, combined). Separately
invoke move_to_and_update_config for device/dtype to update cfg, while
delegating the actual tensor movement to original_model.to() with the original
args/kwargs. This ensures TransformerBridge respects standard PyTorch
behavior for model.to() calls.
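A hedged sketch of how such a to() override might look, assuming torch's private _parse_to helper (the one nn.Module.to uses internally) to normalize the calling conventions; this is not the PR's exact code:

```python
import torch
from transformer_lens.utilities.devices import move_to_and_update_config

def to(self, *args, **kwargs):
    # Handles .to(device), .to(dtype), .to(device, dtype), and keyword forms.
    device, dtype, *_ = torch._C._nn._parse_to(*args, **kwargs)
    if device is not None:
        move_to_and_update_config(self, device)  # keeps cfg.device in sync
    if dtype is not None:
        move_to_and_update_config(self, dtype)   # keeps cfg.dtype in sync
    # Delegate the actual tensor movement with the original args/kwargs so
    # standard PyTorch semantics (e.g. non_blocking) are preserved.
    self.original_model.to(*args, **kwargs)
    return self
```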

Add compatibility for transformers v5 and huggingface_hub v1.3.4
while maintaining backward compatibility with v4.

**Handle API/Behavioral Changes:**
- Handle the batch_decode behavior change (wraps tokens for v4/v5 compatibility; sketched after this list)
- Add rotary_pct → rope_parameters['partial_rotary_factor'] migration helper
- Fix BOS token handling for tokenizers without BOS (e.g., T5)
- Update MoE router_scores shape expectations for compact top-k format
- Add type casts for tokenizer.decode() return values
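A hedged sketch of the batch_decode wrapping, assuming the v5 change requires an explicit batch dimension for single sequences; the helper name is hypothetical:

```python
import torch

def safe_batch_decode(tokenizer, tokens, **kwargs):
    # Wrap a single 1-D sequence of token ids as a batch of one so the call
    # behaves identically under transformers v4 and v5.
    if isinstance(tokens, torch.Tensor) and tokens.dim() == 1:
        tokens = tokens.unsqueeze(0)
    return tokenizer.batch_decode(tokens, **kwargs)
```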

**Code Changes:**
- Add get_rotary_pct_from_config() utility for config v4/v5 compatibility (sketched after this list)
- Wrap tokens for batch_decode in HookedTransformer, bridge, and notebooks
- Add cast(str, ...) for decode() calls in generate() methods
- Update test expectations for new router_scores shape
- Add BOS token checks before setting add_bos_token=True
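A hedged sketch of what get_rotary_pct_from_config() might look like, based on the migration described above; the actual helper in the PR may differ:

```python
def get_rotary_pct_from_config(hf_config):
    # transformers v5 moves rotary_pct into rope_parameters.
    rope_parameters = getattr(hf_config, "rope_parameters", None)
    if rope_parameters and "partial_rotary_factor" in rope_parameters:
        return rope_parameters["partial_rotary_factor"]
    # transformers v4 fallback (1.0 = fully rotary, the usual default).
    return getattr(hf_config, "rotary_pct", 1.0)
```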

**Infrastructure:**
- Add the pytest-rerunfailures dependency for flaky network tests (can be removed once hub-related httpx read timeout issues are resolved)
- Update dependencies: transformers 5.0.0, huggingface_hub 1.3.4
- Change the HF cache to use HF_HUB_CACHE (TRANSFORMERS_CACHE was removed in v5; see the snippet after this list)
- Update doctest to use range checks for numerical stability
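Illustrative usage of the two infrastructure changes (the cache path and test name are placeholders):

```python
import os
import pytest

# TRANSFORMERS_CACHE was removed in transformers v5; HF_HUB_CACHE is the
# hub-wide replacement.
os.environ.setdefault("HF_HUB_CACHE", "/tmp/hf_cache")

# pytest-rerunfailures lets flaky network tests retry instead of failing CI:
@pytest.mark.flaky(reruns=3)
def test_loads_model_from_hub():
    ...
```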
…e httpx hub read timeouts affect both local and CI testing
@jlarson4 merged commit f970ba2 into dev-3.x on Feb 9, 2026
15 checks passed
mivanit added a commit to mivanit/attention-motifs that referenced this pull request Feb 10, 2026
HF transformers >5.0.0 is only supported as of TransformerLensOrg/TransformerLens#1167,
but that is not in the latest 3.0.0 beta release, so we use the dev version.

This was originally meant to fix an HF token issue, but that issue remains; still working on it.