Qwen3.5 text-only TransformerBridge support #1313
* Fix type of HookedTransformerConfig.device
  This is typed as `Optional[str]` but sometimes returns `torch.device`. Updated the code to just return the `str` instead of wrapping with a device. I'm not confident that every function which takes a device will always be passed a string, so I didn't change functions like warn_if_mps. Found while working on TransformerLensOrg#1219
* more cleanup
* 3.0 CI Bugs (TransformerLensOrg#1261)
* Fixing `utils` imports
* skip gated notebooks on PR from forks
* Updating notebooks
* Ensure LLaMA only runs when HF_TOKEN is available
---------
Co-authored-by: jlarson4 <jonahalarson@comcast.net>
TransformerLens 3.1.0
- Document Qwen3.5 text-only model usage in special_cases.md
- Update pyproject.toml to include transformers dependency for Qwen3.5
- Enhance unit tests for Qwen3.5 architecture detection and dependency handling
- Modify transformers.py to use prepared model config
- Implement stricter validation in Qwen3_5ArchitectureAdapter for model compatibility
jlarson4
left a comment
This is an excellent improvement on the existing Qwen3.5 support. Just one small note about package management.
@SamuelePunzo Just a heads up: due to some file reorganization, the merge conflict on this is pretty complex. I am going to resolve it and push the fix to your branch, so make sure you pull it down before doing any additional work on the tests themselves.
Hi @SamuelePunzo! The conflict has been resolved and CI is running now. Please pull the latest changes before continuing your work. Two additional notes:
Thanks again for your hard work on this. I look forward to getting it incorporated into the next TransformerLens release!
Thanks again for the help here! I pulled the latest branch and added I also chased down the remaining CI failures after the conflict resolution. The format and mypy checks are passing locally now, and I added a small compatibility fix for the GPT-2 bridge benchmark/generation checks that were failing in CI. Checks I ran locally:
GitHub CI is OK now too.
Excellent work here! Merging and will include in the next release. Thank you for your contribution @SamuelePunzo
Description
Adds production-ready, text-only Qwen3.5 support through `TransformerBridge.boot_transformers(...)`.
The supported public path is:
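A minimal sketch of that path, assuming the bridge is importable from `transformer_lens.model_bridge` and that `boot_transformers` accepts a model id plus a `device` keyword. The helper names `version_at_least` and `boot_qwen35_text`, and the `model_id` argument, are illustrative and not part of this PR. The `5.2.0` floor comes from the optional `qwen35` extra described in this PR.

```python
from importlib.metadata import PackageNotFoundError, version


def version_at_least(installed: str, minimum: str) -> bool:
    """Compare the first three dotted components numerically.

    Suffixes such as 'rc1' or '.dev0' are ignored, so a pre-release is
    treated like its final release (a simplification for this sketch).
    """

    def release(v: str) -> tuple:
        parts = []
        for piece in v.split(".")[:3]:
            digits = ""
            for ch in piece:
                if not ch.isdigit():
                    break
                digits += ch
            parts.append(int(digits) if digits else 0)
        while len(parts) < 3:
            parts.append(0)
        return tuple(parts)

    return release(installed) >= release(minimum)


def boot_qwen35_text(model_id: str, device: str = "cpu"):
    """Hypothetical helper: check the transformers version, then boot the bridge.

    `model_id` must be a text-only Qwen3.5 checkpoint id supplied by the caller.
    """
    try:
        installed = version("transformers")
    except PackageNotFoundError as exc:
        raise RuntimeError("transformers is not installed") from exc
    if not version_at_least(installed, "5.2.0"):
        raise RuntimeError(
            f"Qwen3.5 needs transformers>=5.2.0, found {installed}"
        )
    # Imported lazily so the version check above runs first.
    from transformer_lens.model_bridge import TransformerBridge

    return TransformerBridge.boot_transformers(model_id, device=device)
```

The version guard is optional; it only turns a late loading failure into an early, readable error when the base `transformers>=4.56` install is present without the extra.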
This PR intentionally keeps the scope narrow:
- `Qwen3_5ForCausalLM`.
- `model_type="qwen3_5"` and `model_type="qwen3_5_text"` to the text-only adapter.
- `transformers>=4.56`, while adding an optional `qwen35 = ["transformers>=5.2.0"]` install extra for actual Qwen3.5 use.
- not expose `Qwen3_5ForCausalLM`.
- `config.text_config` into the HF causal LM loading path.
- `Qwen3_5ForConditionalGeneration` / multimodal models with a clear text-only error.
- `HookedTransformer.from_pretrained` out of scope.
- `blocks.N.attn.*` for full-attention layers and `blocks.N.linear_attn.*` for GatedDeltaNet linear-attention layers.
No linked issue.
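The routing and hook-naming rules in the scope list can be sketched as follows. This is a simplified illustration, not the adapter code from this PR: the function names are hypothetical, the adapter is represented by its class name as a string, and the layer-type labels `"full_attention"` and `"linear_attention"` are assumed values for how a config might tag each layer.

```python
# model_type values that the description routes to the text-only adapter.
TEXT_ONLY_MODEL_TYPES = {"qwen3_5", "qwen3_5_text"}
# Architecture that the description says is rejected with a text-only error.
MULTIMODAL_ARCHITECTURE = "Qwen3_5ForConditionalGeneration"


def select_adapter(model_type: str, architectures: list[str]) -> str:
    """Return the adapter name for a config, or raise for unsupported models."""
    if MULTIMODAL_ARCHITECTURE in architectures:
        raise ValueError(
            f"{MULTIMODAL_ARCHITECTURE} is multimodal; "
            "only text-only Qwen3.5 models are supported"
        )
    if model_type in TEXT_ONLY_MODEL_TYPES:
        return "Qwen3_5ArchitectureAdapter"
    raise ValueError(f"Unsupported model_type: {model_type!r}")


def attention_hook_prefix(layer_index: int, layer_type: str) -> str:
    """Map a layer to the hook prefix that exists for it.

    Full-attention layers expose blocks.N.attn.*; GatedDeltaNet
    linear-attention layers expose blocks.N.linear_attn.* instead.
    """
    if layer_type == "full_attention":
        return f"blocks.{layer_index}.attn"
    if layer_type == "linear_attention":
        return f"blocks.{layer_index}.linear_attn"
    raise ValueError(f"Unknown layer type: {layer_type!r}")
```

Under these assumptions, code that walks every layer should look up the prefix per layer rather than assume `blocks.N.attn` exists everywhere.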
Type of change
Screenshots
Not applicable.
Testing
Run in `LLMbenchmark-env` with the repo's real pytest config:
Result:
Qwen-adjacent regression coverage:
Result:
Syntax check:
Full local test-suite status:
- HF assets at import time and fail in this environment because socket access is blocked.
- `tests\unit` was run broadly. After the Qwen3.5 test-harness fix, the result was `1635 passed, 30 skipped, 10 xfailed, 10 failed, 68 errors`.
- dominated by missing network/cache access for HF models such as `distilgpt2`, `facebook/hubert-base-ls960`, `bert-base-cased`, `facebook/opt-125m`, `openai/gpt-oss-20b`, and `solu-*`, plus a few pre-existing unrelated environment/version failures.
Checklist
attention submodules on linear-attention layers.
unrelated environment/network failures as noted above.