
Qwen3.5 text-only TransformerBridge support#1313

Merged
jlarson4 merged 10 commits into TransformerLensOrg:dev from SamuelePunzo:qwen35-text-only-transformerbridge
May 19, 2026

@SamuelePunzo

Description

Adds production-ready, text-only Qwen3.5 support through
TransformerBridge.boot_transformers(...).

The supported public path is:

from transformer_lens.model_bridge import TransformerBridge

bridge = TransformerBridge.boot_transformers("Qwen/Qwen3.5-0.8B")

This PR intentionally keeps the scope narrow:

  • Supports dense text-only Qwen3.5 models via Qwen3_5ForCausalLM.
  • Routes HF configs with model_type="qwen3_5" and
    model_type="qwen3_5_text" to the text-only adapter.
  • Keeps base transformers>=4.56, while adding an optional
    qwen35 = ["transformers>=5.2.0"] install extra for actual Qwen3.5 use.
  • Adds clear optional dependency errors when Transformers is too old or does
    not expose Qwen3_5ForCausalLM.
  • Preserves text-only routing for top-level multimodal configs by swapping
    config.text_config into the HF causal LM loading path.
  • Rejects preloaded full Qwen3_5ForConditionalGeneration / multimodal models
    with a clear text-only error.
  • Keeps Qwen3.5 MoE and legacy HookedTransformer.from_pretrained out of
    scope.
  • Documents Qwen3.5 hooks under blocks.N.attn.* for full-attention layers
    and blocks.N.linear_attn.* for GatedDeltaNet linear-attention layers.
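
The dependency gate and text-only routing described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names (parse_version, check_qwen35_dependency, prepare_text_config, reject_multimodal_model) are hypothetical, the version parsing is a simplified stdlib-only stand-in, and the configs are plain SimpleNamespace objects rather than real HF config classes.

```python
from types import SimpleNamespace

# Values taken from the PR description; the names themselves are hypothetical.
TEXT_ONLY_MODEL_TYPES = {"qwen3_5", "qwen3_5_text"}
MIN_TRANSFORMERS = (5, 2, 0)


def parse_version(version: str) -> tuple:
    """Parse the leading numeric X.Y.Z of a version string (simplified stand-in)."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_qwen35_dependency(installed_version: str, has_causal_lm_class: bool) -> None:
    """Raise a clear error when Transformers is too old or lacks the model class."""
    too_old = parse_version(installed_version) < MIN_TRANSFORMERS
    if too_old or not has_causal_lm_class:
        raise ImportError(
            "Qwen3.5 support needs transformers>=5.2.0 exposing Qwen3_5ForCausalLM; "
            f"found transformers=={installed_version}. "
            "Install the optional 'qwen35' extra."
        )


def prepare_text_config(config):
    """Return the config to load the causal LM with, unwrapping text_config if present."""
    model_type = getattr(config, "model_type", None)
    if model_type not in TEXT_ONLY_MODEL_TYPES:
        raise ValueError(f"Not a text-only Qwen3.5 config: model_type={model_type!r}")
    text_config = getattr(config, "text_config", None)
    # A top-level multimodal config carries the language-model settings in text_config.
    return text_config if text_config is not None else config


def reject_multimodal_model(model) -> None:
    """Refuse a preloaded full multimodal model with a text-only error."""
    if type(model).__name__ == "Qwen3_5ForConditionalGeneration":
        raise ValueError(
            "Only text-only Qwen3.5 models are supported; "
            "load the model as Qwen3_5ForCausalLM instead."
        )


# Usage: a top-level multimodal-style config resolves to its nested text config.
top_level = SimpleNamespace(
    model_type="qwen3_5",
    text_config=SimpleNamespace(model_type="qwen3_5_text"),
)
resolved = prepare_text_config(top_level)
```

Keeping the version check separate from the routing means a too-old Transformers install fails with an install hint before any config is inspected.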

No linked issue.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Screenshots

Not applicable.

Testing

Run in LLMbenchmark-env with the repo's real pytest config:

conda run -n LLMbenchmark-env python -m pytest tests\unit\test_qwen3_5_adapter.py -q

Result:

64 passed

Qwen-adjacent regression coverage:

conda run -n LLMbenchmark-env python -m pytest tests\unit\test_qwen3_5_adapter.py tests\unit\test_qwen3_next_adapter.py tests\unit\model_bridge\test_qwen3_moe_adapter.py tests\integration\model_bridge\test_qwen3_moe_bridge.py -q

Result:

135 passed, 2 skipped

Syntax check:

conda run -n LLMbenchmark-env python -m compileall tests\unit\test_qwen3_5_adapter.py transformer_lens\model_bridge\sources\transformers.py transformer_lens\model_bridge\supported_architectures\qwen3_5.py

Full local test-suite status:

  • Full collection sees the local suite, but some integration modules download
    HF assets at import time and fail in this environment because socket access
    is blocked.
  • tests\unit was run broadly. After the Qwen3.5 test-harness fix, the result
    was 1635 passed, 30 skipped, 10 xfailed, 10 failed, 68 errors.
  • The remaining failures/errors are outside this Qwen3.5 surface and are
    dominated by missing network/cache access for HF models such as distilgpt2,
    facebook/hubert-base-ls960, bert-base-cased, facebook/opt-125m,
    openai/gpt-oss-20b, and solu-*, plus a few pre-existing unrelated
    environment/version failures.

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
    • Focused tests emit existing hybrid-layer hook alias warnings for optional
      attention submodules on linear-attention layers.
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
    • Focused Qwen3.5 and adjacent Qwen tests pass. The broad local unit suite has
      unrelated environment/network failures as noted above.
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility
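
The hybrid-layer hook layout mentioned in the description (blocks.N.attn.* for full-attention layers, blocks.N.linear_attn.* for GatedDeltaNet layers) can be pictured with a small sketch. The hook_prefix helper, the layer-type strings, and the example layer pattern below are assumptions made for illustration; they are not taken from this PR's code.

```python
def hook_prefix(layer_index: int, layer_type: str) -> str:
    """Map a layer's type to the hook-name prefix described in the PR."""
    if layer_type == "full_attention":
        return f"blocks.{layer_index}.attn"
    if layer_type == "linear_attention":
        return f"blocks.{layer_index}.linear_attn"
    raise ValueError(f"Unknown layer type: {layer_type!r}")


# Hypothetical layer pattern: three linear-attention layers, then one full-attention layer.
layer_types = [
    "linear_attention",
    "linear_attention",
    "linear_attention",
    "full_attention",
]
prefixes = [hook_prefix(i, t) for i, t in enumerate(layer_types)]
```

Under these assumptions each layer exposes only one of the two prefixes, which is consistent with the alias warnings noted above for attention submodules that are absent on linear-attention layers.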

brendanlong and others added 5 commits April 20, 2026 14:50
* Fix type of HookedTransformerConfig.device

This is typed as `Optional[str]` but sometimes returns `torch.device`.
Updated the code to just return the `str` instead of wrapping with a
device.

I'm not confident that every function which takes a device will
always be passed a string, so I didn't change functions like
warn_if_mps.

Found while working on TransformerLensOrg#1219

* more cleanup

* 3.0 CI Bugs (TransformerLensOrg#1261)

* Fixing `utils` imports

* skip gated notebooks on PR from forks

* Updating notebooks

* Ensure LLaMA only runs when HF_TOKEN is available

---------

Co-authored-by: jlarson4 <jonahalarson@comcast.net>
- Document Qwen3.5 text-only model usage in special_cases.md
- Update pyproject.toml to include transformers dependency for Qwen3.5
- Enhance unit tests for Qwen3.5 architecture detection and dependency handling
- Modify transformers.py to use prepared model config
- Implement stricter validation in Qwen3_5ArchitectureAdapter for model compatibility
@SamuelePunzo SamuelePunzo changed the title Qwen35 text only transformerbridge Qwen3.5 text-only TransformerBridge support May 18, 2026
Collaborator

@jlarson4 jlarson4 left a comment

This is an excellent improvement on the existing Qwen3.5 support. Just one small note about package management.

Comment thread transformer_lens/model_bridge/supported_architectures/qwen3_5.py Outdated
@jlarson4
Collaborator

@SamuelePunzo Just a heads up: due to some file reorganization, the merge conflict on this is pretty complex. I am going to resolve it and push the fix to your branch. Make sure you pull it down before doing any additional work on the tests themselves.

@jlarson4
Collaborator

Hi @SamuelePunzo! The conflict has been resolved and CI is running now. Please pull the latest changes before continuing your work.

Two additional notes:

  1. Additional Qwen3.5 tests were added after you started this PR. Please take a look at them and factor them into your changes.
  2. Please run make check-format and uv run mypy . and resolve any issues you encounter so your code can pass CI. Those two checks are currently failing.

Thanks again for your hard work on this. I look forward to getting it incorporated into the next TransformerLens release!

@SamuelePunzo
Author

Thanks again for the help here! I pulled the latest branch and added packaging>=23.0 to the qwen35 extra as requested.

I also chased down the remaining CI failures after the conflict resolution. The format and mypy checks are passing locally now, and I added a small compatibility fix for the GPT-2 bridge benchmark/generation checks that were failing in CI.

Checks I ran locally:

  • make check-format
  • uv run mypy .
  • uv lock --check
  • Qwen3.5 adapter tests: 74 passed
  • GPT-2 benchmark check
  • generation compatibility tests

GitHub CI is OK now too.

@jlarson4
Collaborator

Excellent work here! Merging now; this will be included in the next release. Thank you for your contribution, @SamuelePunzo!

@jlarson4 jlarson4 merged commit e359d7c into TransformerLensOrg:dev May 19, 2026
24 checks passed