diff --git a/CI_COVERAGE_SPLIT_PLAN.md b/CI_COVERAGE_SPLIT_PLAN.md
new file mode 100644
index 000000000..34c37d7c6
--- /dev/null
+++ b/CI_COVERAGE_SPLIT_PLAN.md
@@ -0,0 +1,152 @@
+# Plan: Split Full Code Coverage Test into parallel shards
+
+## Context
+
+The `Full Code Coverage Test` job in [.github/workflows/checks.yml](.github/workflows/checks.yml) runs the full test suite (`tests/integration + tests/benchmarks + tests/unit + tests/acceptance`) with `--cov` instrumentation and produces the `test-coverage` artifact that the `docs-build` job depends on.
+
+Recent measurement (run 25564944516): **39.2 minutes**. This is the critical-path job for the workflow — every other job finishes in ≤18 min, so coverage gates the whole run. A 39-minute feedback loop is too long for active development.
+
+The fix is to matrix-shard the tests, write per-shard `.coverage.<id>` data files, and merge in a final job via `coverage combine`. Coverage.py supports this natively, including with `--cov-branch`.
+
+Target after this plan: **~22 minutes wall-clock** (longest shard ~20 min plus ~2 min for the combine job), with no loss of coverage data.
+
+## Files to modify
+
+- `.github/workflows/checks.yml` — replace the single `coverage-test` job with a 3-shard matrix + a `coverage-combine` job. Update `needs:` on `docs-build`.
+
+That's it. No source or `makefile` changes needed; pytest invocations live in the workflow YAML.
+
+## Implementation
+
+### Step 1 — Replace `coverage-test` with a sharded matrix
+
+Three shards, split by test directory, with runtimes estimated from the observed run:
+
+| Shard id | Path | Estimated time | xdist? |
+|---|---|---|---|
+| `unit-acceptance` | `tests/unit tests/acceptance` | ~20 min | No (model loads will OOM) |
+| `integration` | `tests/integration` | ~13 min | No (same reason) |
+| `benchmarks` | `tests/benchmarks` | ~3 min | `-n 2` if helpful |
+
+Each matrix entry runs the same setup as the current job (uv sync, HF model cache restore, HF auth) and then:
+
+```yaml
+- name: Run shard
+  run: |
+    uv run pytest \
+      --cov=transformer_lens \
+      --cov-branch \
+      --cov-report= \
+      ${{ matrix.shard.path }}
+  env:
+    HF_TOKEN: ${{ secrets.HF_TOKEN }}
+    COVERAGE_FILE: .coverage.${{ matrix.shard.id }}
+
+- name: Upload partial coverage data
+  uses: actions/upload-artifact@v4
+  with:
+    name: coverage-data-${{ matrix.shard.id }}
+    path: .coverage.${{ matrix.shard.id }}
+    include-hidden-files: true   # required: .coverage.* is a dotfile
+    retention-days: 1
+```
+
+Key details:
+- `COVERAGE_FILE=.coverage.<id>` per shard so each writes to a distinct file
+- `--cov-report=` (empty) suppresses per-shard reports; only the data file is needed
+- `include-hidden-files: true` is mandatory for dotfiles: `upload-artifact` v4.4.0 and later exclude hidden files by default
+- Set `timeout-minutes: 30` (down from 60) since the longest shard is ~20 min
+
+### Step 2 — Add `coverage-combine` job
+
+```yaml
+coverage-combine:
+  name: Combine coverage and build report
+  runs-on: ubuntu-latest
+  needs: coverage-test
+  timeout-minutes: 5
+  steps:
+    - uses: actions/checkout@v4
+    - name: Install uv
+      uses: astral-sh/setup-uv@v7
+      with:
+        python-version: "3.12"
+        activate-environment: true
+        enable-cache: true
+    - name: Install dependencies
+      run: |
+        uv lock --check
+        uv sync
+    - name: Download all partial coverage artifacts
+      uses: actions/download-artifact@v4
+      with:
+        pattern: coverage-data-*
+        merge-multiple: true
+    - name: Combine + build report
+      run: |
+        uv run coverage combine
+        uv run coverage html -d htmlcov
+        uv run coverage report --skip-empty
+    - name: Upload Coverage Report Artifact
+      uses: actions/upload-artifact@v4
+      with:
+        name: test-coverage   # preserve name; docs-build expects this
+        path: htmlcov
+```
+
+### Step 3 — Update `docs-build` dependency
+
+In the existing `docs-build` job (currently `needs: coverage-test`):
+
+```yaml
+docs-build:
+  needs: coverage-combine   # was: coverage-test
+```
+
+The downstream `download-artifact` call (line 392 of checks.yml) is unchanged — it still pulls the `test-coverage` artifact, just from `coverage-combine` instead of `coverage-test`.
+
+### Step 4 (optional, only if needed) — Sentinel job for branch protection
+
+If "Full Code Coverage Test" is currently configured as a required status check on `main` / `dev*`, the matrix split replaces that single check with one check per matrix entry (e.g. `Full Code Coverage Test (unit-acceptance)`), so the old required check will never report. Branch protection rules will need to either:
+
+- (a) Be updated to require all three matrix entries individually, or
+- (b) Require a new sentinel job:
+
+```yaml
+coverage-required:
+  name: Coverage (required)
+  runs-on: ubuntu-latest
+  needs: [coverage-test, coverage-combine]
+  if: always()
+  steps:
+    - run: |
+        # Fail if any upstream failed
+        if [[ "${{ needs.coverage-test.result }}" != "success" ]]; then exit 1; fi
+        if [[ "${{ needs.coverage-combine.result }}" != "success" ]]; then exit 1; fi
+        echo "All coverage shards passed and were combined."
+```
+
+Then point branch protection at `Coverage (required)` instead.
+
+## Verification
+
+Run on a test branch and check:
+
+1. **Wall-clock**: longest shard should be ~20 min (unit-acceptance). Combine job adds ~2 min. Total ≈ 22 min vs 39 min baseline.
+2. **Coverage parity**: compare `coverage report` output line-by-line against the pre-split run. Branch coverage numbers should match exactly — `coverage combine` with `--cov-branch` data is well-tested.
+3. **Artifact contract**: `docs-build` still finds and consumes `test-coverage`. No change visible to downstream consumers.
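+4. **Empty shard handling**: if any shard has zero tests collected (shouldn't happen here), pytest exits with code 5 (no tests collected), which fails the shard and, through `needs: coverage-test`, skips the combine job; nothing reaches `coverage combine` in that case.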
+
+## Risks and mitigations
+
+- **HF model cache cost**: each shard restores from the same cache key. After first warm-up, all shards hit the cache; first cold run pays the download once per shard. Acceptable cost.
+- **Coverage merging edge cases**: the same line can be exercised in two shards via different code paths, so each shard records a different subset of branch arcs. `coverage combine` merges these by taking the union of the recorded lines and arcs; this is verified in the tool's own test suite.
+- **pytest-xdist on shards that load models**: not applied. If we want xdist on the unit shard later, gate it on whether that shard's tests use big models.
+- **Test interaction across shards**: tests in `tests/integration` and `tests/unit` don't share state across processes (no shared fixtures across directories). Sharding is safe.
+
+## Follow-up work (out of scope for this plan)
+
+- **Split integration with `pytest-split`**: if 20 min is still too slow, split `tests/integration` into 2 balanced shards using `pytest-split`'s `--splits N --group i`. Drops wall-clock further to ~12-15 min.
+- **Split unit too**: `tests/unit` has 115 files; 2-way split via `pytest-split` would balance well. Combined with integration split, wall-clock ~10-12 min.
+- **Compat checks parallelization**: after coverage drops, "Compatibility Checks" at ~17 min becomes the new critical path. Separate question.
+- **Drop redundancy between coverage and Python 3.12 compat**: currently both run unit+acceptance on 3.12. Could be dedup'd, but complicates the matrix; not urgent.
diff --git a/OPEN_ISSUES_TRIAGE.md b/OPEN_ISSUES_TRIAGE.md
new file mode 100644
index 000000000..7c17ce021
--- /dev/null
+++ b/OPEN_ISSUES_TRIAGE.md
@@ -0,0 +1,351 @@
+# Open Issues Triage (v5)
+
+**Generated:** 2026-05-12 (v5 — refreshed after this branch's closures + a separate 2026-05-13 sprint)
+**Repo:** TransformerLensOrg/TransformerLens
+**Open issues:** 25 (24 surviving from v4 + 1 newly opened)
+**Previous archives:** [OPEN_ISSUES_TRIAGE.v4.md](OPEN_ISSUES_TRIAGE.v4.md), [OPEN_ISSUES_TRIAGE.v3.md](OPEN_ISSUES_TRIAGE.v3.md), [OPEN_ISSUES_TRIAGE_OLD.md](OPEN_ISSUES_TRIAGE_OLD.md) (v2)
+
+## What changed since v4
+
+- **14 issues closed**: #112, #210, #297, #341, #385, #453, #462, #483, #588, #615, #644, #720, #796, #830
+  - Closed in this branch (`issues/may-12-cleanup`): #210, #297, #341, #385, #453, #615, #644, #796
+  - Closed in the 2026-05-13 sprint (other branches): #112, #720, #830
+  - v4 already flagged closeable: #462, #483, #588
+- **1 new entry**: #1302 (additional architecture adapter tests, opened by jlarson4) — maps to `not-addressed-simple`
+- **24 entries re-verified** against current code; no bucket changes needed (all v4 verdicts still hold for surviving issues), though the context for #543 is updated under "Verdict refinements"
+
+### Newly closeable based on v5 re-verification
+
+None — v4's predictions all landed; nothing new flagged as ready-to-close in this pass.
+
+### Verdict refinements (still open, context updated)
+
+- **#543** (Grokking demo broken in Colab) — root cause confirmed during this branch's session: `loss_fn(all_logits, labels)` uses the shape-rearranged `all_logits` (113×113×113) with the flat `labels` (12769), causing the documented gather mismatch. Fix is a one-token rename (`all_logits` → `original_logits`) plus checkpoint-CPU offload for the memory tail. Bridge migration is N/A for this demo (custom-config training from scratch — outside bridge's HF-wrapping design space). Bucket stays `bug-likely-fixed-needs-verification` pending the actual patch.
+
+The v2 methodology section (HT-side / Bridge-side / Replication / Next step) still applies — see [OPEN_ISSUES_TRIAGE_OLD.md](OPEN_ISSUES_TRIAGE_OLD.md#methodology-per-issue).
+
+## Summary table (sorted by issue number)
+
+| Issue | Title | Bucket |
+|---|---|---|
+| #111 | [Demo of direct path patching](#issue-111) | `not-addressed-difficult` |
+| #479 | [Memory efficient causal mask implementation](#issue-479) | `partial-leave-open` |
+| #481 | [Tracr to TransformerLens demo broken](#issue-481) | `bug-still-reproduces` |
+| #509 | [LayerNorm folding not implemented for BertBlock](#issue-509) | `not-addressed-difficult` |
+| #543 | [Grokking demo broken in Colab](#issue-543) | `bug-likely-fixed-needs-verification` |
+| #595 | [Add Stopping Criteria support](#issue-595) | `not-addressed-simple` |
+| #697 | [Activation cache during generate](#issue-697) | `not-addressed-simple` |
+| #704 | [Add support for TracrBench](#issue-704) | `not-relevant-close` |
+| #710 | [MVP Support For 1-2 Models Per-Modality](#issue-710) | `not-addressed-difficult` |
+| #737 | [Q reshape with model loaded in 4bit](#issue-737) | `partial-leave-open` |
+| #773 | [TransformerLens on models with different layernorm placement (BioGPT)](#issue-773) | `not-addressed-difficult` |
+| #798 | [Remove `model_args` (use only `model_kwargs`)](#issue-798) | `not-addressed-simple` |
+| #837 | [Multi-GPU device ordinal issue (`n_devices=3` for llama2-7b)](#issue-837) | `fixed-on-transformerbridge` |
+| #867 | [Does TransformerLens support LVLM like Qwen2-VL?](#issue-867) | `not-addressed-difficult` |
+| #869 | [Custom generative video transformer](#issue-869) | `not-addressed-difficult` |
+| #888 | [Adapt HookedTransformer to a non-supported model (CLIP language model)](#issue-888) | `not-addressed-difficult` |
+| #953 | [Add basic support for Gemma 3n (E2B & E4B)](#issue-953) | `not-addressed-difficult` |
+| #1080 | [Import fails by default in Colab (numpy ABI mismatch)](#issue-1080) | `bug-likely-fixed-needs-verification` |
+| #1148 | [Tutorial for "Real-Time Training Dynamics" (VSM Telemetry)](#issue-1148) | `not-addressed-simple` |
+| #1263 | [Direct Logit Attribution Tool](#issue-1263) | `not-addressed-simple` |
+| #1280 | [Add support for `cpu`, `meta`, and `disk` to TransformerBridge `device_map`](#issue-1280) | `partial-leave-open` |
+| #1291 | [CI HuggingFace Call Reduction](#issue-1291) | `partial-leave-open` |
+| #1297 | [Gemma4 Architecture Adapter](#issue-1297) | `not-addressed-simple` |
+| #1298 | [External Architecture Registration](#issue-1298) | `not-addressed-simple` |
+| #1302 | [Additional Architecture Adapter tests](#issue-1302) | `not-addressed-simple` |
+
+## Per-issue entries
+
+<a id="issue-111"></a>
+
+#### #111 — Demo of direct path patching
+
+- **Issue**: Add a section to Exploratory Analysis Demo demonstrating direct path patching for all head pairs. PR #49 was an early attempt.
+- **HookedTransformer**: still no first-class path-patching helper. Verified — no `path_patch`/`direct_path` symbols exist anywhere under [transformer_lens/](transformer_lens/) or [transformer_lens/utilities/](transformer_lens/utilities/). [demos/Activation_Patching_in_TL_Demo.ipynb](demos/Activation_Patching_in_TL_Demo.ipynb) and [demos/Attribution_Patching_Demo.ipynb](demos/Attribution_Patching_Demo.ipynb) are the closest.
+- **TransformerBridge**: same — no path-patching primitive in either API; bridge reuses the same `ActivationCache`.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: Callum McDougall pointed users at the [ARENA IOI notebook](https://colab.research.google.com/drive/1KgrEwvCKdX-8DQ1uSiIuxwIiwzJuQ3Gw). Either close with a docs pointer to ARENA, or implement a TL helper that wraps the pattern (~80 LoC).
+
+<a id="issue-479"></a>
+
+#### #479 — Memory efficient causal mask implementation
+
+- **Issue**: Each `Attention` layer registers a `(n_ctx, n_ctx)` boolean `causal_mask` buffer. ~86 GB overhead at Qwen 72B × 32K ctx (80 layers × 32768² one-byte booleans ≈ 85.9 GB).
+- **HookedTransformer**: confirmed at [transformer_lens/components/abstract_attention.py:120-128](transformer_lens/components/abstract_attention.py#L120-L128) — `causal_mask = torch.tril(torch.ones((self.cfg.n_ctx, self.cfg.n_ctx)).bool())` and `register_buffer("mask", causal_mask)` still present (also at line 774 for resize). Bug as reported still present for ALL HT architectures.
+- **TransformerBridge**: architecture-dependent. GPT2-family inherits HF's static `(max_pos, max_pos)` buffer. Modern HF impls (GPTNeoX/Pythia/Llama/Qwen/Mistral/Gemma) use `_update_causal_mask` per forward — zero overhead. The motivating Qwen 72B case is fixed on bridge.
+- **Replication**: `[empirically replicated]` per v2.
+- **Bucket**: `partial-leave-open`
+- **Next step**: bridge users on modern architectures already have the desired memory profile. HT-side fix (~30 LoC: replace pre-allocated buffer with on-the-fly construction in `apply_causal_mask`) closes it for the legacy path and GPT2-family use cases.
+
+<a id="issue-481"></a>
+
+#### #481 — Tracr to TransformerLens demo broken
+
+- **Issue**: Demo notebook assumes "the unembed is a projection onto the first few elements of the residual stream" — wrong because Tracr re-orders the residual stream alphabetically. Needs Tracr upstream PR to expose the unembed matrix.
+- **HookedTransformer**: 🐛 confirmed at [demos/Tracr_to_Transformer_Lens_Demo.ipynb:233](demos/Tracr_to_Transformer_Lens_Demo.ipynb) — `sd["unembed.W_U"] = np.eye(d_model, d_vocab_out)` line still present. No commits on the notebook since v3.
+- **TransformerBridge**: ❌ N/A — Tracr-specific issue applies regardless of API; root cause is in the unembed-matrix derivation, not in TL's hook system. Demo not ported to bridge.
+- **Replication**: `[code-verified]`
+- **Bucket**: `bug-still-reproduces`
+- **Next step**: needs Tracr upstream PR to expose `unembed_matrix` in `tracr.params`. FlyingPumba previously volunteered. Without that, demo is fundamentally limited.
+
+<a id="issue-509"></a>
+
+#### #509 — LayerNorm folding not implemented for BertBlock
+
+- **Issue**: BertBlock uses post-norm; `fold_ln=True` would fold LN into Q/K/V which is mathematically incorrect for post-norm.
+- **HookedTransformer**: 🐛 architectural limitation per Neel ("LayerNorm should not be folded at all... I can't think of any way to do LayerNorm folding for Bert"). [`HookedEncoder.from_pretrained`](transformer_lens/HookedEncoder.py#L412) hardcodes `fold_ln=False`. `BertBlock` at [transformer_lens/components/bert_block.py:19](transformer_lens/components/bert_block.py#L19). No changes since v3.
+- **TransformerBridge**: ⚠️ `BertArchitectureAdapter` exists; `enable_compatibility_mode()` would inherit the same fold-doesn't-work problem. Bridge users typically don't fold LN regardless.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: Two options unchanged from v3 — close as wontfix (Neel's view) or add a 5-line warning when `fold_ln=True` is passed for a BERT-family architecture.
+
+<a id="issue-543"></a>
+
+#### #543 — Grokking demo broken in Colab
+
+- **Issue**: `loss_fn(all_logits, labels)` raises `RuntimeError: Size does not match at dimension 0 expected index [12769, 1] to be smaller than self [113, 113]`.
+- **HookedTransformer**: ⚠️ unverified. `demos/Grokking_Demo.ipynb` last touched in `98811df5 3.0 CI Bugs (#1261)`; no commits referencing #543. No new activity since v3.
+- **TransformerBridge**: N/A — demo-specific shape bug.
+- **Replication**: `[unverifiable]` — needs Colab-like environment to run the full notebook end-to-end.
+- **Bucket**: `bug-likely-fixed-needs-verification`
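+- **Next step**: apply the fix identified in "Verdict refinements" above (pass `original_logits` instead of `all_logits` to `loss_fn`, plus checkpoint CPU offload), then ask the reporter (or anthonyduong9) to re-run the notebook on current `dev` and confirm the original error no longer reproduces; close once confirmed.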
+
+<a id="issue-595"></a>
+
+#### #595 — Add Stopping Criteria support
+
+- **Issue**: HF offers `StoppingCriteria` for custom halt conditions; HT/bridge `generate()` only support `stop_at_eos`.
+- **HookedTransformer**: ❌ unchanged — [transformer_lens/HookedTransformer.py:1882](transformer_lens/HookedTransformer.py#L1882) `generate()` and `generate_stream()` (line 2262) still only take `stop_at_eos: bool`.
+- **TransformerBridge**: ❌ unchanged — [transformer_lens/model_bridge/bridge.py:2438](transformer_lens/model_bridge/bridge.py#L2438) `generate()` and `generate_stream()` (line 2754) only have `stop_at_eos`.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~30 LoC. Add `stopping_criteria: Optional[Callable[[tokens, logits], bool]] = None` to all four entry points; evaluate it after each sampled token and break when it returns True. srishti-git1110 volunteered in 2024.
+
+<a id="issue-697"></a>
+
+#### #697 — Activation cache during generate
+
+- **Issue**: User wants `run_with_cache` semantics during `model.generate()` — cache activations of generated tokens, not just the prompt.
+- **HookedTransformer**: ❌ unchanged: [transformer_lens/HookedTransformer.py:1873](transformer_lens/HookedTransformer.py#L1873) `generate()` and `generate_stream()` (line 2257) still don't integrate `run_with_cache`. Bryce's reply: "no integration ... pretty low priority."
+- **TransformerBridge**: ❌ unchanged — [transformer_lens/model_bridge/bridge.py:2434](transformer_lens/model_bridge/bridge.py#L2434) bridge `generate` and `generate_stream` (line 2749) — same gap. PR #1265 improved `run_with_cache`/`run_with_hooks` interaction but didn't add cache-during-generate.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
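+- **Next step**: ~50 LoC enhancement. Wrap the per-token forward in `run_with_cache`'s hook-installation context and accumulate the cache across iterations. Trickier than it looks because of KV-cache interactions: with a KV cache each step only processes the newly generated token, so per-step activations have to be concatenated along the position axis, and hooks must not fire twice as the cache grows. Both APIs need the same fix.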
+
+<a id="issue-704"></a>
+
+#### #704 — Add support for TracrBench
+
+- **Issue**: TracrBench (121 toy Tracr transformers): should it ship in TransformerLens or live in a separate repo?
+- **HookedTransformer**: ❌ not in core. `grep -i tracr` in `transformer_lens/` returns nothing; the only mention is the Tracr→HookedTransformer demo listed at [docs/source/content/tutorials.md:39](docs/source/content/tutorials.md#L39).
+- **TransformerBridge**: ❌ not in core; not a transformer-architecture-detection problem.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: nothing material — no TracrBench code added, no new comments.
+- **Bucket**: `not-relevant-close`
+- **Next step**: close with Neel's recommendation: build TracrBench as an external repo using TransformerLens as a dependency. Optionally add a one-line link from `docs/source/content/gallery.md` (currently absent).
+
+<a id="issue-710"></a>
+
+#### #710 — MVP Support For 1-2 Models Per-Modality
+
+- **Issue**: Add basic non-text-model support: speech-to-text (Whisper), vision (ResNet, ViT), music gen, etc.
+- **HookedTransformer**: ❌ not designed for non-text architectures.
+- **TransformerBridge**: ⚠️ partial — 56 adapters total at [transformer_lens/model_bridge/supported_architectures/](transformer_lens/model_bridge/supported_architectures); audio (`hubert.py`), VLM (`llava.py`, `llava_next.py`, `llava_onevision.py`, `gemma3_multimodal.py`), SSM (`mamba.py`, `mamba2.py`). Still no Whisper, no ViT, no ResNet, no diffusion.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: nothing material — same adapter set; multimodal text-gen fix landed (`58330ad0`) but no new modality.
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: per the existing comment thread, encourage reporters to file per-modality sub-issues (Whisper, ViT, etc.). Convert this to a tracking meta-issue or close once sub-issues filed.
+
+<a id="issue-737"></a>
+
+#### #737 — Q reshape with model loaded in 4bit
+
+- **Issue**: `cfg.use_split_qkv_input=True` + 4bit vicuna-7b → shape mismatch in `AbstractAttention.calculate_qkv_matrices` — 4bit BnB-packed weight reshapes incorrectly under split-QKV.
+- **HookedTransformer**: 🐛 still buggy — `if self.cfg.load_in_4bit:` branches confirmed at [abstract_attention.py:58,338,378,454,473,491](transformer_lens/components/abstract_attention.py#L338). No commits to abstract_attention.py since v3 targeting this path.
+- **TransformerBridge**: N/A — bridge has no `use_split_qkv_input` flag; quantized models load via `boot_transformers(hf_model=quantized_model)` and use HF's quantized Linear directly. Recent quantization work (`d346e707` "Improved quantization skipping") is bridge-side, doesn't touch this HT branch.
+- **Replication**: `[unverifiable]` — needs GPU + bitsandbytes 4bit.
+- **What changed since v3**: nothing material on this code path.
+- **Bucket**: `partial-leave-open`
+- **Next step**: HT-side fix needs reshape-aware logic in `calculate_qkv_matrices` for 4bit + split path (~30 LoC). Bridge users avoid this entirely. Reporter workaround on HT: disable `use_split_qkv_input` for 4bit models.
+
+<a id="issue-773"></a>
+
+#### #773 — TransformerLens on models with different layernorm placement (BioGPT)
+
+- **Issue**: BioGPT has only one LN per layer (post-MLP `final_layer_norm`), unlike GPT-2's pre-LN1+pre-LN2. User asks for support.
+- **HookedTransformer**: ❌ hard-coded GPT-2 LN placement; `BioGptForCausalLM` listed at [transformer_lens/tools/model_registry/data/architecture_gaps.json:909](transformer_lens/tools/model_registry/data/architecture_gaps.json#L909).
+- **TransformerBridge**: ❌ no `BioGptArchitectureAdapter` — not in [transformer_lens/model_bridge/supported_architectures/](transformer_lens/model_bridge/supported_architectures) (56 adapters, none for BioGPT). The component-map pattern theoretically supports per-arch LN layout, but no adapter exists.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: nothing material; adapter creation guide at [docs/source/content/adapter_development/adapter-creation-guide.md](docs/source/content/adapter_development/adapter-creation-guide.md) is now a viable path for the reporter.
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: write a `BioGptArchitectureAdapter` (~80 LoC + tests) following the adapter-creation-guide. Reasonable to invite reporter to take this on with the guide.
+
+<a id="issue-798"></a>
+
+#### #798 — Remove `model_args` (use only `model_kwargs`)
+
+- **Issue**: Bryce's own proposal to remove `*model_args` + `**model_kwargs` redundancy in pass-through functions.
+- **HookedTransformer**: ⚠️ unchanged — `model_args` still present in [HookedEncoderDecoder.py:489-513](transformer_lens/HookedEncoderDecoder.py#L489), [hook_points.py:629,723,779](transformer_lens/hook_points.py#L629), [HookedAudioEncoder.py:299-323](transformer_lens/HookedAudioEncoder.py#L299), [BertNextSentencePrediction.py:220-266](transformer_lens/BertNextSentencePrediction.py#L220), [HookedTransformer.py:707-735](transformer_lens/HookedTransformer.py#L707).
+- **TransformerBridge**: ⚠️ same — bridge inherits `hook_points.py` machinery.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: nothing material; no new comments.
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~30 LoC across affected files — strip `*model_args`, keep only `**model_kwargs`. Already labeled `breaking-change`.
+
+<a id="issue-837"></a>
+
+#### #837 — Multi-GPU device ordinal issue (`n_devices=3` for llama2-7b)
+
+- **Issue**: `n_devices=3` produces "device ordinal out of range": `(index // layers_per_device)` overshoots when `n_layers % n_devices != 0`. For example, assuming `layers_per_device = n_layers // n_devices`, llama2-7b's 32 layers on 3 devices give `layers_per_device = 10`, so layers 30 and 31 map to device index 3 when only 0-2 exist.
+- **HookedTransformer**: 🐛 still buggy at [utilities/multi_gpu.py:142](transformer_lens/utilities/multi_gpu.py#L142); `device_index = (device.index or 0) + (index // layers_per_device)` is unchanged. The function is flagged `Deprecated: This will be removed in 3.0` ([lines 130-133](transformer_lens/utilities/multi_gpu.py#L130)).
+- **TransformerBridge**: ✅ first-class — `resolve_device_map` at [multi_gpu.py:170](transformer_lens/utilities/multi_gpu.py#L170) with explicit `n_devices` / `device_map` / `max_memory` and accelerate-backed dispatch. jlarson4's comment on the issue points users to PR #1270.
+- **Replication**: `[unverifiable]` — no multi-GPU here.
+- **What changed since v3**: nothing material; bridge path remains the supported route.
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: HT-side bug remains. Reply on issue with bridge migration recipe (`bridge = TransformerBridge.boot_transformers(name, n_devices=3)`); leave HT path open for #968-family fix or close with bridge pointer if reporter migrates.
+
+<a id="issue-867"></a>
+
+#### #867 — Does TransformerLens support LVLM like Qwen2-VL?
+
+- **Issue**: User asks if Qwen2-VL / Qwen2.5-VL is supported.
+- **HookedTransformer**: ❌ no native VLM support.
+- **TransformerBridge**: ❌ `Qwen2VLForConditionalGeneration` and `Qwen2_5_VLForConditionalGeneration` still listed in [transformer_lens/tools/model_registry/data/architecture_gaps.json:4709,4940](transformer_lens/tools/model_registry/data/architecture_gaps.json#L4709). Multimodal set at [transformer_lens/utilities/architectures.py:31-36](transformer_lens/utilities/architectures.py#L31-L36) covers only Llava family + Gemma3.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: no movement on Qwen-VL adapters; no new comments since v3 either.
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: add `Qwen2VLArchitectureAdapter` (LLaVA-pattern). Continue pointing reporter at LLaVA adapters today and ExplorerFreda's vlm-lens fork.
+
+<a id="issue-869"></a>
+
+#### #869 — Custom generative video transformer
+
+- **Issue**: User wants mech interp on a Sora-like generative video diffusion transformer.
+- **HookedTransformer**: ❌ no diffusion / video generation support.
+- **TransformerBridge**: ❌ bridge wraps HF causal/seq2seq/multimodal text models via `original_model`; not designed for diffusion. No new diffusion entry in [transformer_lens/utilities/architectures.py](transformer_lens/utilities/architectures.py).
+- **Replication**: `[code-verified]`
+- **What changed since v3**: no activity on issue or relevant code.
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: outside current scope per Bryce's reply (would need a separate `HookedDiffusionTransformer` root module). Recommend close as wontfix or defer to architectural roadmap; point reporter to a dedicated diffusion-interp tool.
+
+<a id="issue-888"></a>
+
+#### #888 — Adapt HookedTransformer to a non-supported model (CLIP language model)
+
+- **Issue**: User wants `from_pretrained` for CLIP's text encoder.
+- **HookedTransformer**: ❌ not possible without code modifications.
+- **TransformerBridge**: ⚠️ adapter framework supports it but no `CLIPTextModel` adapter exists — no `CLIPText*` symbol anywhere under `transformer_lens/`. `CLIPVisionEncoderBridge` exists for the vision side via LLaVA. jlarson4's earlier comment already pointed reporter at the adapter-creation guide.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: no new comments; no CLIP text adapter landed.
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: write `CLIPTextModelArchitectureAdapter` (~120 LoC, encoder-only, BERT-like attention). Leave open as a focused model-request inviting community contribution.
+
+<a id="issue-953"></a>
+
+#### #953 — Add basic support for Gemma 3n (E2B & E4B)
+
+- **Issue**: Reporter asks for text-only support of Gemma 3n (AltUp / LAuReL / PLE / mixed local-global attention).
+- **HookedTransformer**: ❌ not supported.
+- **TransformerBridge**: ❌ no Gemma3n entry in [transformer_lens/utilities/architectures.py](transformer_lens/utilities/architectures.py); no Gemma3n symbol anywhere under `transformer_lens/`. Bryce confirmed in-progress for next major release.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: no movement on Gemma3n adapter.
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: track for milestone 3.x. AltUp/LAuReL/PLE need dedicated component bridges; mixed local/global attention can share Gemma2 work. Defer until HF's `Gemma3nForCausalLM` forward stabilizes.
+
+<a id="issue-1080"></a>
+
+#### #1080 — Import fails by default in Colab (numpy ABI mismatch)
+
+- **Issue**: Fresh Colab + `pip install transformer_lens` + `import transformer_lens` raises `numpy.dtype size changed` ABI error; kernel restart works around it.
+- **HookedTransformer**: ⚠️ [pyproject.toml:11-12](pyproject.toml#L11-L12) still has `numpy>=1.24` / `numpy>=1.26` lower bounds with no upper cap. Numpy 2.x is allowed; transitive ABI mismatch root cause unchanged.
+- **TransformerBridge**: ⚠️ same install path; same numpy.
+- **Replication**: `[unverifiable]` — Colab-specific.
+- **What changed since v3**: no movement on numpy pinning; no new comments.
+- **Bucket**: `bug-likely-fixed-needs-verification`
+- **Next step**: ask reporter to retest with current Colab kernel + current TL (3.x). If still failing, bisect transitive deps and pin a tested numpy.
+
+<a id="issue-1148"></a>
+
+#### #1148 — Tutorial for "Real-Time Training Dynamics" (VSM Telemetry)
+
+- **Issue**: Reporter proposes a demo notebook for σ_p / σ_a training-dynamics telemetry.
+- **HookedTransformer**: ❌ no VSM/sigma_p/sigma_a tutorial in [demos/](demos/) — no VSM symbol anywhere under `demos/` or `transformer_lens/`.
+- **TransformerBridge**: ❌ same — works equivalently against bridge's hook system.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: jonathanrbelanger-lang committed in-thread to "get to work on this over the coming weekend" but no PR yet; no new commits to `demos/` referencing VSM telemetry.
+- **Bucket**: `not-addressed-simple`
+- **Next step**: leave open and wait for the reporter's PR (notebook in `/demos`, targeting `TransformerBridge`). If no PR materializes within a release cycle, invite community contribution and close as wontfix.
+
+<a id="issue-1263"></a>
+
+#### #1263 — Direct Logit Attribution Tool
+
+- **Issue**: Add a first-class DLA helper in `transformer_lens/tools/analysis/direct_logit_attribution.py` for the new `TransformerBridge` system. Continuation of stale PR #466 (closed 2026-04-22).
+- **HookedTransformer**: ⚠️ partial — `ActivationCache.logit_attrs` exists at [transformer_lens/ActivationCache.py:488-606](transformer_lens/ActivationCache.py#L488-L606) but no standalone tool that wraps the full DLA flow (residual decomposition → scaled attribution → display).
+- **TransformerBridge**: ⚠️ uses the same `ActivationCache.logit_attrs`, but no dedicated bridge-friendly tool. `transformer_lens/tools/` has only `model_registry/`; no `analysis/` subpackage exists yet.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: PR #466 was closed on 2026-04-22, the same day issue #1263 was opened as its explicit replacement. No replacement PR yet.
+- **Bucket**: `not-addressed-simple`
+- **Labels**: enhancement / good first issue / help wanted / minor / complexity-moderate
+- **Next step**: create `transformer_lens/tools/analysis/direct_logit_attribution.py` wrapping `cache.logit_attrs` + residual-stack decomposition into a one-call API; ship with a demo notebook. Already labelled `good first issue` — invite contributor.
+
+<a id="issue-1280"></a>
+
+#### #1280 — Add support for `cpu`, `meta`, and `disk` to TransformerBridge `device_map`
+
+- **Issue**: Extend bridge `device_map` to allow `cpu` / `meta` / `disk` values. Currently rejected. Pairs with #872 (broader review) and #1270 (initial multi-device).
+- **HookedTransformer**: N/A — separate device-placement model.
+- **TransformerBridge**: 🐛 still rejected by design at [transformer_lens/utilities/multi_gpu.py:146-167](transformer_lens/utilities/multi_gpu.py#L146-L167): `_UNSUPPORTED_DEVICE_MAP_VALUES = {"cpu", "disk", "meta"}` validated in `_validate_device_map_values`, also blocked post-load at [transformer_lens/model_bridge/sources/transformers.py:559-566](transformer_lens/model_bridge/sources/transformers.py#L559-L566). Reporter's identified blocker is the dtype-cast loop.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: snakefood3232 volunteered with a 3-day PR estimate (skip meta-device params, use accelerate's `align_module_device`); jlarson4 assigned them on 2026-05-05.
+- **Bucket**: `partial-leave-open`
+- **Next step**: wait for snakefood3232's PR — concrete fix plan documented. Reviewer: relax `_UNSUPPORTED_DEVICE_MAP_VALUES`, gate the dtype-cast loop on `param.device.type != "meta"`, and exercise via integration test that loads a small model with `device_map={"": "cpu"}`.
+
+<a id="issue-1291"></a>
+
+#### #1291 — CI HuggingFace Call Reduction
+
+- **Issue**: CI optimization — reduce HF Hub round-trips during test runs to avoid 429 rate-limit failures across concurrent CI runs.
+- **HookedTransformer**: ⚠️ partial — [.github/workflows/checks.yml:65-88,246-269](.github/workflows/checks.yml#L65-L88) caches ~14 model dirs across `compatibility-checks` and `coverage-test`, but no `concurrency` group is configured anywhere in the workflow; many tests still call `from_pretrained` per-test rather than via session fixtures.
+- **TransformerBridge**: ⚠️ same — bridge tests under `tests/integration/model_bridge/` and `tests/acceptance/model_bridge/` share the cache but each conftest re-loads HF models.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: ak91456 volunteered on 2026-05-09; no PR yet. Cache key bumped to `huggingface-models-v4` recently but core fixture/concurrency work hasn't started.
+- **Bucket**: `partial-leave-open`
+- **Labels**: enhancement / good first issue / low-priority / complexity-moderate
+- **Next step**: wait for ak91456's PR. Suggested approach: (a) add `concurrency: { group: ${{ github.workflow }}-${{ github.ref }}, cancel-in-progress: true }` to `checks.yml` to dedupe stacked runs; (b) promote per-file `from_pretrained("gpt2")` calls in conftests to session-scoped fixtures.
+
+<a id="issue-1297"></a>
+
+#### #1297 — Gemma4 Architecture Adapter
+
+- **Issue**: Add a `Gemma4ArchitectureAdapter` for the new Gemma4 family. Currently surfaces in `architecture_gaps.json` with relevancy 88.0 (109 models on HF, 121k cumulative downloads).
+- **HookedTransformer**: N/A — bridge-only path going forward; no HT weight conversion expected.
+- **TransformerBridge**: ❌ no `Gemma4ArchitectureAdapter` in [transformer_lens/model_bridge/supported_architectures/](transformer_lens/model_bridge/supported_architectures/); `Gemma4ForConditionalGeneration` not registered in [factories/architecture_adapter_factory.py](transformer_lens/factories/architecture_adapter_factory.py) or `HF_SUPPORTED_ARCHITECTURES`.
+- **Replication**: `[code-verified]` — confirmed adapter and registration entries are absent.
+- **Bucket**: `not-addressed-simple`
+- **Next step**: copy `gemma3.py` adapter as starting template (Gemma4 is most likely a Gemma3 superset); register in factory + `HF_SUPPORTED_ARCHITECTURES` + `CANONICAL_AUTHORS_BY_ARCH` (`google`); follow `docs/source/content/adapter_development/adapter-creation-guide.md`. Then verify on the canonical Google models.
+
+<a id="issue-1298"></a>
+
+#### #1298 — External Architecture Registration
+
+- **Issue**: Let users register custom architecture adapters at runtime without modifying TransformerLens source. Currently `SUPPORTED_ARCHITECTURES` in `architecture_adapter_factory.py` is hardcoded.
+- **HookedTransformer**: N/A — bridge-only concept (HT loads via `OFFICIAL_MODEL_NAMES`, no plugin hook).
+- **TransformerBridge**: ❌ no public registration API. The `SUPPORTED_ARCHITECTURES` dict at [factories/architecture_adapter_factory.py:65](transformer_lens/factories/architecture_adapter_factory.py#L65) is module-level and not user-mutable through any documented mechanism.
+- **Replication**: `[code-verified]` — no `register_adapter` function or plugin entry-point hook.
+- **Bucket**: `not-addressed-simple`
+- **Next step**: design needed first: entry-point-based discovery vs. an explicit `register_adapter(arch_name, adapter_class)` function. The adapter-creation guide already exists, so the second half (publishing your adapter) is the remaining gap.
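+
+One way the explicit-function option could look is sketched below. Everything here is illustrative: `register_adapter` is only the name floated above, `MyArchAdapter` and `MyArchForCausalLM` are made up, and the local `SUPPORTED_ARCHITECTURES` dict is a stand-in for the real one (assumed to map architecture names to adapter classes).
+
+```python
+from typing import Dict
+
+# Stand-in for the module-level dict in architecture_adapter_factory.py.
+# Assumption: it maps an HF architecture name to an adapter class.
+SUPPORTED_ARCHITECTURES: Dict[str, type] = {}
+
+
+def register_adapter(arch_name: str, adapter_class: type, *, overwrite: bool = False) -> None:
+    """Hypothetical public hook for registering an adapter at runtime."""
+    if not isinstance(adapter_class, type):
+        raise TypeError("adapter_class must be a class, not an instance")
+    if arch_name in SUPPORTED_ARCHITECTURES and not overwrite:
+        raise ValueError(f"{arch_name!r} is already registered; pass overwrite=True to replace it")
+    SUPPORTED_ARCHITECTURES[arch_name] = adapter_class
+
+
+class MyArchAdapter:  # hypothetical user-defined adapter
+    pass
+
+
+register_adapter("MyArchForCausalLM", MyArchAdapter)
+```
+
+Refusing silent overwrites by default is one possible design choice; an entry-point-based variant could populate the same dict at import time.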
+
+
+<a id="issue-1302"></a>
+
+#### #1302 — Additional Architecture Adapter tests
+
+- **Issue**: Roughly a third of registered architecture adapters have dedicated config/component-mapping tests in [tests/unit/model_bridge/supported_architectures/](tests/unit/model_bridge/supported_architectures/); the rest lack focused coverage. Existing tests (baichuan / codegen / cohere / gemma3 / gpt_bigcode / internlm2 / llava / mpt / qwen3_5 / qwen3_moe / qwen3_next / xglm / gemma3_multimodal) serve as the pattern to mirror.
+- **HookedTransformer**: N/A — bridge-only concern.
+- **TransformerBridge**: ⚠️ partial coverage. 13 adapter test files exist (as of this branch); the remaining adapters under [transformer_lens/model_bridge/supported_architectures/](transformer_lens/model_bridge/supported_architectures/) are uncovered.
+- **Replication**: `[code-verified]` — counted adapters in `supported_architectures/` vs. test files; gap confirmed.
+- **Bucket**: `not-addressed-simple`
+- **Labels**: enhancement / good first issue / help wanted / low-priority / complexity-simple / TransformerBridge
+- **Next step**: identify the uncovered adapters, then mirror the pattern from any of the existing 13 — Config / ComponentMapping / WeightConversions / ArchitectureGuards classes with class-scoped fixtures. Each adapter is independent, so this parallelizes well across contributors.
diff --git a/OPEN_ISSUES_TRIAGE.v3.md b/OPEN_ISSUES_TRIAGE.v3.md
new file mode 100644
index 000000000..f29d0dae5
--- /dev/null
+++ b/OPEN_ISSUES_TRIAGE.v3.md
@@ -0,0 +1,639 @@
+# Open Issues Triage (v3)
+
+**Generated:** 2026-05-08 (v3 — re-verified against current code)
+**Repo:** TransformerLensOrg/TransformerLens
+**Open issues:** 48 (44 re-verified from v2 + 4 opened since)
+**v2 archived at:** [OPEN_ISSUES_TRIAGE_OLD.md](OPEN_ISSUES_TRIAGE_OLD.md)
+
+## What changed since v2
+
+- **37 issues closed** during the v2 cycle (archived in [OPEN_ISSUES_TRIAGE_OLD.md](OPEN_ISSUES_TRIAGE_OLD.md))
+- **44 entries re-verified** against current code; ~9 had material updates from PRs landed in the v2 cycle
+- **4 new entries** added with stub triage (#1263, #1275, #1280, #1291)
+
+### Newly closeable based on v3 re-verification
+
+- **#729** — adapter-creation guide landed (PR #1274)
+- **#846** — bridge `hf_model.config` priority fixed (PR #1279)
+- **#912** — mT5 wired through bridge with full verification (PR #1289)
+- **#950** — SimpleStories family verified end-to-end (PR #1292)
+- **#1133** — already covered-close in v2; v3 refreshes citation
+
+### Material code-state changes since v2 (still open, but verdict updated)
+
+- **#290** — empty-name circular reference confirmed fixed at `hook_points.py:420-421`
+- **#483** — HT side fixed by PR #1267; bridge mirror still needed
+- **#569 / #684** — bridge quantization had a real bug (uint8 cast), fixed by PR #1276; multi-device support added by PR #1270
+- **#615** — PR #1276 dtype-cast fix also benefits non-quantized models via the shared `GeneralizedComponent`
+- **#661** — bridge now exposes `set_use_split_qkv_input`
+- **#837 / #911 / #968** — multi-device bridge (PR #1270) now merged on main
+- **#1148** — reporter committed to building tutorial
+
+The v2 methodology section (HT-side / Bridge-side / Replication / Next step) still applies — see [OPEN_ISSUES_TRIAGE_OLD.md](OPEN_ISSUES_TRIAGE_OLD.md#methodology-per-issue).
+
+## Summary table (sorted by issue number)
+
+| Issue | Title | Bucket |
+|---|---|---|
+| #111 | [Demo of direct path patching](#issue-111) | `not-addressed-difficult` |
+| #112 | [Helper to display vectors of logits nicely](#issue-112) | `not-addressed-simple` |
+| #210 | [`get_full_resid_decomposition` accept tensor argument](#issue-210) | `not-addressed-simple` |
+| #290 | [GPU memory leak when HookedTransformer goes out of scope](#issue-290) | `partial-leave-open` |
+| #297 | [Better print-outs for currently attached hooks](#issue-297) | `not-addressed-simple` |
+| #341 | [Update FactoredMatrix.svd() (uses deprecated `torch.svd`, returns V not Vh)](#issue-341) | `not-addressed-simple` |
+| #385 | [Pythia / Rotary Embeddings don't match HuggingFace](#issue-385) | `bug-still-reproduces` |
+| #453 | [`from_pretrained()` always downloads same weights with `checkpoint_label`](#issue-453) | `bug-likely-fixed-needs-verification` |
+| #462 | [Add support for Mamba](#issue-462) | `fixed-on-transformerbridge` |
+| #479 | [Memory efficient causal mask implementation](#issue-479) | `partial-leave-open` |
+| #481 | [Tracr to TransformerLens demo broken](#issue-481) | `bug-still-reproduces` |
+| #483 | [`HookedTransformer.generate()` `pad_token_id` error when tokenizer unset](#issue-483) | `partial-leave-open` |
+| #509 | [LayerNorm folding not implemented for BertBlock](#issue-509) | `not-addressed-difficult` |
+| #543 | [Grokking demo broken in Colab](#issue-543) | `bug-likely-fixed-needs-verification` |
+| #569 | [Cannot load Llama 3 70B on multigpu in 4bit](#issue-569) | `fixed-on-transformerbridge` |
+| #588 | [Setup unit tests to cover model configurations](#issue-588) | `partial-leave-open` |
+| #595 | [Add Stopping Criteria support](#issue-595) | `not-addressed-simple` |
+| #615 | [HookedTransformer output not identical to HuggingFace for Llama 3](#issue-615) | `fixed-on-transformerbridge` |
+| #644 | [Documentation: Map the Act Names to the Transformer](#issue-644) | `not-addressed-simple` |
+| #661 | [Pythia output inconsistent across batch sizes with `use_split_qkv_input=True`](#issue-661) | `bug-still-reproduces` |
+| #684 | [Expand quantization model support beyond Llama](#issue-684) | `fixed-on-transformerbridge` |
+| #697 | [Activation cache during generate](#issue-697) | `not-addressed-simple` |
+| #704 | [Add support for TracrBench](#issue-704) | `not-relevant-close` |
+| #710 | [MVP Support For 1-2 Models Per-Modality](#issue-710) | `not-addressed-difficult` |
+| #720 | [Review current matmul function usages](#issue-720) | `partial-leave-open` |
+| #729 | [Guide to adding new models](#issue-729) | `covered-close` |
+| #737 | [Q reshape with model loaded in 4bit](#issue-737) | `partial-leave-open` |
+| #773 | [TransformerLens on models with different layernorm placement (BioGPT)](#issue-773) | `not-addressed-difficult` |
+| #796 | [`FactoredMatrix.svd()` `lru_cache` prevents GC](#issue-796) | `not-addressed-simple` |
+| #798 | [Remove `model_args` (use only `model_kwargs`)](#issue-798) | `not-addressed-simple` |
+| #830 | [Type hint support for `self.model` in `ActivationCache`](#issue-830) | `not-addressed-simple` |
+| #837 | [Multi-GPU device ordinal issue (`n_devices=3` for llama2-7b)](#issue-837) | `fixed-on-transformerbridge` |
+| #846 | [Prioritize local `hf_model.config` for Qwen models](#issue-846) | `fixed-on-transformerbridge` |
+| #867 | [Does TransformerLens support LVLM like Qwen2-VL?](#issue-867) | `not-addressed-difficult` |
+| #869 | [Custom generative video transformer](#issue-869) | `not-addressed-difficult` |
+| #888 | [Adapt HookedTransformer to a non-supported model (CLIP language model)](#issue-888) | `not-addressed-difficult` |
+| #911 | [PosEmbed device error with `accelerate`](#issue-911) | `fixed-on-transformerbridge` |
+| #912 | [Support mT5 models](#issue-912) | `covered-close` |
+| #950 | [Support SimpleStories models](#issue-950) | `covered-close` |
+| #953 | [Add basic support for Gemma 3n (E2B & E4B)](#issue-953) | `not-addressed-difficult` |
+| #968 | [`unsloth/llama-3.2-3b-instruct` with 2× 3060 device-mismatch](#issue-968) | `bug-likely-fixed-needs-verification` |
+| #1080 | [Import fails by default in Colab (numpy ABI mismatch)](#issue-1080) | `bug-likely-fixed-needs-verification` |
+| #1133 | [`tokenize_and_concatenate` cuts tokens mid-document](#issue-1133) | `covered-close` |
+| #1148 | [Tutorial for "Real-Time Training Dynamics" (VSM Telemetry)](#issue-1148) | `not-addressed-simple` |
+| #1263 | [Direct Logit Attribution Tool](#issue-1263) | `needs-triage` (new) |
+| #1275 | [Update Benchmarks & Verify Models to support Quantized models](#issue-1275) | `needs-triage` (new) |
+| #1280 | [Add support for `cpu`, `meta`, and `disk` to TransformerBridge `device_map`](#issue-1280) | `needs-triage` (new) |
+| #1291 | [CI HuggingFace Call Reduction](#issue-1291) | `needs-triage` (new) |
+
+## Per-issue entries
+
+<a id="issue-111"></a>
+
+#### #111 — Demo of direct path patching
+
+- **Issue**: Add a section to Exploratory Analysis Demo demonstrating direct path patching for all head pairs. PR #49 was an early attempt.
+- **HookedTransformer**: still no first-class path-patching helper. `demos/Activation_Patching_in_TL_Demo.ipynb` and `demos/Attribution_Patching_Demo.ipynb` exist but neither covers direct path patching.
+- **TransformerBridge**: same — no path-patching primitive in either API.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: Callum McDougall pointed users at the [ARENA IOI notebook](https://colab.research.google.com/drive/1KgrEwvCKdX-8DQ1uSiIuxwIiwzJuQ3Gw). Either close with a docs pointer to ARENA, or implement a TL helper that wraps the pattern (~80 LoC).
+
+<a id="issue-112"></a>
+
+#### #112 — Helper to display vectors of logits nicely
+
+- **Issue**: Neel asked for two things: **MVP** — function mapping logit vector → pandas DataFrame `(token_index, token_string, logit, log_prob, probability)`. **Bonus** — nostalgebraist-style `plot_logit_lens` heatmap.
+- **HookedTransformer**: `test_prompt` in [transformer_lens/utilities/exploratory_utils.py:14](transformer_lens/utilities/exploratory_utils.py#L14) prints top-k for prompt+answer, which covers part of the MVP's intent but is print-only and single-position. No `logits_to_df`, no `plot_logit_lens` heatmap. Unchanged since v2.
+- **TransformerBridge**: same — `test_prompt` works through bridge; no separate helper.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~30 LoC for `logits_to_df(logits, tokenizer, top_k=None) -> pd.DataFrame`, ~50 LoC for matplotlib `plot_logit_lens`. Both small library additions independent of CircuitsVis.
+
+<a id="issue-210"></a>
+
+#### #210 — `get_full_resid_decomposition` accept tensor argument
+
+- **Issue**: Add a `project_output_onto: [d_model]` or `[d_model, num_outputs]` argument so neuron-decomposition doesn't blow GPU memory by materializing `[batch, pos, d_mlp, d_model]`.
+- **HookedTransformer**: signature at [transformer_lens/ActivationCache.py:1091](transformer_lens/ActivationCache.py#L1091) still has no `project_output_onto`. Memory-blowing path still active.
+- **TransformerBridge**: same — bridge imports the same `ActivationCache` class ([transformer_lens/model_bridge/bridge.py:34](transformer_lens/model_bridge/bridge.py#L34)).
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: add `project_output_onto` kwarg + `(neurons * (W_out @ project_output_onto))` path. ~15 LoC + 1 test. Alan Cooney offered to take it; never landed.
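+
+The memory saving comes from projecting `W_out` first, so the `[batch, pos, d_mlp, d_model]` tensor is never built. A minimal sketch with toy shapes, assuming `neurons` holds post-activation MLP values and `direction` plays the role of the proposed `project_output_onto` vector (neither name is taken from the current code):
+
+```python
+import torch
+
+torch.manual_seed(0)
+batch, pos, d_mlp, d_model = 2, 3, 8, 4
+neurons = torch.randn(batch, pos, d_mlp)   # toy post-activation neuron values
+W_out = torch.randn(d_mlp, d_model)        # toy MLP output weights
+direction = torch.randn(d_model)           # stand-in for project_output_onto
+
+# Naive path: materialise [batch, pos, d_mlp, d_model], then project.
+naive = (neurons[..., None] * W_out) @ direction   # -> [batch, pos, d_mlp]
+
+# Projected-first path: only [d_mlp] and [batch, pos, d_mlp] tensors exist.
+cheap = neurons * (W_out @ direction)               # -> [batch, pos, d_mlp]
+
+print(torch.allclose(naive, cheap, atol=1e-5))
+```
+
+Both paths compute the same per-neuron contribution along `direction`; the second simply reorders the multiplication.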
+
+<a id="issue-290"></a>
+
+#### #290 — GPU memory leak when HookedTransformer goes out of scope
+
+- **Issue**: `del model; gc.collect(); torch.cuda.empty_cache()` doesn't reclaim memory after loading multiple models in a loop.
+- **HookedTransformer**: the empty-name circular reference is now fixed at [transformer_lens/hook_points.py:420-421](transformer_lens/hook_points.py#L420-L421) (`if name == "": continue`). However, the `state_dict[k] = v.to(device)` non-detach concern from the thread is not visibly addressed in current `HookedTransformer.py`.
+- **TransformerBridge**: PR #1229 (`4cbb0f88`) fixed a *separate* Joint-QKV bridge memory leak (deepcopy bug) — unrelated to this issue. Bridge still delegates to HF; no TL-specific circular refs.
+- **Replication**: `[unverifiable]` — needs GPU profiling tooling and ~10× model loads.
+- **What changed since v2**: confirmed circular-reference fix is in place; v2 was uncertain.
+- **Bucket**: `partial-leave-open`
+- **Next step**: re-run `fil-profile` reproduction on current `dev`. If residual leak exists, focus on `move_model_modules_to_device` overlap with multi-GPU bug cluster (#837/#907/#911/#968).
+
+<a id="issue-297"></a>
+
+#### #297 — Better print-outs for currently attached hooks
+
+- **Issue**: API for listing hooks attached to a model + HookPoint, with detail.
+- **HookedTransformer**: no first-class `model.list_hooks()` or `HookPoint.describe()` API. `model.hook_dict` publicly accessible; `hp.fwd_hooks`/`bwd_hooks` inspectable. Confirmed via grep — no `list_active_hooks` in [transformer_lens/hook_points.py](transformer_lens/hook_points.py).
+- **TransformerBridge**: same — uses same `hook_points` machinery.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: add `HookedRootModule.list_active_hooks()` returning `Dict[str, List[hook_repr]]`. ~15 LoC + 1 test. Abandoned PR #302 was the prior attempt.
+
+<a id="issue-341"></a>
+
+#### #341 — Update FactoredMatrix.svd() (uses deprecated `torch.svd`, returns V not Vh)
+
+- **Issue**: TL uses deprecated `torch.svd` (which returns V, not Vh) inside `FactoredMatrix.svd`. Should switch to `torch.linalg.svd` and return Vh per modern convention.
+- **HookedTransformer/Bridge**: confirmed at [transformer_lens/FactoredMatrix.py:230-233](transformer_lens/FactoredMatrix.py#L230-L233) — still `torch.svd(...)`. Last commit on file was `90cf7476` (eigenvalues type fix), not relevant. No fix landed since v2.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~15-line fix — switch to `torch.linalg.svd(..., full_matrices=False)`, return `Vh` directly, update docstring noting the breaking change. `diego898` offered to send PR. Land with a deprecation warning.
+
+<a id="issue-385"></a>
+
+#### #385 — Pythia / Rotary Embeddings don't match HuggingFace
+
+- **Issue**: Logit drift between `HookedTransformer` and HF for Pythia models. Llama-2-7b-chat reportedly catastrophic. Llama-3.2 rotary mismatch persists per chengjiali.
+- **HookedTransformer**: rotary code lives at [transformer_lens/components/abstract_attention.py:599](transformer_lens/components/abstract_attention.py#L599). Last touched by PR #1218 (`2c41b6c9` Weight processing/position embeddings attention) and PR #1231 (`524bca93` rotary_base types). No new pythia-specific fixes since v2.
+- **TransformerBridge**: bridge uses HF's rotary directly via `RotaryEmbeddingBridge` delegating to `model.rotary_emb`. By construction matches HF.
+- **Replication**: `[empirically replicated]` per v2 (NaN logits in fp32 baseline). Not re-run this round.
+- **Bucket**: `bug-still-reproduces` + `fixed-on-transformerbridge` for bridge users
+- **Next step**: investigate the v2-reported NaN regression — verify whether it persists with full `from_pretrained` on current HEAD; bisect against `2c41b6c9`. Bridge users avoid this entirely.
+
+<a id="issue-453"></a>
+
+#### #453 — `from_pretrained()` always downloads same weights with `checkpoint_label`
+
+- **Issue**: Reporter passes `checkpoint_label=...` and gets identical weights regardless of label. `checkpoint_index` works.
+- **HookedTransformer**: signature at [transformer_lens/HookedTransformer.py:1158-1159](transformer_lens/HookedTransformer.py#L1158-L1159) has `checkpoint_index` and `checkpoint_value` — **NOT `checkpoint_label`**. The kwarg is silently absorbed into `**from_pretrained_kwargs`. Unchanged since v2.
+- **TransformerBridge**: no checkpoint feature — uses HF's native loading only.
+- **Replication**: `[code-verified]`
+- **Bucket**: `bug-likely-fixed-needs-verification` (effectively user-error)
+- **Next step**: respond to reporter that the parameter is `checkpoint_value`. Optionally validate unknown kwargs in `from_pretrained` and raise. ~10 LoC defensive change.
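+
+A sketch of what the defensive check might look like. `reject_misspelled_checkpoint_kwargs` is a hypothetical helper, not an existing TL function, and it assumes the only valid checkpoint selectors are the two named in the signature above:
+
+```python
+# Assumption: these are the only checkpoint selectors from_pretrained accepts.
+_CHECKPOINT_KWARGS = ("checkpoint_index", "checkpoint_value")
+
+
+def reject_misspelled_checkpoint_kwargs(from_pretrained_kwargs: dict) -> None:
+    """Raise instead of silently forwarding a kwarg that looks like a checkpoint selector."""
+    for name in from_pretrained_kwargs:
+        if name.startswith("checkpoint_") and name not in _CHECKPOINT_KWARGS:
+            valid = ", ".join(_CHECKPOINT_KWARGS)
+            raise TypeError(
+                f"Unexpected keyword argument {name!r}. Valid checkpoint selectors: {valid}."
+            )
+
+
+# Unrelated kwargs pass through untouched; only checkpoint_* names are checked.
+reject_misspelled_checkpoint_kwargs({"checkpoint_value": 1000, "torch_dtype": "float16"})
+```
+
+Restricting the check to the `checkpoint_` prefix keeps arbitrary HF kwargs flowing through, which a blanket unknown-kwarg rejection would not.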
+
+<a id="issue-462"></a>
+
+#### #462 — Add support for Mamba
+
+- **Issue**: Add Mamba SSM architecture support.
+- **HookedTransformer**: not supported (by design — Mamba is fundamentally different from attention transformers).
+- **TransformerBridge**: `MambaArchitectureAdapter` and `Mamba2ArchitectureAdapter` registered at [transformer_lens/factories/architecture_adapter_factory.py:95-96](transformer_lens/factories/architecture_adapter_factory.py#L95-L96). Both `MambaForCausalLM` and `Mamba2ForCausalLM` HF model classes mapped. SSM beta support landed via PR #1246 (`7cf84596`).
+- **Replication**: `[code-verified]`
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: close with comment pointing at `TransformerBridge.boot_transformers("state-spaces/mamba-130m-hf")`. Mamba support is shipped.
+
+<a id="issue-479"></a>
+
+#### #479 — Memory efficient causal mask implementation
+
+- **Issue**: Each `Attention` layer registers a `(n_ctx, n_ctx)` boolean `causal_mask` buffer. ~86 GB overhead at Qwen 72B × 32K ctx.
+- **HookedTransformer**: confirmed at [transformer_lens/components/abstract_attention.py:120-128](transformer_lens/components/abstract_attention.py#L120-L128) — `causal_mask = torch.tril(torch.ones((self.cfg.n_ctx, self.cfg.n_ctx)).bool())` and `register_buffer("mask", causal_mask)`. Bug as reported still present for ALL HT architectures.
+- **TransformerBridge**: architecture-dependent (per v2). GPT2-family inherits HF's static `(max_pos, max_pos)` buffer. Modern HF impls (GPTNeoX/Pythia/Llama/Qwen/Mistral/Gemma) use `_update_causal_mask` per forward — zero overhead. The motivating Qwen 72B case is fixed on bridge.
+- **Replication**: `[empirically replicated]` per v2.
+- **Bucket**: `partial-leave-open`
+- **Next step**: bridge users on modern architectures already have the desired memory profile. HT-side fix (~30 LoC: replace pre-allocated buffer with on-the-fly construction in `apply_causal_mask`) closes it for the legacy path and GPT2-family use cases.
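+
+The ~86 GB figure can be reproduced with back-of-the-envelope arithmetic. The layer count below (80) is an assumption about the 72B model and is not stated in this entry; the other inputs are a 32,768-token context and 1 byte per `torch.bool` element:
+
+```python
+# Assumptions: 80 attention layers, n_ctx = 32 * 1024, 1 byte per bool element.
+n_layers = 80
+n_ctx = 32 * 1024
+bytes_per_bool = 1
+
+# One (n_ctx, n_ctx) boolean mask buffer per attention layer.
+per_layer_bytes = n_ctx * n_ctx * bytes_per_bool
+total_bytes = n_layers * per_layer_bytes
+
+print(f"per layer: {per_layer_bytes / 2**30:.1f} GiB")  # per layer: 1.0 GiB
+print(f"total:     {total_bytes / 1e9:.1f} GB")         # total:     85.9 GB
+```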
+
+<a id="issue-481"></a>
+
+#### #481 — Tracr to TransformerLens demo broken
+
+- **Issue**: Demo notebook assumes "the unembed is a projection onto the first few elements of the residual stream" — wrong because Tracr re-orders the residual stream alphabetically. Needs Tracr upstream PR to expose the unembed matrix.
+- **HookedTransformer**: confirmed at [demos/Tracr_to_Transformer_Lens_Demo.ipynb:233](demos/Tracr_to_Transformer_Lens_Demo.ipynb) — `sd["unembed.W_U"] = np.eye(d_model, d_vocab_out)` line still there. Demo NOT ported to TransformerBridge. No commits on the notebook since `7784be1c` (IPython magic deprecation, unrelated).
+- **TransformerBridge**: same — Tracr-specific issue applies regardless of API; bug is in unembed-matrix derivation, not in TL's hook system.
+- **Replication**: `[code-verified]`
+- **Bucket**: `bug-still-reproduces`
+- **Next step**: needs Tracr upstream PR to expose `unembed_matrix` in `tracr.params`. FlyingPumba said they'd attempt the upstream change. Without that, demo is fundamentally limited.
+
+<a id="issue-483"></a>
+
+#### #483 — `HookedTransformer.generate()` `pad_token_id` error when tokenizer unset
+
+- **Issue**: `model.generate()` on a `HookedTransformer` with no tokenizer raises `AttributeError: 'NoneType' object has no attribute 'pad_token_id'`. Use case: training models on tokenizer-less domains (e.g., character-level integer addition).
+- **HookedTransformer**: ✅ fixed by PR #1267 (commit `b1cc8c80`, "Fix generate() when tokenizer is unset and add regression tests"). The `assert self.tokenizer is not None` was removed from the top of both `generate()` and `generate_stream()`; logic now branches on `tokenizer_has_eos_token` and falls back to user-supplied `eos_token_id`. See [transformer_lens/HookedTransformer.py:2068-2089](transformer_lens/HookedTransformer.py#L2068-L2089). Regression test at [tests/unit/test_generate_no_tokenizer.py](tests/unit/test_generate_no_tokenizer.py).
+- **TransformerBridge**: ❌ not fixed. [transformer_lens/model_bridge/bridge.py:2550-2566](transformer_lens/model_bridge/bridge.py#L2550-L2566) and the parallel block at L2826-L2839 still dereference `self.tokenizer.eos_token_id` unguarded; same gap as v2. The mirror-to-bridge expectation was not met when #1267 landed.
+- **Replication**: `[code-verified]`
+- **What changed since v2**: PR #1267 landed on HT only; bridge side was not mirrored.
+- **Bucket**: `partial-leave-open`
+- **Next step**: mirror the #1267 fix to `TransformerBridge.generate` and `generate_stream` (~10 LoC: guard `self.tokenizer is not None` before the eos/pad lookups, accept a `None` tokenizer when `eos_token_id` is supplied). Once a bridge-side regression test exists, close.
+
+<a id="issue-509"></a>
+
+#### #509 — LayerNorm folding not implemented for BertBlock
+
+- **Issue**: BertBlock uses post-norm; `fold_ln=True` would fold LN into Q/K/V which is mathematically incorrect for post-norm.
+- **HookedTransformer**: 🐛 architectural limitation per Neel ("LayerNorm should not be folded at all... I can't think of any way to do LayerNorm folding for Bert"). [`HookedEncoder.from_pretrained`](transformer_lens/HookedEncoder.py#L412) already hardcodes `fold_ln=False` so silent-wrong-result is averted, but a user calling lower-level `from_pretrained(..., fold_ln=True)` on a BERT model still gets undefined behavior. `BertBlock` lives at [transformer_lens/components/bert_block.py:19](transformer_lens/components/bert_block.py#L19).
+- **TransformerBridge**: ⚠️ `BertArchitectureAdapter` exists; `enable_compatibility_mode()` would inherit the same fold-doesn't-work problem. Bridge users typically don't fold LN regardless.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: Two options unchanged from v2 — either close as wontfix (Neel's view) or add a 5-line warning when `fold_ln=True` is passed for a BERT-family architecture. Current hardcode at HookedEncoder.py:412 is sufficient for the standard path; the warning would catch the pathological lower-level call.
+
+<a id="issue-543"></a>
+
+#### #543 — Grokking demo broken in Colab
+
+- **Issue**: `loss_fn(all_logits, labels)` raises `RuntimeError: Size does not match at dimension 0 expected index [12769, 1] to be smaller than self [113, 113]`.
+- **HookedTransformer**: ⚠️ unverified. `demos/Grokking_Demo.ipynb` last touched in [`98811df5 3.0 CI Bugs (#1261)`](demos/Grokking_Demo.ipynb); no commit referencing #543 directly. anthonyduong9 said in 2024 "I can work on this today" but no PR linked.
+- **TransformerBridge**: N/A — demo-specific shape bug.
+- **Replication**: `[unverifiable]` — needs Colab-like environment to run the full notebook end-to-end.
+- **Bucket**: `bug-likely-fixed-needs-verification`
+- **Next step**: ask reporter (or anthonyduong9) to re-run the notebook on current `dev` and confirm whether the original error reproduces. If it does, the fix is in `loss_fn` (`per_token_logprobs` shape vs. `labels` shape mismatch — most likely a `.unsqueeze(-1)` missing or extra). If it doesn't, close.
+
+<a id="issue-569"></a>
+
+#### #569 — Cannot load Llama 3 70B on multigpu in 4bit
+
+- **Issue**: `HookedTransformer.from_pretrained(..., hf_model=base_model)` fails with `size mismatch for blocks.0.attn._W_K: copying a param with shape torch.Size([4194304, 1])`. BnB packs weights as 1D blobs; HT's QKV reshape doesn't unpack them.
+- **HookedTransformer**: 🐛 unchanged — HT load path doesn't unpack BnB-quantized weights before reshape.
+- **TransformerBridge**: ✅ now meaningfully fixed for both halves of the original problem. (1) **Multi-GPU**: PR #1270 (`d95bd962`, "Multi-Device Processing on Bridge") added `n_devices` and `device_map` kwargs to `TransformerBridge.boot_transformers` — see [transformer_lens/model_bridge/bridge.py:195-230](transformer_lens/model_bridge/bridge.py#L195-L230). (2) **Quantization**: PR #1276 (`0a5218ca`, "Fixed Quantization bug in TransformerLens 3.0") repaired an `AttentionBridge`/`GeneralizedComponent` dtype-cast bug where bridge cast fp inputs to the storage dtype (uint8 for BnB Params4bit) of quantized first-parameters, producing gibberish logits. j.larson's recent comment on the issue confirms: "If you migrate to TransformerLens 3.0, there is a demo for how to run Llama 3 4-bit with the new system."
+- **Replication**: `[code-verified]`
+- **What changed since v2**: PRs #1270 and #1276 landed (multi-device bridge support + fixed quantization on bridge). v2 marked bridge as structurally sound but unverified for 4bit; bridge had a real bug that's now repaired.
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: HT-side fix would still require BnB-aware QKV reshape (~50 LoC) but is no longer the only path. Reply on issue pointing at the [Llama-2 quantized demo](demos/LLaMA2_GPU_Quantized.ipynb) and the migration guide; once a user confirms 4bit + multi-GPU end-to-end on bridge, close.
+
+<a id="issue-588"></a>
+
+#### #588 — Setup unit tests to cover model configurations
+
+- **Issue**: Add unit tests that load every supported model's config and verify it's parseable.
+- **HookedTransformer/Bridge**: ⚠️ partial — same as v2. Per-architecture coverage at `tests/unit/test_gemma3_config.py`, `test_hooked_transformer_config.py`, `test_llava_config.py`, `test_qwen3_5_adapter.py`, `test_gemma3_multimodal_adapter.py` plus structural tests under [tests/unit/model_bridge/supported_architectures/](tests/unit/model_bridge/supported_architectures/) (7 adapter test files). No single parametrized sweep over the full `SUPPORTED_ARCHITECTURES` keyset.
+- **Replication**: `[code-verified]`
+- **Bucket**: `partial-leave-open`
+- **Next step**: ~30 LoC parametrized test over all `SUPPORTED_ARCHITECTURES` keys: for each, load config-only (no weights) and assert the architecture adapter resolves. Curt-tigges signed up in 2024 without a PR; could now also be assigned to whoever next adds an adapter (forces the pattern for new entries too).
+
+<a id="issue-595"></a>
+
+#### #595 — Add Stopping Criteria support
+
+- **Issue**: HF offers `StoppingCriteria` for custom halt conditions; HT/bridge `generate()` only support `stop_at_eos`.
+- **HookedTransformer**: ❌ unchanged — [transformer_lens/HookedTransformer.py:1882](transformer_lens/HookedTransformer.py#L1882) `generate()` still only takes `stop_at_eos: bool`. Same at `generate_stream()` line 2262.
+- **TransformerBridge**: ❌ unchanged — [transformer_lens/model_bridge/bridge.py:2433](transformer_lens/model_bridge/bridge.py#L2433) `generate()` and `generate_stream()` (line 2743) only have `stop_at_eos`.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~30 LoC — add `stopping_criteria: Optional[Callable[[tokens, logits], bool]] = None` to all four entry points (HT generate/generate_stream, bridge generate/generate_stream); evaluate it after each sampled token and break when it returns True. srishti-git1110 volunteered in 2024.
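+
+A minimal sketch of the callback check described in the next step, written against a toy greedy loop rather than the real `generate()` implementations. `greedy_generate`, `step_fn`, and the `StoppingCriterion` alias are hypothetical names for illustration, not existing TransformerLens API:
+
+```python
+from typing import Callable, List, Optional, Sequence
+
+# Hypothetical callback type: gets the tokens so far plus the latest logits,
+# and returns True to halt generation.
+StoppingCriterion = Callable[[Sequence[int], Sequence[float]], bool]
+
+
+def greedy_generate(
+    step_fn: Callable[[Sequence[int]], Sequence[float]],
+    prompt: Sequence[int],
+    max_new_tokens: int,
+    eos_token_id: Optional[int] = None,
+    stopping_criteria: Optional[StoppingCriterion] = None,
+) -> List[int]:
+    """Toy greedy loop; `step_fn` stands in for a model forward pass."""
+    tokens = list(prompt)
+    for _ in range(max_new_tokens):
+        logits = step_fn(tokens)
+        next_token = max(range(len(logits)), key=logits.__getitem__)
+        tokens.append(next_token)
+        # Existing behaviour: stop on EOS.
+        if eos_token_id is not None and next_token == eos_token_id:
+            break
+        # Proposed behaviour: consult the user callback after each new token.
+        if stopping_criteria is not None and stopping_criteria(tokens, logits):
+            break
+    return tokens
+
+
+def toy_step(tokens: Sequence[int]) -> List[float]:
+    """Fake model over a 5-token vocabulary: always prefers (last + 1) % 5."""
+    logits = [0.0] * 5
+    logits[(tokens[-1] + 1) % 5] = 1.0
+    return logits
+
+
+stopped = greedy_generate(
+    toy_step, [0], max_new_tokens=10,
+    stopping_criteria=lambda toks, _logits: toks[-1] == 3,
+)
+unstopped = greedy_generate(toy_step, [0], max_new_tokens=10)
+```
+
+Keeping the callback check next to the existing EOS check means the default (`None`) path is unchanged, which is what would let all four entry points share the same loop shape.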
+
+<a id="issue-615"></a>
+
+#### #615 — HookedTransformer output not identical to HuggingFace for Llama 3
+
+- **Issue**: Greedy decoding diverges between HT and HF on Llama-3-8B-Instruct. Investigation localized to MLP weight differences after einsum/Linear conversion.
+- **HookedTransformer**: ⚠️ much improved — most einsums in attention/MLP replaced with `F.linear` (visible at [transformer_lens/components/abstract_attention.py:368-374](transformer_lens/components/abstract_attention.py#L368-L374)). degenfabian reports max diff ~`2e-4` on Llama-3-8B-Instruct; close enough for production but not bit-exact. Per-architecture reports on Gemma 2-2B etc. continue.
+- **TransformerBridge**: ✅ argmax/CE/generation parity with HF achieved. **Important update**: PR #1276 fixed a real precision-killing bug in `AttentionBridge` where the dtype-cast logic returned the storage dtype of quantized parameters; the fix also benefits non-quantized models because the same `target_dtype = next(parameters()).dtype` codepath was used. Bridge does its own attention math (`torch.matmul` + softmax + mask in [generalized_components/joint_qkv_attention.py:465-480](transformer_lens/model_bridge/generalized_components/joint_qkv_attention.py#L465-L480)). Empirically, Pythia-70m bridge vs HF: ~`2.5e-3` max drift, argmax matches.
+- **Replication**: `[empirically replicated]` — bridge gives small drift but argmax-matches HF on Pythia-70m (per the v2 measurement).
+- **What changed since v2**: PR #1276 fixed a quantization-storage dtype-cast bug in `GeneralizedComponent` that was silently degrading attention precision for any model where the first attention parameter happened to have non-fp dtype.
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: bridge users get argmax/CE/generation parity with HF. Bit-exact match still depends on (a) `attn_implementation="eager"` vs HF's default sdpa, (b) softmax dtype/order, (c) `.contiguous()` calls. For interpretability work the bridge is sufficient; for bit-exact circuit reproduction, document the known eager-vs-sdpa caveat in [docs/source/content/migrating_to_v3.md](docs/source/content/migrating_to_v3.md).
+
+<a id="issue-644"></a>
+
+#### #644 — Documentation: Map the Act Names to the Transformer
+
+- **Issue**: Add a labeled diagram mapping hook names to positions on a transformer architecture figure.
+- **HookedTransformer/Bridge**: ❌ unchanged — [docs/source/content/model_structure.md](docs/source/content/model_structure.md) is still 153 lines listing 51 hook names, no diagram. Recent edits (`a92a90a1` "Documenting 3.1 features") expanded prose around `enable_compatibility_mode()` but no figure added. Two volunteers (juvogt, tjbai) said they'd contribute years ago, no PR landed.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~1-day docs task — generate a diagram (matplotlib + manual layout, or Excalidraw + commit the SVG). Place at `docs/source/_static/hook_diagram.svg`, embed in `model_structure.md`. Now overlaps with v3.0 hook-aliasing — the diagram should label both the new canonical names (e.g., `blocks.{i}.ln1.hook_out`) and the legacy aliases (`hook_normalized`, `hook_scale`) shown in the doc at lines 98-100.
+
+<a id="issue-661"></a>
+
+#### #661 — Pythia output inconsistent across batch sizes with `use_split_qkv_input=True`
+
+- **Issue**: `model(input[:2])[0]` and `model(input[:1])[0]` give different outputs when `use_split_qkv_input=True`.
+- **HookedTransformer**: 🐛 unchanged — [transformer_lens/components/transformer_block.py:123,137](transformer_lens/components/transformer_block.py#L123-L153) still branches on `use_split_qkv_input`; bug confirmed.
+- **TransformerBridge**: ⚠️ bridge now exposes `set_use_split_qkv_input` at [transformer_lens/model_bridge/bridge.py:3373](transformer_lens/model_bridge/bridge.py#L3373) — feature parity gained since v2. Whether bridge reproduces the same batch-size inconsistency is **not yet tested**; bridge implementation routes through `_propagate_attention_flag` rather than the per-token splitting in HT's `transformer_block.py`, so the root cause may differ.
+- **Replication**: `[empirically replicated]` on HT side — pythia-70m repro from issue gives `max diff: 1.14e-02` (v2 measurement).
+- **What changed since v2**: bridge gained `use_split_qkv_input` (no longer N/A as v2 stated).
+- **Bucket**: `bug-still-reproduces`
+- **Next step**: (1) reproduce on bridge with the same input — if bridge is correct, document the workaround (use bridge for split-qkv analysis). (2) HT-side investigation: stateful interaction in LN1 path, related to #335 (LN1 firing 3× per forward). Non-trivial; research-only feature so moderate priority.
+
+<a id="issue-684"></a>
+
+#### #684 — Expand quantization model support beyond Llama
+
+- **Issue**: HT raises `AssertionError: Quantization is only supported for Llama models` when loading 4bit Mistral via `hf_model=`.
+- **HookedTransformer**: 🐛 unchanged — assertion still at [transformer_lens/HookedTransformer.py:1341-1342](transformer_lens/HookedTransformer.py#L1341-L1342) (`load_in_4bit and ("llama" not in model_name.lower())`).
+- **TransformerBridge**: ✅ now functional. PR #1276 (`0a5218ca`, "Fixed Quantization bug in TransformerLens 3.0") repaired the `AttentionBridge` dtype-cast that was producing gibberish logits on quantized models — see regression test `test_AttentionBridge_preserves_fp_input_when_first_param_is_quantized` in [tests/integration/model_bridge/test_bridge_integration.py](tests/integration/model_bridge/test_bridge_integration.py). jlarson4's comment on the issue points users at the migration guide. Bridge has no architecture-specific quantization assertion.
+- **Replication**: `[code-verified]`
+- **What changed since v2**: PR #1276 fixed the storage-dtype cast bug. v2 said "structurally sound but unverified empirically"; the bug was real and is now fixed with regression coverage.
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: HT-side fix would require removing the assertion AND auditing per-architecture state_dict load for BnB-packed weights (overlaps with #569 root cause). Bridge users can pre-load via `AutoModelForCausalLM.from_pretrained(model, load_in_4bit=True)` and pass to `boot_transformers(model_name, hf_model=quantized_model)`. Reply on issue with the demo link and close pending user confirmation.
+
+<a id="issue-697"></a>
+
+#### #697 — Activation cache during generate
+
+- **Issue**: User wants `run_with_cache` semantics during `model.generate()` — cache activations of generated tokens, not just the prompt.
+- **HookedTransformer**: ❌ unchanged — [transformer_lens/HookedTransformer.py:1873,2257](transformer_lens/HookedTransformer.py#L1873) `generate()` and `generate_stream()` exist but neither integrates `run_with_cache`. Bryce's reply: "no integration ... pretty low priority."
+- **TransformerBridge**: ❌ unchanged — [transformer_lens/model_bridge/bridge.py:2433](transformer_lens/model_bridge/bridge.py#L2433) bridge `generate` and `generate_stream` (line 2743) — same gap. PR #1265 ("fixed batched generation on run_with_cache and run_with_hooks") improved interaction between the two surfaces but didn't add cache-during-generate.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~50 LoC enhancement — wrap the per-token forward in `run_with_cache`'s hook-installation context and accumulate the cache across iterations. Trickier than the naive version because of KV-cache interactions; needs care to avoid duplicate hook fires as the cache grows. Both APIs need the same fix.
+
+<a id="issue-704"></a>
+
+#### #704 — Add support for TracrBench
+
+- **Issue**: TracrBench (121 toy Tracr transformers) — should it ship in TransformerLens or live in a separate repo.
+- **HookedTransformer**: ❌ not in core. `grep -i tracr_bench` in `transformer_lens/` returns nothing.
+- **TransformerBridge**: ❌ not in core; not a transformer-architecture-detection problem.
+- **Replication**: `[code-verified]`
+- **What changed since v2**: nothing material — no TracrBench code added.
+- **Bucket**: `not-relevant-close`
+- **Next step**: close with Neel's recommendation: build TracrBench as an external repo using TransformerLens as a dependency. Optionally add a one-line link from `docs/source/content/gallery.md`.
+
+<a id="issue-710"></a>
+
+#### #710 — MVP Support For 1-2 Models Per-Modality
+
+- **Issue**: Add basic non-text-model support — speech recognition (Whisper), vision (ResNet, ViT), music gen, etc.
+- **HookedTransformer**: ❌ not designed for non-text architectures.
+- **TransformerBridge**: ⚠️ partial — `HubertArchitectureAdapter` (audio), `LlavaArchitectureAdapter` / `LlavaNextArchitectureAdapter` / `LlavaOnevisionArchitectureAdapter` / `Gemma3MultimodalArchitectureAdapter` (VLM), plus `MambaArchitectureAdapter` / `Mamba2ArchitectureAdapter` (SSM, non-attention). No Whisper, no ResNet, no ViT, no diffusion, no music. See [transformer_lens/model_bridge/supported_architectures/__init__.py:67-88](transformer_lens/model_bridge/supported_architectures/__init__.py#L67).
+- **Replication**: `[code-verified]`
+- **What changed since v2**: jlarson4 commented suggesting per-architecture sub-issues; Mamba(1/2) now confirmed in adapter list.
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: per the comment thread, encourage reporters to file per-modality sub-issues (Whisper, ViT, etc.) so each can be tracked and prioritized. Close this umbrella once those are filed, or convert to a tracking meta-issue.
+
+<a id="issue-720"></a>
+
+#### #720 — Review current matmul function usages
+
+- **Issue**: `batch_addmm` is right for GPT-2 `Conv1D`-style layers but wrong for plain `nn.Linear` models — need per-architecture matmul routing audit.
+- **HookedTransformer**: ⚠️ partial — `F.linear` cleanup landed for the post-attention output ([abstract_attention.py:368-374](transformer_lens/components/abstract_attention.py#L368)), but `batch_addmm` is still in [utilities/addmm.py](transformer_lens/utilities/addmm.py); full audit not done. No new commits to either file in the past 2 weeks.
+- **TransformerBridge**: ⚠️ Q/K/V projections go through HF's `Linear` (correct), but bridge attention-score and output-application matmuls in [generalized_components/joint_qkv_attention.py](transformer_lens/model_bridge/generalized_components/joint_qkv_attention.py), [position_embeddings_attention.py](transformer_lens/model_bridge/generalized_components/position_embeddings_attention.py), [alibi_joint_qkv_attention.py](transformer_lens/model_bridge/generalized_components/alibi_joint_qkv_attention.py), and `joint_qkv_position_embeddings_attention.py` use raw `torch.matmul` — these need their own audit.
+- **Replication**: `[code-verified]`
+- **What changed since v2**: nothing material.
+- **Bucket**: `partial-leave-open`
+- **Next step**: same 3-part audit as before — (1) HT `batch_addmm` vs `F.linear` per-arch routing, (2) bridge `torch.matmul(q, k.T)` / `torch.matmul(weights, v)` vs HF's per-architecture impl, (3) Q/K/V projection paths.
+
+<a id="issue-729"></a>
+
+#### #729 — Guide to adding new models
+
+- **Issue**: User asks for a how-to-extend-TL guide.
+- **HookedTransformer/Bridge**: ✅ done — PR #1274 ("Adding Architecture Adapter Creation Guide to Docs", commit `fd288dc2`) landed [docs/source/content/adapter_development/adapter-creation-guide.md](docs/source/content/adapter_development/adapter-creation-guide.md), [adapter-specification.md](docs/source/content/adapter_development/adapter-specification.md), [hf-model-analysis-guide.md](docs/source/content/adapter_development/hf-model-analysis-guide.md), and a runnable [adapter-template.py](docs/source/_static/adapter-template.py).
+- **Replication**: `[code-verified]`
+- **What changed since v2**: PR #1274 (commit `fd288dc2`, ~9 days ago) closes this exactly — full bridge-side adapter creation walkthrough with template.
+- **Bucket**: `covered-close`
+- **Next step**: close with reference to PR #1274 and links to the new adapter-development docs.
+
+<a id="issue-737"></a>
+
+#### #737 — Q reshape with model loaded in 4bit
+
+- **Issue**: `cfg.use_split_qkv_input=True` + 4bit vicuna-7b → shape mismatch in `AbstractAttention.calculate_qkv_matrices` — 4bit BnB-packed weight reshapes incorrectly under split-QKV.
+- **HookedTransformer**: 🐛 still buggy — confirmed at [abstract_attention.py:58,338,378,454,473](transformer_lens/components/abstract_attention.py#L338) — multiple `if self.cfg.load_in_4bit:` branches that build `Params4bit` shaped `[nq, 1]`. No commits to this file have targeted the bug since v2; the recent quantization-related commits `0a5218ca` ("Fixed Quantization bug in TransformerLens 3.0") and `d346e707` ("Improved quantization skipping") touch the bridge side, not this HT path.
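+- **TransformerBridge**: N/A for this HT code path — quantized models load via `boot_transformers(hf_model=quantized_model)` and use HF's quantized Linear directly. Note that bridge does now expose `set_use_split_qkv_input` (see #661), so the flag itself is no longer absent on bridge.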
+- **Replication**: `[unverifiable]` — needs GPU + bitsandbytes 4bit.
+- **What changed since v2**: nothing material on this code path.
+- **Bucket**: `partial-leave-open`
+- **Next step**: HT-side fix needs reshape-aware logic in `calculate_qkv_matrices` for 4bit + split path (~30 LoC). Bridge users avoid this entirely. Reporter workaround on HT: disable `use_split_qkv_input` for 4bit models.
+
+<a id="issue-773"></a>
+
+#### #773 — TransformerLens on models with different layernorm placement (BioGPT)
+
+- **Issue**: BioGPT has only one LN per layer (post-MLP `final_layer_norm`), unlike GPT-2's pre-LN1+pre-LN2. User asks for support.
+- **HookedTransformer**: ❌ hard-coded GPT-2 LN placement; `BioGptForCausalLM` confirmed listed under [tools/model_registry/data/architecture_gaps.json:909](transformer_lens/tools/model_registry/data/architecture_gaps.json#L909).
+- **TransformerBridge**: ❌ no `BioGptArchitectureAdapter` — grep returns no BioGpt match in [transformer_lens/model_bridge/supported_architectures/__init__.py](transformer_lens/model_bridge/supported_architectures/__init__.py). The component-map pattern theoretically supports per-arch LN layout, but no adapter exists.
+- **Replication**: `[code-verified]`
+- **What changed since v2**: nothing material; PR #1274 added the adapter creation guide which is now a viable path for the reporter to add a BioGPT adapter themselves.
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: write a `BioGptArchitectureAdapter` (~80 LoC + tests) following the new [adapter-creation-guide.md](docs/source/content/adapter_development/adapter-creation-guide.md). Could now reasonably ask the reporter to take this on with the new guide.
+
+<a id="issue-796"></a>
+
+#### #796 — `FactoredMatrix.svd()` `lru_cache` prevents GC
+
+- **Issue**: `FactoredMatrix.svd` decorated with `@lru_cache(maxsize=None)` holds instance refs and prevents garbage collection.
+- **HookedTransformer**: 🐛 still buggy — confirmed at [FactoredMatrix.py:9,217](transformer_lens/FactoredMatrix.py#L9) — `from functools import lru_cache` and `@lru_cache(maxsize=None) def svd(self): ...`. Last commit to file: `90cf7476` ("Fix FactoredMatrix eigenvalues type") — not addressing this.
+- **TransformerBridge**: 🐛 same shared `FactoredMatrix` class.
+- **Replication**: `[code-verified]`
+- **What changed since v2**: nothing material.
+- **Bucket**: `not-addressed-simple`
+- **Next step**: replace `@lru_cache(maxsize=None)` with `@cached_property` on `svd` and `eigenvalues` (~10 LoC). Breaking change (`.svd()` → `.svd`) — coordinate with broader `FactoredMatrix` cleanup.
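+
+A minimal sketch of the leak and of the proposed replacement, using placeholder classes rather than the real `FactoredMatrix`. The class names, the helper, and the tuple return value are illustrative assumptions:
+
+```python
+import gc
+import weakref
+from functools import cached_property, lru_cache
+
+
+class CachedWithLru:
+    @lru_cache(maxsize=None)
+    def svd(self):
+        # The cache lives on the class-level wrapper and is keyed by `self`,
+        # so it keeps a strong reference to every instance it has seen.
+        return ("U", "S", "Vh")  # placeholder for an expensive decomposition
+
+
+class CachedWithProperty:
+    @cached_property
+    def svd(self):
+        # The result is stored in the instance __dict__ and dies with it.
+        return ("U", "S", "Vh")
+
+
+def survives_deletion(cls, call_as_method: bool) -> bool:
+    """Return True if an instance is still alive after its last name is dropped."""
+    obj = cls()
+    _ = obj.svd() if call_as_method else obj.svd
+    ref = weakref.ref(obj)
+    del obj
+    gc.collect()
+    return ref() is not None
+
+
+lru_leaks = survives_deletion(CachedWithLru, call_as_method=True)
+property_leaks = survives_deletion(CachedWithProperty, call_as_method=False)
+```
+
+The call-site difference (`obj.svd()` vs `obj.svd`) is the breaking change noted above.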
+
+<a id="issue-798"></a>
+
+#### #798 — Remove `model_args` (use only `model_kwargs`)
+
+- **Issue**: Bryce's own proposal to remove `*model_args` + `**model_kwargs` redundancy in pass-through functions.
+- **HookedTransformer**: ⚠️ unchanged — `model_args` still present in [HookedEncoderDecoder.py:489-513](transformer_lens/HookedEncoderDecoder.py#L489), [hook_points.py:629,723,779](transformer_lens/hook_points.py#L629), [HookedAudioEncoder.py:299-323](transformer_lens/HookedAudioEncoder.py#L299), [BertNextSentencePrediction.py:220-266](transformer_lens/BertNextSentencePrediction.py#L220), [HookedTransformer.py:707+](transformer_lens/HookedTransformer.py#L707).
+- **TransformerBridge**: ⚠️ same — bridge inherits `hook_points.py` machinery.
+- **Replication**: `[code-verified]`
+- **What changed since v2**: nothing material.
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~30 LoC across affected files — strip `*model_args`, keep only `**model_kwargs`. Already labeled `breaking-change`.
+
+<a id="issue-830"></a>
+
+#### #830 — Type hint support for `self.model` in `ActivationCache`
+
+- **Issue**: `ActivationCache.model` untyped (would need `HookedTransformer` import → circular). Proposes `HookedTransformerMixin` to break the cycle.
+- **HookedTransformer**: ❌ unchanged — confirmed at [ActivationCache.py:118](transformer_lens/ActivationCache.py#L118) — `self.model = model` with no annotation.
+- **TransformerBridge**: ❌ same `ActivationCache` shared class.
+- **Replication**: `[code-verified]`
+- **What changed since v2**: nothing material.
+- **Bucket**: `not-addressed-simple`
+- **Next step**: extract a `HookedRootModuleMixin` / use `TYPE_CHECKING + Protocol` to hint without circular imports (~50 LoC). Tagged 3.0 / 4.0 milestone.
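+
+One way the `TYPE_CHECKING + Protocol` option could look, as a sketch only. `HookedRootModuleLike`, `ActivationCacheSketch`, and the single `to_tokens` member are hypothetical stand-ins; the real protocol would list whatever `ActivationCache` actually calls on the model:
+
+```python
+from typing import TYPE_CHECKING, Any, Dict, Protocol, runtime_checkable
+
+if TYPE_CHECKING:
+    # Evaluated only by type checkers, so it cannot create a runtime import cycle.
+    from transformer_lens.HookedTransformer import HookedTransformer
+
+
+@runtime_checkable
+class HookedRootModuleLike(Protocol):
+    """Hypothetical structural type: only what the cache needs from a model."""
+
+    def to_tokens(self, text: str) -> Any: ...
+
+
+class ActivationCacheSketch:
+    def __init__(self, cache_dict: Dict[str, Any], model: "HookedRootModuleLike") -> None:
+        self.cache_dict = cache_dict
+        # Annotated attribute with no import of the concrete model class.
+        self.model: HookedRootModuleLike = model
+
+    def concrete_model(self) -> "HookedTransformer":
+        # String annotation: resolved by the type checker, never at runtime.
+        return self.model  # type: ignore[return-value]
+
+
+class DummyModel:
+    def to_tokens(self, text: str) -> Any:
+        return [len(text)]
+
+
+cache = ActivationCacheSketch({}, DummyModel())
+```
+
+A `Protocol` keeps `ActivationCache` usable from both HT and bridge without either importing the other, while the guarded import serves call sites that want the concrete type.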
+
+<a id="issue-837"></a>
+
+#### #837 — Multi-GPU device ordinal issue (`n_devices=3` for llama2-7b)
+
+- **Issue**: `n_devices=3` produces "device ordinal out of range" — `(index // layers_per_device)` overshoots when `n_layers % n_devices != 0`.
+- **HookedTransformer**: 🐛 still buggy at [utilities/multi_gpu.py:142](transformer_lens/utilities/multi_gpu.py#L142) — `device_index = (device.index or 0) + (index // layers_per_device)` unchanged. The function is now flagged `Deprecated: This will be removed in 3.0` ([lines 130-133](transformer_lens/utilities/multi_gpu.py#L130)).
+- **TransformerBridge**: ✅ first-class — PR #1270 ("Multi-Device Processing on Bridge", commit `d95bd962`) landed `resolve_device_map` at [multi_gpu.py:170-204](transformer_lens/utilities/multi_gpu.py#L170) with explicit `n_devices` / `device_map` / `max_memory` params and accelerate-backed dispatch. jlarson4's comment on the issue points users to PR #1270.
+- **Replication**: `[unverifiable]` — no multi-GPU here.
+- **What changed since v2**: PR #1270 merged (was unmerged in v2); bridge users now have a fully supported path.
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: HT-side bug remains. Reply on issue with bridge migration recipe (`bridge = TransformerBridge.boot_transformers(name, n_devices=3)`); leave HT path open for #968-family fix or close with bridge pointer if reporter migrates.
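+
+The overshoot can be reproduced with plain arithmetic. The sketch below assumes `layers_per_device` is computed by floor division (`n_layers // n_devices`) and uses Llama-2-7b's 32 layers; the clamp shown is just one possible guard, not a proposed patch:
+
+```python
+def buggy_device_index(layer_index: int, n_layers: int, n_devices: int, base_index: int = 0) -> int:
+    # Assumption: layers_per_device comes from floor division.
+    layers_per_device = n_layers // n_devices
+    return base_index + (layer_index // layers_per_device)
+
+
+def clamped_device_index(layer_index: int, n_layers: int, n_devices: int, base_index: int = 0) -> int:
+    layers_per_device = n_layers // n_devices
+    # One possible guard: remainder layers spill onto the last device.
+    return base_index + min(layer_index // layers_per_device, n_devices - 1)
+
+
+n_layers, n_devices = 32, 3
+buggy = [buggy_device_index(i, n_layers, n_devices) for i in range(n_layers)]
+clamped = [clamped_device_index(i, n_layers, n_devices) for i in range(n_layers)]
+out_of_range_layers = [i for i, d in enumerate(buggy) if d >= n_devices]
+```
+
+With 32 layers on 3 devices, `layers_per_device` is 10, so layers 30 and 31 map to ordinal 3 while only ordinals 0 to 2 exist.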
+
+<a id="issue-846"></a>
+
+#### #846 — Prioritize local `hf_model.config` for Qwen models
+
+- **Issue**: Loading a local Qwen via `from_pretrained_no_processing(model_name="Qwen/...", hf_model=local, tokenizer=tok)` still fetches HF config online and fails offline.
+- **HookedTransformer**: 🐛 same root cause as #754 / #800 — `convert_hf_model_config` calls `AutoConfig.from_pretrained` unconditionally; Qwen has no name-based shortcut.
+- **TransformerBridge**: ✅ now first-class — PR #1279 ("Updated `boot_transformers` to use local hf_config, if a local hf_model is passed", commit `0636214f`) landed at [model_bridge/sources/transformers.py:339-349](transformer_lens/model_bridge/sources/transformers.py#L339) — `if hf_model is not None: hf_config = copy.deepcopy(hf_model.config)`, skipping `AutoConfig.from_pretrained` entirely. New regression tests at [tests/integration/model_bridge/test_bridge_creation_modes.py](tests/integration/model_bridge/test_bridge_creation_modes.py).
+- **Replication**: `[code-verified]`
+- **What changed since v2**: PR #1279 closed the bridge-side gap that v2 noted as still open.
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: comment with the new bridge recipe (`boot_transformers(model_name="Qwen/...", hf_model=local_model)` — now skips network call). HT-side fix per #754 still pending; close once reporter confirms bridge resolves their use case.
+
+<a id="issue-867"></a>
+
+#### #867 — Does TransformerLens support LVLM like Qwen2-VL?
+
+- **Issue**: User asks if Qwen2-VL / Qwen2.5-VL is supported.
+- **HookedTransformer**: ❌ no native VLM support.
+- **TransformerBridge**: ❌ `Qwen2VLForConditionalGeneration` and `Qwen2_5_VLForConditionalGeneration` still listed in `architecture_gaps.json` (lines 4709, 4940). LLaVA family adapters present at [transformer_lens/utilities/architectures.py:32-34](transformer_lens/utilities/architectures.py#L32-L34) but no Qwen2-VL adapter.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: add `Qwen2VLArchitectureAdapter` (~150 LoC, LLaVA-pattern). Comment pointing reporter at LLaVA support today and ExplorerFreda's vlm-lens fork as alternative; close once Qwen2-VL adapter lands.
+
+<a id="issue-869"></a>
+
+#### #869 — Custom generative video transformer
+
+- **Issue**: User wants mech interp on a Sora-like generative video diffusion transformer.
+- **HookedTransformer**: ❌ no diffusion/video generation support.
+- **TransformerBridge**: ❌ same — bridge wraps HF causal/seq2seq/multimodal text models; not designed for diffusion.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: outside current scope per Bryce's reply (would need a separate `HookedDiffusionTransformer` root module). Recommend close as wontfix or defer to architectural roadmap; point user to a dedicated diffusion-interp tool.
+
+<a id="issue-888"></a>
+
+#### #888 — Adapt HookedTransformer to a non-supported model (CLIP language model)
+
+- **Issue**: User wants `from_pretrained` for CLIP's text encoder.
+- **HookedTransformer**: ❌ not possible without code modifications.
+- **TransformerBridge**: ⚠️ adapter framework supports it, but no `CLIPTextModel` adapter exists — grep finds no `CLIPTextModel*` symbol in `transformer_lens/`. `CLIPVisionEncoderBridge` exists for the vision side via LLaVA. jlarson4 already commented pointing the reporter at the adapter-creation guide.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: write `CLIPTextModelArchitectureAdapter` (~120 LoC, encoder-only, BERT-like attention). jlarson4's prior comment already pointed to the adapter-creation guide; could leave open as a focused model-request or invite contribution.
+
+<a id="issue-911"></a>
+
+#### #911 — PosEmbed device error with `accelerate`
+
+- **Issue**: gpt2 + `accelerate launch` (DDP across 2 GPUs) fails inside `PosEmbed.forward` because `W_pos[offset_position_ids]` mixes devices.
+- **HookedTransformer**: 🐛 still buggy at [transformer_lens/components/pos_embed.py:59](transformer_lens/components/pos_embed.py#L59) (`pos_embed = self.W_pos[offset_position_ids]`). No commit on this file since `98811df5` (3.0 CI bugs).
+- **TransformerBridge**: ✅ uses HF's `wpe` directly via `EmbeddingBridge`; respects HF `device_map`. PR #1270 (`d95bd962`) merged first-class `n_devices`/`device_map` for the bridge.
+- **Replication**: `[unverifiable]` — needs DDP setup.
+- **What changed since v2**: PR #1270 multi-device bridge support is now merged on main.
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: jlarson4 already commented with bridge migration pointer — wait for reporter response. Close after confirmation, or add bridge integration test under `accelerate launch` to harden the recommendation.
+
+<a id="issue-912"></a>
+
+#### #912 — Support mT5 models
+
+- **Issue**: User requests `google/mt5-small` for multilingual circuit discovery.
+- **HookedTransformer**: ❌ T5-only path; not added.
+- **TransformerBridge**: ✅ wired end-to-end. `"mt5"` mapped at [transformer_lens/model_bridge/sources/transformers.py:235](transformer_lens/model_bridge/sources/transformers.py#L235), `MT5ForConditionalGeneration` routed to `T5ArchitectureAdapter` at [transformer_lens/factories/architecture_adapter_factory.py:119](transformer_lens/factories/architecture_adapter_factory.py#L119), plus the `requires_relative_position_bias=True` + `is_cross_attention=True` fixes in `supported_architectures/t5.py`. Verified on `google/mt5-small`, `mt5-base`, `mt5-large`, `mt5-xl` (full verification, P1=100%).
+- **Replication**: `[empirically replicated]` — verification history shows full-pass entries dated 2026-05-08.
+- **What changed since v2**: PR #1289 (`d5e3a2b0`) landed the routing + cross-attention fix; multiple mT5 sizes verified.
+- **Bucket**: `covered-close`
+- **Next step**: close with link to TransformerBridge migration guide and the `google/mt5-base` verified-models entry. Reporter's `mt5-small` use case is now directly supported.
+
+<a id="issue-950"></a>
+
+#### #950 — Support SimpleStories models
+
+- **Issue**: User requests SimpleStories support for low-resource interp work.
+- **HookedTransformer**: ❌ not registered.
+- **TransformerBridge**: ✅ 11 SimpleStories models verified end-to-end on 2026-05-08 (`SimpleStories-1.25M`, `-5M`, `-11M`, `-30M`, `-35M`, plus `V2-1.25M/5M/11M/30M/35M` and `test-SimpleStories-gpt2-1.25M`) — full verification, P1=P2=P3=100%, P4>=90% via PR #1292 (`0c0bd3ce`). `SimpleStories` author registered for `LlamaForCausalLM` at [transformer_lens/tools/model_registry/__init__.py:123](transformer_lens/tools/model_registry/__init__.py#L123).
+- **Replication**: `[empirically replicated]`
+- **What changed since v2**: PR #1292 SimpleStories Model Verification merged; jlarson4's "I'll see if I can tackle that before the next release" promise is now delivered.
+- **Bucket**: `covered-close`
+- **Next step**: close with the verified-models page link. mivanit asked for SimpleStories; 11 SimpleStories-published models now load and pass verification through the bridge.
+
+<a id="issue-953"></a>
+
+#### #953 — Add basic support for Gemma 3n (E2B & E4B)
+
+- **Issue**: Reporter asks for text-only support of Gemma 3n (AltUp / LAuReL / PLE / mixed local-global attention).
+- **HookedTransformer**: ❌ not supported.
+- **TransformerBridge**: ❌ not registered in `SUPPORTED_ARCHITECTURES`; no Gemma3n entry in [transformer_lens/utilities/architectures.py](transformer_lens/utilities/architectures.py). Bryce confirmed in-progress for next major release.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: track for milestone 3.x. AltUp/LAuReL/PLE need dedicated component bridges (~200-500 LoC); mixed local/global attention overlaps with Gemma2 work. Defer until HF's `Gemma3nForCausalLM` forward stabilizes.
+
+<a id="issue-968"></a>
+
+#### #968 — `unsloth/llama-3.2-3b-instruct` with 2× 3060 device-mismatch
+
+- **Issue**: `from_pretrained(..., n_devices=2)` on 2× 3060 throws `RuntimeError: indices should be either on cpu or on the same device`.
+- **HookedTransformer**: 🐛 multi-GPU placement bug cluster (#837/#907/#911) — `move_model_modules_to_device` greedy allocation is unchanged.
+- **TransformerBridge**: ✅ PR #1270 (`d95bd962`) merged first-class `n_devices` / `device_map` to bridge — see [transformer_lens/model_bridge/sources/transformers.py:293-294](transformer_lens/model_bridge/sources/transformers.py#L293-L294) and the `resolve_device_map` path at line 480+.
+- **Replication**: `[unverifiable]` — no multi-GPU device.
+- **What changed since v2**: PR #1270 is now merged on main, not just on a feature branch.
+- **Bucket**: `bug-likely-fixed-needs-verification`
+- **Next step**: jlarson4's prior comment offered bridge + #1270; with #1270 now merged, ask reporter to retest with `TransformerBridge.boot_transformers("unsloth/llama-3.2-3b-instruct", n_devices=2)`. Close on confirmation.
+
+<a id="issue-1080"></a>
+
+#### #1080 — Import fails by default in Colab (numpy ABI mismatch)
+
+- **Issue**: Fresh Colab + `pip install transformer_lens` + `import transformer_lens` raises `numpy.dtype size changed` ABI error; kernel restart works around it.
+- **HookedTransformer**: ⚠️ [pyproject.toml:12-13](pyproject.toml#L12-L13) still has `numpy>=1.24` / `numpy>=1.26` lower bounds with no upper cap. Numpy 2.x is allowed; a transitive ABI mismatch is the suspected root cause.
+- **TransformerBridge**: ⚠️ same install path; same numpy.
+- **Replication**: `[unverifiable]` — Colab-specific.
+- **Bucket**: `bug-likely-fixed-needs-verification`
+- **Next step**: ask reporter to retest with current Colab kernel + current TL (3.x). If still failing, bisect transitive deps (`pandas`, `einops`, `jaxtyping`) and pin a tested numpy. No movement on this since v2.
+
+<a id="issue-1133"></a>
+
+#### #1133 — `tokenize_and_concatenate` cuts tokens mid-document
+
+- **Issue**: Char-based 20-chunk split could cut tokens mid-doc, producing impossible token pairs.
+- **HookedTransformer**: ✅ fixed. [transformer_lens/utilities/tokenize_utils.py:76-89](transformer_lens/utilities/tokenize_utils.py#L76-L89) tokenizes per-doc with `add_special_tokens=False`, joins with explicit token-level EOS, and reshapes — no string-level chunking. PR #1273 (`ad8e123b`); further refined by PR #1287 (`3003f77a`, "Tokenize and Concatenate additional datasets").
+- **TransformerBridge**: ✅ shared utility.
+- **Replication**: `[code-verified]` — original `tokens[79848:79850] == [337, 346]` repro cannot occur.
+- **What changed since v2**: PR #1287 added more dataset coverage on top of #1273.
+- **Bucket**: `covered-close`
+- **Next step**: close as fixed (PRs #1273 + #1287). Confirm with BorisTheBrave that the new `dataset.map`-driven approach (per their suggestion in the thread) addresses the pathological cases too.
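+
+A toy sketch of the per-document approach described above, not the code in `tokenize_utils.py`. The stand-in tokenizer, `EOS_ID`, and the choice to drop the ragged tail are assumptions for illustration:
+
+```python
+from typing import Callable, List, Sequence
+
+EOS_ID = 0  # hypothetical EOS token id
+
+
+def toy_tokenize(text: str) -> List[int]:
+    """Stand-in tokenizer: one id per whitespace-separated word (its length)."""
+    return [len(word) for word in text.split()]
+
+
+def tokenize_and_concat(
+    docs: Sequence[str],
+    tokenize: Callable[[str], List[int]],
+    eos_id: int,
+    seq_len: int,
+) -> List[List[int]]:
+    flat: List[int] = []
+    for doc in docs:
+        flat.extend(tokenize(doc))  # each document is tokenized whole
+        flat.append(eos_id)         # separator added at token level, not as a string
+    n_rows = len(flat) // seq_len   # assumption: the ragged tail is dropped
+    return [flat[r * seq_len:(r + 1) * seq_len] for r in range(n_rows)]
+
+
+rows = tokenize_and_concat(["aa bbb", "c dddd ee"], toy_tokenize, EOS_ID, seq_len=4)
+```
+
+Because no string is ever cut before tokenization, every adjacent token pair in `rows` comes either from inside one document or from a document boundary marked by EOS.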
+
+<a id="issue-1148"></a>
+
+#### #1148 — Tutorial for "Real-Time Training Dynamics" (VSM Telemetry)
+
+- **Issue**: Reporter proposes a demo notebook for σ_p / σ_a training-dynamics telemetry.
+- **HookedTransformer**: ❌ no such tutorial. The closest existing notebook in `demos/` is `Grokking_Demo.ipynb`; nothing covers VSM telemetry.
+- **TransformerBridge**: ❌ same — the tutorial would work equivalently against bridge's hook system.
+- **Replication**: `[code-verified]`
+- **What changed since v2**: jonathanrbelanger-lang committed in-thread to "get to work on this over the coming weekend" after jlarson4's invitation.
+- **Bucket**: `not-addressed-simple`
+- **Next step**: leave open and wait for the reporter's PR (notebook in `/demos`, targeting `TransformerBridge`). If no PR materializes, invite community contribution and close as wontfix.
+
+<a id="issue-1263"></a>
+
+#### #1263 — Direct Logit Attribution Tool
+
+- **Issue**: Proposal — add a first-class direct-logit-attribution helper. Adjacent to #112 (logit display) and #111 (path patching demo).
+- **HookedTransformer**: needs-investigation (added since v2)
+- **TransformerBridge**: needs-investigation
+- **Replication**: `[needs-investigation]`
+- **Bucket**: `needs-triage`
+- **Labels (from GitHub)**: enhancement / good first issue
+- **Next step**: read issue body + comments; classify per v2 buckets; spot-check current code if claim is testable.
+
+<a id="issue-1275"></a>
+
+#### #1275 — Update Benchmarks & Verify Models to support Quantized models
+
+- **Issue**: **Largely addressed in this branch** — quantization classification refactored (HF-loadable formats admitted to registry) and verify_models gates on `required_quant_library_for_model()` with a clean skip path when libs are missing. PR #1276 already fixed the dtype bug. Worth re-verifying the issue's specific asks against current state.
+- **HookedTransformer**: needs-investigation (added since v2)
+- **TransformerBridge**: needs-investigation
+- **Replication**: `[needs-investigation]`
+- **Bucket**: `needs-triage`
+- **Labels (from GitHub)**: enhancement
+- **Next step**: read issue body + comments; classify per v2 buckets; spot-check current code if claim is testable.
+
+<a id="issue-1280"></a>
+
+#### #1280 — Add support for `cpu`, `meta`, and `disk` to TransformerBridge `device_map`
+
+- **Issue**: Proposal — extend bridge device_map handling. Pairs with #872 (broader device_map review).
+- **HookedTransformer**: needs-investigation (added since v2)
+- **TransformerBridge**: needs-investigation
+- **Replication**: `[needs-investigation]`
+- **Bucket**: `needs-triage`
+- **Labels (from GitHub)**: TransformerBridge / enhancement
+- **Next step**: read issue body + comments; classify per v2 buckets; spot-check current code if claim is testable.
+
+<a id="issue-1291"></a>
+
+#### #1291 — CI HuggingFace Call Reduction
+
+- **Issue**: CI optimization — reduce HF Hub round-trips during test runs. Probably easy via fixture-level caching of the small models that get re-downloaded across test files.
+- **HookedTransformer**: needs-investigation (added since v2)
+- **TransformerBridge**: needs-investigation
+- **Replication**: `[needs-investigation]`
+- **Bucket**: `needs-triage`
+- **Labels (from GitHub)**: low-priority
+- **Next step**: read issue body + comments; classify per v2 buckets; spot-check current code if claim is testable.
+
diff --git a/OPEN_ISSUES_TRIAGE.v4.md b/OPEN_ISSUES_TRIAGE.v4.md
new file mode 100644
index 000000000..77663a8ce
--- /dev/null
+++ b/OPEN_ISSUES_TRIAGE.v4.md
@@ -0,0 +1,509 @@
+# Open Issues Triage (v4)
+
+**Generated:** 2026-05-11 (v4 — re-verified after another sprint of closures)
+**Repo:** TransformerLensOrg/TransformerLens
+**Open issues:** 38 (36 re-verified from v3 + 2 opened since)
+**Previous archives:** [OPEN_ISSUES_TRIAGE.v3.md](OPEN_ISSUES_TRIAGE.v3.md), [OPEN_ISSUES_TRIAGE_OLD.md](OPEN_ISSUES_TRIAGE_OLD.md) (v2)
+
+## What changed since v3
+
+- **12 issues closed** during the v3 cycle: #290, #569, #661, #684, #729, #846, #911, #912, #950, #968, #1133, #1275
+- **36 entries re-verified** against current code; 1 newly-closeable, 2 verdicts refined (#462, #644), 3 stubs properly triaged (#1263, #1280, #1291)
+- **2 new entries**: #1297 (Gemma4 adapter), #1298 (external arch registration) — both `not-addressed-simple` bridge work
+
+### Newly closeable based on v4 re-verification
+
+- **#483** — bridge `generate()` no-tokenizer fix landed (commit `513d157b`, May 11) with regression test. Both HT and bridge sides now guard `self.tokenizer is not None`.
+
+### Verdict refinements (still open, evidence stronger or context corrected)
+
+- **#462** — jlarson4 commented confirming Mamba + Granite-MoE-Hybrid shipped on bridge 3.0; close-action is now well-supported.
+- **#644** — discovered an existing `TransformerLens_Diagram.svg` referenced from `index.md` but not embedded in `model_structure.md`. Bucket moved from `not-addressed-simple` to `partial-leave-open` — next step is a 1-line embed.
+- **#1263** — `cache.logit_attrs` exists in [ActivationCache.py:488-606](transformer_lens/ActivationCache.py#L488-L606) but no standalone wrapper in `tools/analysis/`. Now `not-addressed-simple` (was `needs-triage`).
+- **#1280** — exact blocker found at [multi_gpu.py:146-167](transformer_lens/utilities/multi_gpu.py#L146) (`_UNSUPPORTED_DEVICE_MAP_VALUES`); reporter-assigned with PR in flight. Now `partial-leave-open`.
+- **#1291** — CI already caches HF model dirs; missing `concurrency` group is the targetable next step. Now `partial-leave-open`.
+
+The v2 methodology section (HT-side / Bridge-side / Replication / Next step) still applies — see [OPEN_ISSUES_TRIAGE_OLD.md](OPEN_ISSUES_TRIAGE_OLD.md#methodology-per-issue).
+
+## Summary table (sorted by issue number)
+
+| Issue | Title | Bucket |
+|---|---|---|
+| #111 | [Demo of direct path patching](#issue-111) | `not-addressed-difficult` |
+| #112 | [Helper to display vectors of logits nicely](#issue-112) | `not-addressed-simple` |
+| #210 | [`get_full_resid_decomposition` accept tensor argument](#issue-210) | `not-addressed-simple` |
+| #297 | [Better print-outs for currently attached hooks](#issue-297) | `not-addressed-simple` |
+| #341 | [Update FactoredMatrix.svd() (uses deprecated `torch.svd`, returns V not Vh)](#issue-341) | `not-addressed-simple` |
+| #385 | [Pythia / Rotary Embeddings don't match HuggingFace](#issue-385) | `bug-still-reproduces` |
+| #453 | [`from_pretrained()` always downloads same weights with `checkpoint_label`](#issue-453) | `bug-likely-fixed-needs-verification` |
+| #462 | [Add support for Mamba](#issue-462) | `fixed-on-transformerbridge` |
+| #479 | [Memory efficient causal mask implementation](#issue-479) | `partial-leave-open` |
+| #481 | [Tracr to TransformerLens demo broken](#issue-481) | `bug-still-reproduces` |
+| #483 | [`HookedTransformer.generate()` `pad_token_id` error when tokenizer unset](#issue-483) | `covered-close` |
+| #509 | [LayerNorm folding not implemented for BertBlock](#issue-509) | `not-addressed-difficult` |
+| #543 | [Grokking demo broken in Colab](#issue-543) | `bug-likely-fixed-needs-verification` |
+| #588 | [Setup unit tests to cover model configurations](#issue-588) | `partial-leave-open` |
+| #595 | [Add Stopping Criteria support](#issue-595) | `not-addressed-simple` |
+| #615 | [HookedTransformer output not identical to HuggingFace for Llama 3](#issue-615) | `fixed-on-transformerbridge` |
+| #644 | [Documentation: Map the Act Names to the Transformer](#issue-644) | `partial-leave-open` |
+| #697 | [Activation cache during generate](#issue-697) | `not-addressed-simple` |
+| #704 | [Add support for TracrBench](#issue-704) | `not-relevant-close` |
+| #710 | [MVP Support For 1-2 Models Per-Modality](#issue-710) | `not-addressed-difficult` |
+| #720 | [Review current matmul function usages](#issue-720) | `partial-leave-open` |
+| #737 | [Q reshape with model loaded in 4bit](#issue-737) | `partial-leave-open` |
+| #773 | [TransformerLens on models with different layernorm placement (BioGPT)](#issue-773) | `not-addressed-difficult` |
+| #796 | [`FactoredMatrix.svd()` `lru_cache` prevents GC](#issue-796) | `not-addressed-simple` |
+| #798 | [Remove `model_args` (use only `model_kwargs`)](#issue-798) | `not-addressed-simple` |
+| #830 | [Type hint support for `self.model` in `ActivationCache`](#issue-830) | `not-addressed-simple` |
+| #837 | [Multi-GPU device ordinal issue (`n_devices=3` for llama2-7b)](#issue-837) | `fixed-on-transformerbridge` |
+| #867 | [Does TransformerLens support LVLM like Qwen2-VL?](#issue-867) | `not-addressed-difficult` |
+| #869 | [Custom generative video transformer](#issue-869) | `not-addressed-difficult` |
+| #888 | [Adapt HookedTransformer to a non-supported model (CLIP language model)](#issue-888) | `not-addressed-difficult` |
+| #953 | [Add basic support for Gemma 3n (E2B & E4B)](#issue-953) | `not-addressed-difficult` |
+| #1080 | [Import fails by default in Colab (numpy ABI mismatch)](#issue-1080) | `bug-likely-fixed-needs-verification` |
+| #1148 | [Tutorial for "Real-Time Training Dynamics" (VSM Telemetry)](#issue-1148) | `not-addressed-simple` |
+| #1263 | [Direct Logit Attribution Tool](#issue-1263) | `not-addressed-simple` |
+| #1280 | [Add support for `cpu`, `meta`, and `disk` to TransformerBridge `device_map`](#issue-1280) | `partial-leave-open` |
+| #1291 | [CI HuggingFace Call Reduction](#issue-1291) | `partial-leave-open` |
+| #1297 | [Gemma4 Architecture Adapter](#issue-1297) | `not-addressed-simple` |
+| #1298 | [External Architecture Registration](#issue-1298) | `not-addressed-simple` |
+
+## Per-issue entries
+
+<a id="issue-111"></a>
+
+#### #111 — Demo of direct path patching
+
+- **Issue**: Add a section to Exploratory Analysis Demo demonstrating direct path patching for all head pairs. PR #49 was an early attempt.
+- **HookedTransformer**: still no first-class path-patching helper. Verified — no `path_patch`/`direct_path` symbols exist anywhere under [transformer_lens/](transformer_lens/) or [transformer_lens/utilities/](transformer_lens/utilities/). [demos/Activation_Patching_in_TL_Demo.ipynb](demos/Activation_Patching_in_TL_Demo.ipynb) and [demos/Attribution_Patching_Demo.ipynb](demos/Attribution_Patching_Demo.ipynb) are the closest.
+- **TransformerBridge**: same — no path-patching primitive in either API; bridge reuses the same `ActivationCache`.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: Callum McDougall pointed users at the [ARENA IOI notebook](https://colab.research.google.com/drive/1KgrEwvCKdX-8DQ1uSiIuxwIiwzJuQ3Gw). Either close with a docs pointer to ARENA, or implement a TL helper that wraps the pattern (~80 LoC).
+
+<a id="issue-112"></a>
+
+#### #112 — Helper to display vectors of logits nicely
+
+- **Issue**: Neel asked for two things: **MVP** — function mapping logit vector → pandas DataFrame `(token_index, token_string, logit, log_prob, probability)`. **Bonus** — nostalgebraist-style `plot_logit_lens` heatmap.
+- **HookedTransformer**: `test_prompt` in [transformer_lens/utilities/exploratory_utils.py:14](transformer_lens/utilities/exploratory_utils.py#L14) prints top-k for prompt+answer: partially in the spirit of the MVP, but print-only and single-position. [transformer_lens/utilities/logits_utils.py](transformer_lens/utilities/logits_utils.py) exists but contains no `logits_to_df` or `plot_logit_lens` helper. Unchanged since v3.
+- **TransformerBridge**: same — `test_prompt` works through bridge; no separate helper.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~30 LoC for `logits_to_df(logits, tokenizer, top_k=None) -> pd.DataFrame` (drop in [logits_utils.py](transformer_lens/utilities/logits_utils.py)), ~50 LoC for matplotlib `plot_logit_lens`. Both small library additions independent of CircuitsVis.
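+
+A minimal sketch of the MVP's core computation, kept to the standard library so it is self-contained. The name `logits_to_rows` and the `decode` callback are hypothetical stand-ins, not existing TransformerLens APIs; assuming pandas is available, `pd.DataFrame(rows)` would turn the result into the requested frame.
+
+```python
+import math
+from typing import Callable, Dict, List, Optional, Sequence, Union
+
+Row = Dict[str, Union[int, str, float]]
+
+
+def logits_to_rows(
+    logits: Sequence[float],
+    decode: Callable[[int], str],  # hypothetical: maps a token index to its string
+    top_k: Optional[int] = None,
+) -> List[Row]:
+    """Return one row per token, sorted by descending logit."""
+    # Numerically stable log-softmax: subtract the max before exponentiating.
+    max_logit = max(logits)
+    log_z = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
+    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
+    rows: List[Row] = []
+    for i in order[:top_k]:  # top_k=None keeps every token
+        log_prob = logits[i] - log_z
+        rows.append(
+            {
+                "token_index": i,
+                "token_string": decode(i),
+                "logit": logits[i],
+                "log_prob": log_prob,
+                "probability": math.exp(log_prob),
+            }
+        )
+    return rows
+
+
+# Toy three-token vocabulary standing in for a real tokenizer.
+vocab = ["a", "b", "c"]
+rows = logits_to_rows([0.0, 2.0, 1.0], decode=lambda i: vocab[i], top_k=2)
+```
+
+The column names follow the tuple listed in the issue. A real helper would presumably take a tensor and a tokenizer rather than a plain list and a callback.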
+
+<a id="issue-210"></a>
+
+#### #210 — `get_full_resid_decomposition` accept tensor argument
+
+- **Issue**: Add a `project_output_onto: [d_model]` or `[d_model, num_outputs]` argument so neuron-decomposition doesn't blow GPU memory by materializing `[batch, pos, d_mlp, d_model]`.
+- **HookedTransformer**: signature at [transformer_lens/ActivationCache.py:1091](transformer_lens/ActivationCache.py#L1091) — verified no `project_output_onto` kwarg added; memory-blowing path unchanged. No commits on the file in the last 5 days.
+- **TransformerBridge**: same — bridge imports the same `ActivationCache` class.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: add `project_output_onto` kwarg + `(neurons * (W_out @ project_output_onto))` path. ~15 LoC + 1 test. Alan Cooney offered to take it; never landed.
+
+<a id="issue-297"></a>
+
+#### #297 — Better print-outs for currently attached hooks
+
+- **Issue**: API for listing hooks attached to a model + HookPoint, with detail.
+- **HookedTransformer**: no first-class `model.list_hooks()` or `HookPoint.describe()` API. Verified — `list_active_hooks`, `list_hooks`, and `describe()` symbols absent from [transformer_lens/hook_points.py](transformer_lens/hook_points.py). `model.hook_dict` and `hp.fwd_hooks`/`bwd_hooks` remain the only inspection surface.
+- **TransformerBridge**: same — uses same `hook_points` machinery.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: add `HookedRootModule.list_active_hooks()` returning `Dict[str, List[hook_repr]]`. ~15 LoC + 1 test. Abandoned PR #302 was the prior attempt.
+
+<a id="issue-341"></a>
+
+#### #341 — Update FactoredMatrix.svd() (uses deprecated `torch.svd`, returns V not Vh)
+
+- **Issue**: TL uses deprecated `torch.svd` (which returns V, not Vh) inside `FactoredMatrix.svd`. Should switch to `torch.linalg.svd` and return Vh per modern convention.
+- **HookedTransformer/Bridge**: confirmed at [transformer_lens/FactoredMatrix.py:218-233](transformer_lens/FactoredMatrix.py#L218-L233) — three `torch.svd(...)` calls still present in `def svd`. No commits on file since v3.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~15-line fix — switch to `torch.linalg.svd(..., full_matrices=False)`, return `Vh` directly, update docstring noting the breaking change. `diego898` offered to send PR. Land with a deprecation warning.
+
+<a id="issue-385"></a>
+
+#### #385 — Pythia / Rotary Embeddings don't match HuggingFace
+
+- **Issue**: Logit drift between `HookedTransformer` and HF for Pythia models. Llama-2-7b-chat reportedly catastrophic. Llama-3.2 rotary mismatch persists per chengjiali.
+- **HookedTransformer**: rotary code lives at [transformer_lens/components/abstract_attention.py:599](transformer_lens/components/abstract_attention.py#L599). No new Pythia-specific commits in the last 5 days.
+- **TransformerBridge**: bridge uses HF's rotary directly via `RotaryEmbeddingBridge` at [transformer_lens/model_bridge/generalized_components/rotary_embedding.py:15](transformer_lens/model_bridge/generalized_components/rotary_embedding.py#L15), and joint-QKV/position-emb attention bridges call HF's `rotary_emb(seq_len, device)` directly. By construction matches HF.
+- **Replication**: `[empirically replicated]` per v2/v3 (NaN logits in fp32 baseline). Not re-run this round.
+- **Bucket**: `bug-still-reproduces` + `fixed-on-transformerbridge` for bridge users
+- **Next step**: investigate the v2-reported NaN regression — verify whether it persists with full `from_pretrained` on current HEAD; bisect against `2c41b6c9`. Bridge users avoid this entirely.
+
+<a id="issue-453"></a>
+
+#### #453 — `from_pretrained()` always downloads same weights with `checkpoint_label`
+
+- **Issue**: Reporter passes `checkpoint_label=...` and gets identical weights regardless of label. `checkpoint_index` works.
+- **HookedTransformer**: signature at [transformer_lens/HookedTransformer.py:1158-1159](transformer_lens/HookedTransformer.py#L1158-L1159) has `checkpoint_index` and `checkpoint_value` — **NOT `checkpoint_label`**. The kwarg is silently absorbed into `**from_pretrained_kwargs`. Loader at [transformer_lens/loading_from_pretrained.py:1693-1707](transformer_lens/loading_from_pretrained.py#L1693-L1707) similarly only references `checkpoint_index`/`checkpoint_value`. Unchanged since v3.
+- **TransformerBridge**: no checkpoint feature — uses HF's native loading only.
+- **Replication**: `[code-verified]`
+- **Bucket**: `bug-likely-fixed-needs-verification` (effectively user-error)
+- **Next step**: respond to reporter that the parameter is `checkpoint_value`. Optionally validate unknown kwargs in `from_pretrained` and raise. ~10 LoC defensive change.
+
+<a id="issue-462"></a>
+
+#### #462 — Add support for Mamba
+
+- **Issue**: Add Mamba SSM architecture support.
+- **HookedTransformer**: not supported (by design — Mamba is fundamentally different from attention transformers).
+- **TransformerBridge**: `MambaArchitectureAdapter` and `Mamba2ArchitectureAdapter` registered at [transformer_lens/factories/architecture_adapter_factory.py:95-96](transformer_lens/factories/architecture_adapter_factory.py#L95-L96). Both `MambaForCausalLM` and `Mamba2ForCausalLM` HF model classes mapped. Confirmed unchanged since v3.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: jlarson4 left a comment on the issue confirming Mamba support has shipped on `TransformerBridge` 3.0 (also mentions Granite MoE hybrid).
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: close with comment pointing at `TransformerBridge.boot_transformers("state-spaces/mamba-130m-hf")`. Mamba support is shipped.
+
+<a id="issue-479"></a>
+
+#### #479 — Memory efficient causal mask implementation
+
+- **Issue**: Each `Attention` layer registers a `(n_ctx, n_ctx)` boolean `causal_mask` buffer. ~86 GB overhead at Qwen 72B × 32K ctx (32768² one-byte booleans ≈ 1.07 GB per layer, across 80 layers).
+- **HookedTransformer**: confirmed at [transformer_lens/components/abstract_attention.py:120-128](transformer_lens/components/abstract_attention.py#L120-L128) — `causal_mask = torch.tril(torch.ones((self.cfg.n_ctx, self.cfg.n_ctx)).bool())` and `register_buffer("mask", causal_mask)` still present (also at line 774 for resize). Bug as reported still present for ALL HT architectures.
+- **TransformerBridge**: architecture-dependent. GPT2-family inherits HF's static `(max_pos, max_pos)` buffer. Modern HF impls (GPTNeoX/Pythia/Llama/Qwen/Mistral/Gemma) use `_update_causal_mask` per forward — zero overhead. The motivating Qwen 72B case is fixed on bridge.
+- **Replication**: `[empirically replicated]` per v2.
+- **Bucket**: `partial-leave-open`
+- **Next step**: bridge users on modern architectures already have the desired memory profile. HT-side fix (~30 LoC: replace pre-allocated buffer with on-the-fly construction in `apply_causal_mask`) closes it for the legacy path and GPT2-family use cases.
+
+<a id="issue-481"></a>
+
+#### #481 — Tracr to TransformerLens demo broken
+
+- **Issue**: Demo notebook assumes "the unembed is a projection onto the first few elements of the residual stream" — wrong because Tracr re-orders the residual stream alphabetically. Needs Tracr upstream PR to expose the unembed matrix.
+- **HookedTransformer**: 🐛 confirmed at [demos/Tracr_to_Transformer_Lens_Demo.ipynb:233](demos/Tracr_to_Transformer_Lens_Demo.ipynb) — `sd["unembed.W_U"] = np.eye(d_model, d_vocab_out)` line still present. No commits on the notebook since v3.
+- **TransformerBridge**: ❌ N/A — Tracr-specific issue applies regardless of API; root cause is in the unembed-matrix derivation, not in TL's hook system. Demo not ported to bridge.
+- **Replication**: `[code-verified]`
+- **Bucket**: `bug-still-reproduces`
+- **Next step**: needs Tracr upstream PR to expose `unembed_matrix` in `tracr.params`. FlyingPumba previously volunteered. Without that, demo is fundamentally limited.
+
+<a id="issue-483"></a>
+
+#### #483 — `HookedTransformer.generate()` `pad_token_id` error when tokenizer unset
+
+- **Issue**: `model.generate()` with no tokenizer raises `AttributeError: 'NoneType' object has no attribute 'pad_token_id'`. Use case: training models on tokenizer-less domains (e.g., character-level integer addition).
+- **HookedTransformer**: ✅ fixed by PR #1267 (commit `b1cc8c80`); see [transformer_lens/HookedTransformer.py:2068-2089](transformer_lens/HookedTransformer.py#L2068-L2089). Regression test at [tests/unit/test_generate_no_tokenizer.py](tests/unit/test_generate_no_tokenizer.py).
+- **TransformerBridge**: ✅ now fixed by commit `513d157b` ("fix bridge side of generating with no tokenizer", May 11 2026). Both [bridge.py:2548-2571](transformer_lens/model_bridge/bridge.py#L2548-L2571) (`generate`) and [bridge.py:2829-2851](transformer_lens/model_bridge/bridge.py#L2829-L2851) (`generate_stream`) now guard `self.tokenizer is not None` and accept user-supplied `eos_token_id`. Regression test at [tests/unit/model_bridge/test_bridge_generate_no_tokenizer.py](tests/unit/model_bridge/test_bridge_generate_no_tokenizer.py).
+- **Replication**: `[code-verified]`
+- **What changed since v3**: bridge-side fix landed (`513d157b`). v3's `partial-leave-open` bucket is now resolved on both sides.
+- **Bucket**: `covered-close`
+- **Next step**: close with reference to PR #1267 (HT) and commit `513d157b` (bridge); both regression tests in place.
+
+<a id="issue-509"></a>
+
+#### #509 — LayerNorm folding not implemented for BertBlock
+
+- **Issue**: BertBlock uses post-norm; `fold_ln=True` would fold LN into Q/K/V which is mathematically incorrect for post-norm.
+- **HookedTransformer**: 🐛 architectural limitation per Neel ("LayerNorm should not be folded at all... I can't think of any way to do LayerNorm folding for Bert"). [`HookedEncoder.from_pretrained`](transformer_lens/HookedEncoder.py#L412) hardcodes `fold_ln=False`. `BertBlock` at [transformer_lens/components/bert_block.py:19](transformer_lens/components/bert_block.py#L19). No changes since v3.
+- **TransformerBridge**: ⚠️ `BertArchitectureAdapter` exists; `enable_compatibility_mode()` would inherit the same fold-doesn't-work problem. Bridge users typically don't fold LN regardless.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: Two options unchanged from v3 — close as wontfix (Neel's view) or add a 5-line warning when `fold_ln=True` is passed for a BERT-family architecture.
+
+<a id="issue-543"></a>
+
+#### #543 — Grokking demo broken in Colab
+
+- **Issue**: `loss_fn(all_logits, labels)` raises `RuntimeError: Size does not match at dimension 0 expected index [12769, 1] to be smaller than self [113, 113]`.
+- **HookedTransformer**: ⚠️ unverified. `demos/Grokking_Demo.ipynb` last touched in `98811df5 3.0 CI Bugs (#1261)`; no commits referencing #543. No new activity since v3.
+- **TransformerBridge**: N/A — demo-specific shape bug.
+- **Replication**: `[unverifiable]` — needs Colab-like environment to run the full notebook end-to-end.
+- **Bucket**: `bug-likely-fixed-needs-verification`
+- **Next step**: ask reporter (or anthonyduong9) to re-run the notebook on current `dev` and confirm whether the original error reproduces. If yes, fix is in `loss_fn` shape mismatch; if no, close.
+
+<a id="issue-588"></a>
+
+#### #588 — Setup unit tests to cover model configurations
+
+- **Issue**: Add unit tests that load every supported model's config and verify it's parseable.
+- **HookedTransformer/Bridge**: ⚠️ partial — same as v3. Per-architecture coverage at `tests/unit/test_gemma3_config.py`, `test_hooked_transformer_config.py`, `test_llava_config.py`, plus 7 adapter test files under [tests/unit/model_bridge/supported_architectures/](tests/unit/model_bridge/supported_architectures/) (baichuan, codegen, cohere, gpt_bigcode, internlm2, mpt, xglm). No single parametrized sweep over the full `SUPPORTED_ARCHITECTURES` keyset. Verification system improvements landed (PR #1293) but are runtime/empirical, not config-only.
+- **Replication**: `[code-verified]`
+- **Bucket**: `partial-leave-open`
+- **Next step**: ~30 LoC parametrized test over all `SUPPORTED_ARCHITECTURES` keys: load config-only (no weights) and assert the architecture adapter resolves.
+
+<a id="issue-595"></a>
+
+#### #595 — Add Stopping Criteria support
+
+- **Issue**: HF offers `StoppingCriteria` for custom halt conditions; HT/bridge `generate()` only support `stop_at_eos`.
+- **HookedTransformer**: ❌ unchanged — [transformer_lens/HookedTransformer.py:1882](transformer_lens/HookedTransformer.py#L1882) `generate()` and `generate_stream()` (line 2262) still only take `stop_at_eos: bool`.
+- **TransformerBridge**: ❌ unchanged — [transformer_lens/model_bridge/bridge.py:2438](transformer_lens/model_bridge/bridge.py#L2438) `generate()` and `generate_stream()` (line 2754) only have `stop_at_eos`.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~30 LoC — add `stopping_criteria: Optional[Callable[[tokens, logits], bool]] = None` to all four entry points; evaluate after each sampled token and break if any returns True. srishti-git1110 volunteered in 2024.
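+
+The proposed hook can be sketched against a toy decoding loop. Everything here is illustrative: `greedy_generate`, `toy_next_logits`, and the `StoppingCriterion` alias are hypothetical names, the "model" is a stand-in function, and accepting a sequence of criteria is one reading of "break if any returns True", not a confirmed design.
+
+```python
+from typing import Callable, List, Optional, Sequence
+
+# Hypothetical criterion type: receives the tokens so far and the last-step logits.
+StoppingCriterion = Callable[[List[int], Sequence[float]], bool]
+
+
+def greedy_generate(
+    prompt: Sequence[int],
+    next_logits: Callable[[List[int]], Sequence[float]],
+    max_new_tokens: int,
+    stopping_criteria: Optional[Sequence[StoppingCriterion]] = None,
+) -> List[int]:
+    """Greedy decoding loop that halts as soon as any criterion returns True."""
+    tokens = list(prompt)
+    for _ in range(max_new_tokens):
+        logits = next_logits(tokens)
+        # Greedy pick: index of the largest logit.
+        next_token = max(range(len(logits)), key=lambda i: logits[i])
+        tokens.append(next_token)
+        # Criteria are evaluated after the new token has been appended.
+        if stopping_criteria and any(c(tokens, logits) for c in stopping_criteria):
+            break
+    return tokens
+
+
+def toy_next_logits(tokens: List[int]) -> List[float]:
+    """Stand-in for a forward pass: always favours (last token + 1) mod 5."""
+    target = (tokens[-1] + 1) % 5
+    return [1.0 if i == target else 0.0 for i in range(5)]
+
+
+def stop_on_three(tokens: List[int], _logits: Sequence[float]) -> bool:
+    """Example criterion: halt once token 3 has been produced."""
+    return tokens[-1] == 3
+
+
+output = greedy_generate(
+    [0], toy_next_logits, max_new_tokens=10, stopping_criteria=[stop_on_three]
+)
+```
+
+With the criterion supplied, the loop stops right after token `3` is appended; without it, the loop runs for the full `max_new_tokens`. An existing `stop_at_eos` flag could be expressed as one more criterion in the same list.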
+
+<a id="issue-615"></a>
+
+#### #615 — HookedTransformer output not identical to HuggingFace for Llama 3
+
+- **Issue**: Greedy decoding diverges between HT and HF on Llama-3-8B-Instruct. Investigation localized to MLP weight differences after einsum/Linear conversion.
+- **HookedTransformer**: ⚠️ much improved — most einsums in attention/MLP replaced with `F.linear`. degenfabian reports max diff ~`2e-4` on Llama-3-8B-Instruct; close enough for production but not bit-exact.
+- **TransformerBridge**: ✅ argmax/CE/generation parity with HF achieved. PR #1276 fixed an `AttentionBridge`/`GeneralizedComponent` dtype-cast bug that was silently degrading attention precision. Bridge does its own attention math at [generalized_components/joint_qkv_attention.py:465-480](transformer_lens/model_bridge/generalized_components/joint_qkv_attention.py#L465-L480).
+- **Replication**: `[empirically replicated]` — bridge gives small drift but argmax-matches HF on Pythia-70m (per the v3 measurement).
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: bridge users get argmax/CE/generation parity. Bit-exact match still depends on `attn_implementation="eager"` vs HF default sdpa, softmax dtype/order, and `.contiguous()` calls. Document the known eager-vs-sdpa caveat in [docs/source/content/migrating_to_v3.md](docs/source/content/migrating_to_v3.md).
+
+<a id="issue-644"></a>
+
+#### #644 — Documentation: Map the Act Names to the Transformer
+
+- **Issue**: Add a labeled diagram mapping hook names to positions on a transformer architecture figure.
+- **HookedTransformer/Bridge**: ⚠️ partial — a community diagram by akozlo exists at [docs/source/_static/TransformerLens_Diagram.svg](docs/source/_static/TransformerLens_Diagram.svg) and is linked from [docs/source/index.md:21](docs/source/index.md#L21), but **not** embedded in [docs/source/content/model_structure.md](docs/source/content/model_structure.md) (153 lines, 51 hook names listed without a figure). No changes to the doc since `a92a90a1` "Documenting 3.1 features".
+- **Replication**: `[code-verified]`
+- **Bucket**: `partial-leave-open`
+- **Next step**: embed the existing `TransformerLens_Diagram.svg` in `model_structure.md` near the hook list, and add a v3.0 hook-aliasing legend (`hook_normalized` → `ln1.hook_out`, etc.). Or commission a fresh, hook-name-labeled diagram if the existing one omits names.
+
+<a id="issue-697"></a>
+
+#### #697 — Activation cache during generate
+
+- **Issue**: User wants `run_with_cache` semantics during `model.generate()` — cache activations of generated tokens, not just the prompt.
+- **HookedTransformer**: ❌ unchanged: [transformer_lens/HookedTransformer.py:1873](transformer_lens/HookedTransformer.py#L1873) `generate()` and `generate_stream()` (line 2257) still don't integrate `run_with_cache`. Bryce's reply: "no integration ... pretty low priority."
+- **TransformerBridge**: ❌ unchanged — [transformer_lens/model_bridge/bridge.py:2434](transformer_lens/model_bridge/bridge.py#L2434) bridge `generate` and `generate_stream` (line 2749) — same gap. PR #1265 improved `run_with_cache`/`run_with_hooks` interaction but didn't add cache-during-generate.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~50 LoC enhancement: wrap the per-token forward in `run_with_cache`'s hook-installation context and accumulate the cache across iterations. Trickier than it looks because of KV-cache interactions; care is needed to avoid duplicate hook fires as the cache grows. Both APIs need the same fix.
+
+<a id="issue-704"></a>
+
+#### #704 — Add support for TracrBench
+
+- **Issue**: Whether TracrBench (121 toy Tracr transformers) should ship in TransformerLens or live in a separate repo.
+- **HookedTransformer**: ❌ not in core. `grep -i tracr` in `transformer_lens/` returns nothing; the only Tracr material is the Tracr→HookedTransformer demo, referenced at [docs/source/content/tutorials.md:39](docs/source/content/tutorials.md#L39).
+- **TransformerBridge**: ❌ not in core; not a transformer-architecture-detection problem.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: nothing material — no TracrBench code added, no new comments.
+- **Bucket**: `not-relevant-close`
+- **Next step**: close with Neel's recommendation: build TracrBench as an external repo using TransformerLens as a dependency. Optionally add a one-line link from `docs/source/content/gallery.md` (currently absent).
+
+<a id="issue-710"></a>
+
+#### #710 — MVP Support For 1-2 Models Per-Modality
+
+- **Issue**: Add basic non-text-model support: speech (Whisper), vision (ResNet, ViT), music generation, etc.
+- **HookedTransformer**: ❌ not designed for non-text architectures.
+- **TransformerBridge**: ⚠️ partial — 56 adapters total at [transformer_lens/model_bridge/supported_architectures/](transformer_lens/model_bridge/supported_architectures); audio (`hubert.py`), VLM (`llava.py`, `llava_next.py`, `llava_onevision.py`, `gemma3_multimodal.py`), SSM (`mamba.py`, `mamba2.py`). Still no Whisper, no ViT, no ResNet, no diffusion.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: nothing material — same adapter set; multimodal text-gen fix landed (`58330ad0`) but no new modality.
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: per the existing comment thread, encourage reporters to file per-modality sub-issues (Whisper, ViT, etc.). Convert this to a tracking meta-issue or close once sub-issues filed.
+
+<a id="issue-720"></a>
+
+#### #720 — Review current matmul function usages
+
+- **Issue**: `batch_addmm` is right for GPT-2 `Conv1D`-style layers but wrong for plain `nn.Linear` models — need per-architecture matmul routing audit.
+- **HookedTransformer**: ⚠️ partial — `F.linear` cleanup landed for the post-attention output ([abstract_attention.py:368-374](transformer_lens/components/abstract_attention.py#L368)); `batch_addmm` still in [utilities/addmm.py](transformer_lens/utilities/addmm.py). No new commits to either file since v3.
+- **TransformerBridge**: ⚠️ same picture — bridge attention components still use raw `torch.matmul` for QK / AV: [joint_qkv_attention.py:465,480](transformer_lens/model_bridge/generalized_components/joint_qkv_attention.py#L465), [position_embeddings_attention.py:416,452](transformer_lens/model_bridge/generalized_components/position_embeddings_attention.py#L416), [alibi_joint_qkv_attention.py:98,130](transformer_lens/model_bridge/generalized_components/alibi_joint_qkv_attention.py#L98), [mla_attention.py:216,227](transformer_lens/model_bridge/generalized_components/mla_attention.py#L216), [codegen_attention.py:336,357](transformer_lens/model_bridge/generalized_components/codegen_attention.py#L336).
+- **Replication**: `[code-verified]`
+- **What changed since v3**: nothing material.
+- **Bucket**: `partial-leave-open`
+- **Next step**: same 3-part audit as before — (1) HT `batch_addmm` vs `F.linear` per-arch routing, (2) bridge `torch.matmul(q, k.T)` / `torch.matmul(weights, v)` vs HF's per-architecture impl, (3) Q/K/V projection paths.
+
+<a id="issue-737"></a>
+
+#### #737 — Q reshape with model loaded in 4bit
+
+- **Issue**: `cfg.use_split_qkv_input=True` + 4bit vicuna-7b → shape mismatch in `AbstractAttention.calculate_qkv_matrices` — 4bit BnB-packed weight reshapes incorrectly under split-QKV.
+- **HookedTransformer**: 🐛 still buggy — `if self.cfg.load_in_4bit:` branches confirmed at [abstract_attention.py:58,338,378,454,473,491](transformer_lens/components/abstract_attention.py#L338). No commits to abstract_attention.py since v3 targeting this path.
+- **TransformerBridge**: N/A — bridge has no `use_split_qkv_input` flag; quantized models load via `boot_transformers(hf_model=quantized_model)` and use HF's quantized Linear directly. Recent quantization work (`d346e707` "Improved quantization skipping") is bridge-side, doesn't touch this HT branch.
+- **Replication**: `[unverifiable]` — needs GPU + bitsandbytes 4bit.
+- **What changed since v3**: nothing material on this code path.
+- **Bucket**: `partial-leave-open`
+- **Next step**: HT-side fix needs reshape-aware logic in `calculate_qkv_matrices` for 4bit + split path (~30 LoC). Bridge users avoid this entirely. Reporter workaround on HT: disable `use_split_qkv_input` for 4bit models.
+
+<a id="issue-773"></a>
+
+#### #773 — TransformerLens on models with different layernorm placement (BioGPT)
+
+- **Issue**: BioGPT has only one LN per layer (post-MLP `final_layer_norm`), unlike GPT-2's pre-LN1+pre-LN2. User asks for support.
+- **HookedTransformer**: ❌ hard-coded GPT-2 LN placement; `BioGptForCausalLM` listed at [tools/model_registry/data/architecture_gaps.json:909](transformer_lens/tools/model_registry/data/architecture_gaps.json#L909).
+- **TransformerBridge**: ❌ no `BioGptArchitectureAdapter` — not in [transformer_lens/model_bridge/supported_architectures/](transformer_lens/model_bridge/supported_architectures) (56 adapters, none for BioGPT). The component-map pattern theoretically supports per-arch LN layout, but no adapter exists.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: nothing material; adapter creation guide at [docs/source/content/adapter_development/adapter-creation-guide.md](docs/source/content/adapter_development/adapter-creation-guide.md) is now a viable path for the reporter.
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: write a `BioGptArchitectureAdapter` (~80 LoC + tests) following the adapter-creation-guide. Reasonable to invite reporter to take this on with the guide.
+
+<a id="issue-796"></a>
+
+#### #796 — `FactoredMatrix.svd()` `lru_cache` prevents GC
+
+- **Issue**: `FactoredMatrix.svd` decorated with `@lru_cache(maxsize=None)` holds instance refs and prevents garbage collection.
+- **HookedTransformer**: 🐛 still buggy — `from functools import lru_cache` at [FactoredMatrix.py:9](transformer_lens/FactoredMatrix.py#L9) and `@lru_cache(maxsize=None)` at [FactoredMatrix.py:217](transformer_lens/FactoredMatrix.py#L217). No commits to file since v3.
+- **TransformerBridge**: 🐛 same shared `FactoredMatrix` class.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: nothing material.
+- **Bucket**: `not-addressed-simple`
+- **Next step**: replace `@lru_cache(maxsize=None)` with `@cached_property` on `svd` and `eigenvalues` (~10 LoC). Breaking change (`.svd()` → `.svd`) — coordinate with broader `FactoredMatrix` cleanup.
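+
+A minimal sketch of the retention problem and of the `cached_property` alternative named above. `CachedMethod` and `CachedProperty` are hypothetical stand-ins, not the real `FactoredMatrix`. The sketch only illustrates that `lru_cache` on a method stores `self` in a cache owned by the class, while `cached_property` stores the result on the instance.
+
+```python
+import gc
+import weakref
+from functools import cached_property, lru_cache
+
+
+class CachedMethod:
+    """Hypothetical stand-in: caches a method result with lru_cache."""
+
+    @lru_cache(maxsize=None)
+    def svd(self):
+        return ("U", "S", "Vh")  # placeholder for a real decomposition
+
+
+class CachedProperty:
+    """Hypothetical stand-in: caches the same result on the instance."""
+
+    @cached_property
+    def svd(self):
+        return ("U", "S", "Vh")
+
+
+def survives_gc(make_and_use):
+    """Return True if the object is still alive after its last name is dropped."""
+    obj = make_and_use()
+    ref = weakref.ref(obj)
+    del obj
+    gc.collect()
+    return ref() is not None
+
+
+def use_method():
+    obj = CachedMethod()
+    obj.svd()  # the class-level cache key now holds a strong reference to obj
+    return obj
+
+
+def use_property():
+    obj = CachedProperty()
+    obj.svd  # the result is stored in obj.__dict__ only
+    return obj
+
+
+leaked = survives_gc(use_method)        # instance kept alive by the cache
+freed = not survives_gc(use_property)   # instance collected normally
+```
+
+With `cached_property` the cached value is released together with the instance, at the cost of the `.svd()` → `.svd` call-site change noted above.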
+
+<a id="issue-798"></a>
+
+#### #798 — Remove `model_args` (use only `model_kwargs`)
+
+- **Issue**: Bryce's own proposal to remove `*model_args` + `**model_kwargs` redundancy in pass-through functions.
+- **HookedTransformer**: ⚠️ unchanged — `model_args` still present in [HookedEncoderDecoder.py:489-513](transformer_lens/HookedEncoderDecoder.py#L489), [hook_points.py:629,723,779](transformer_lens/hook_points.py#L629), [HookedAudioEncoder.py:299-323](transformer_lens/HookedAudioEncoder.py#L299), [BertNextSentencePrediction.py:220-266](transformer_lens/BertNextSentencePrediction.py#L220), [HookedTransformer.py:707-735](transformer_lens/HookedTransformer.py#L707).
+- **TransformerBridge**: ⚠️ same — bridge inherits `hook_points.py` machinery.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: nothing material; no new comments.
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~30 LoC across affected files — strip `*model_args`, keep only `**model_kwargs`. Already labeled `breaking-change`.
+
+<a id="issue-830"></a>
+
+#### #830 — Type hint support for `self.model` in `ActivationCache`
+
+- **Issue**: `ActivationCache.model` untyped (would need `HookedTransformer` import → circular). Proposes `HookedTransformerMixin` to break the cycle.
+- **HookedTransformer**: ❌ unchanged — confirmed at [ActivationCache.py:118](transformer_lens/ActivationCache.py#L118) — `self.model = model` with no annotation.
+- **TransformerBridge**: ❌ same `ActivationCache` shared class.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: nothing material.
+- **Bucket**: `not-addressed-simple`
+- **Next step**: extract a `HookedRootModuleMixin`, or use `TYPE_CHECKING` plus a `Protocol`, to hint without circular imports (~50 LoC). Tagged 3.0 / 4.0 milestone.
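+
+A rough sketch of the two typing options, assuming nothing about the real `ActivationCache` beyond the `self.model = model` assignment cited above. `HasCfg` and `CacheSketch` are hypothetical names used only for illustration.
+
+```python
+from typing import TYPE_CHECKING, Any, Dict, Protocol
+
+if TYPE_CHECKING:
+    # Seen by type checkers only, so it cannot create an import cycle at runtime.
+    from transformer_lens import HookedTransformer
+
+
+class HasCfg(Protocol):
+    """Hypothetical structural type: anything that exposes a `cfg` attribute."""
+
+    cfg: Any
+
+
+class CacheSketch:
+    """Hypothetical stand-in for ActivationCache."""
+
+    def __init__(self, cache_dict: Dict[str, Any], model: HasCfg) -> None:
+        self.cache_dict = cache_dict
+        # Option A: structural typing through the Protocol.
+        self.model: HasCfg = model
+        # Option B (alternative): a quoted name resolved only by the type checker.
+        self.typed_model: "HookedTransformer" = model  # type: ignore[assignment]
+
+
+class DummyModel:
+    cfg = {"n_layers": 2}
+
+
+cache = CacheSketch({}, DummyModel())
+```
+
+The `Protocol` route avoids importing the model class at all, while the `TYPE_CHECKING` route gives editors the full `HookedTransformer` surface.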
+
+<a id="issue-837"></a>
+
+#### #837 — Multi-GPU device ordinal issue (`n_devices=3` for llama2-7b)
+
+- **Issue**: `n_devices=3` produces "device ordinal out of range" — `(index // layers_per_device)` overshoots when `n_layers % n_devices != 0`.
+- **HookedTransformer**: 🐛 still buggy at [utilities/multi_gpu.py:142](transformer_lens/utilities/multi_gpu.py#L142): `device_index = (device.index or 0) + (index // layers_per_device)` is unchanged. The function is flagged `Deprecated: This will be removed in 3.0` ([lines 130-133](transformer_lens/utilities/multi_gpu.py#L130)).
+- **TransformerBridge**: ✅ first-class — `resolve_device_map` at [multi_gpu.py:170](transformer_lens/utilities/multi_gpu.py#L170) with explicit `n_devices` / `device_map` / `max_memory` and accelerate-backed dispatch. jlarson4's comment on the issue points users to PR #1270.
+- **Replication**: `[unverifiable]` — no multi-GPU here.
+- **What changed since v3**: nothing material; bridge path remains the supported route.
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: HT-side bug remains. Reply on issue with bridge migration recipe (`bridge = TransformerBridge.boot_transformers(name, n_devices=3)`); leave HT path open for #968-family fix or close with bridge pointer if reporter migrates.
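+
+The overshoot can be worked through with plain arithmetic. The sketch below assumes `layers_per_device` is computed by floor division (`n_layers // n_devices`), which is not quoted in this entry. The ceiling-based variant is one illustrative fix, not the project's.
+
+```python
+import math
+
+
+def naive_device_index(layer: int, n_layers: int, n_devices: int, base: int = 0) -> int:
+    # Assumption: block size is the floor of n_layers / n_devices.
+    layers_per_device = n_layers // n_devices
+    return base + layer // layers_per_device
+
+
+def ceil_device_index(layer: int, n_layers: int, n_devices: int, base: int = 0) -> int:
+    # Illustrative alternative: round the block size up so the last layers
+    # land on the last device instead of one past it.
+    layers_per_device = math.ceil(n_layers / n_devices)
+    return base + layer // layers_per_device
+
+
+N_LAYERS, N_DEVICES = 32, 3  # llama2-7b has 32 layers
+naive = [naive_device_index(i, N_LAYERS, N_DEVICES) for i in range(N_LAYERS)]
+fixed = [ceil_device_index(i, N_LAYERS, N_DEVICES) for i in range(N_LAYERS)]
+print(max(naive), max(fixed))  # 3 2
+```
+
+With 32 layers on 3 devices the floor-based block size is 10, so layers 30 and 31 map to device index 3, which does not exist when the valid ordinals are 0 to 2.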
+
+<a id="issue-867"></a>
+
+#### #867 — Does TransformerLens support LVLM like Qwen2-VL?
+
+- **Issue**: User asks if Qwen2-VL / Qwen2.5-VL is supported.
+- **HookedTransformer**: ❌ no native VLM support.
+- **TransformerBridge**: ❌ `Qwen2VLForConditionalGeneration` and `Qwen2_5_VLForConditionalGeneration` still listed in [transformer_lens/tools/model_registry/data/architecture_gaps.json:4709,4940](transformer_lens/tools/model_registry/data/architecture_gaps.json#L4709). Multimodal set at [transformer_lens/utilities/architectures.py:31-36](transformer_lens/utilities/architectures.py#L31-L36) covers only Llava family + Gemma3.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: no movement on Qwen-VL adapters; no new comments since v3 either.
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: add `Qwen2VLArchitectureAdapter` (LLaVA-pattern). Continue pointing reporter at LLaVA adapters today and ExplorerFreda's vlm-lens fork.
+
+<a id="issue-869"></a>
+
+#### #869 — Custom generative video transformer
+
+- **Issue**: User wants mech interp on a Sora-like generative video diffusion transformer.
+- **HookedTransformer**: ❌ no diffusion / video generation support.
+- **TransformerBridge**: ❌ bridge wraps HF causal/seq2seq/multimodal text models via `original_model`; not designed for diffusion. No new diffusion entry in [transformer_lens/utilities/architectures.py](transformer_lens/utilities/architectures.py).
+- **Replication**: `[code-verified]`
+- **What changed since v3**: no activity on issue or relevant code.
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: outside current scope per Bryce's reply (would need a separate `HookedDiffusionTransformer` root module). Recommend close as wontfix or defer to architectural roadmap; point reporter to a dedicated diffusion-interp tool.
+
+<a id="issue-888"></a>
+
+#### #888 — Adapt HookedTransformer to a non-supported model (CLIP language model)
+
+- **Issue**: User wants `from_pretrained` for CLIP's text encoder.
+- **HookedTransformer**: ❌ not possible without code modifications.
+- **TransformerBridge**: ⚠️ adapter framework supports it but no `CLIPTextModel` adapter exists — no `CLIPText*` symbol anywhere under `transformer_lens/`. `CLIPVisionEncoderBridge` exists for the vision side via LLaVA. jlarson4's earlier comment already pointed reporter at the adapter-creation guide.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: no new comments; no CLIP text adapter landed.
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: write `CLIPTextModelArchitectureAdapter` (~120 LoC, encoder-only, BERT-like attention). Leave open as a focused model-request inviting community contribution.
+
+<a id="issue-953"></a>
+
+#### #953 — Add basic support for Gemma 3n (E2B & E4B)
+
+- **Issue**: Reporter asks for text-only support of Gemma 3n (AltUp / LAuReL / PLE / mixed local-global attention).
+- **HookedTransformer**: ❌ not supported.
+- **TransformerBridge**: ❌ no Gemma3n entry in [transformer_lens/utilities/architectures.py](transformer_lens/utilities/architectures.py); no Gemma3n symbol anywhere under `transformer_lens/`. Bryce confirmed in-progress for next major release.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: no movement on Gemma3n adapter.
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: track for milestone 3.x. AltUp/LAuReL/PLE need dedicated component bridges; mixed local/global attention can share Gemma2 work. Defer until HF's `Gemma3nForCausalLM` forward stabilizes.
+
+<a id="issue-1080"></a>
+
+#### #1080 — Import fails by default in Colab (numpy ABI mismatch)
+
+- **Issue**: Fresh Colab + `pip install transformer_lens` + `import transformer_lens` raises `numpy.dtype size changed` ABI error; kernel restart works around it.
+- **HookedTransformer**: ⚠️ [pyproject.toml:11-12](pyproject.toml#L11-L12) still has `numpy>=1.24` / `numpy>=1.26` lower bounds with no upper cap. Numpy 2.x is allowed; transitive ABI mismatch root cause unchanged.
+- **TransformerBridge**: ⚠️ same install path; same numpy.
+- **Replication**: `[unverifiable]` — Colab-specific.
+- **What changed since v3**: no movement on numpy pinning; no new comments.
+- **Bucket**: `bug-likely-fixed-needs-verification`
+- **Next step**: ask reporter to retest with current Colab kernel + current TL (3.x). If still failing, bisect transitive deps and pin a tested numpy.
+
+<a id="issue-1148"></a>
+
+#### #1148 — Tutorial for "Real-Time Training Dynamics" (VSM Telemetry)
+
+- **Issue**: Reporter proposes a demo notebook for σ_p / σ_a training-dynamics telemetry.
+- **HookedTransformer**: ❌ no VSM/sigma_p/sigma_a tutorial in [demos/](demos/) — no VSM symbol anywhere under `demos/` or `transformer_lens/`.
+- **TransformerBridge**: ❌ same — works equivalently against bridge's hook system.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: jonathanrbelanger-lang committed in-thread to "get to work on this over the coming weekend" but no PR yet; no new commits to `demos/` referencing VSM telemetry.
+- **Bucket**: `not-addressed-simple`
+- **Next step**: leave open and wait for the reporter's PR (notebook in `/demos`, targeting `TransformerBridge`). If no PR materializes within a release cycle, either invite community contribution or close as wontfix.
+
+<a id="issue-1263"></a>
+
+#### #1263 — Direct Logit Attribution Tool
+
+- **Issue**: Add a first-class DLA helper in `transformer_lens/tools/analysis/direct_logit_attribution.py` for the new `TransformerBridge` system. Continuation of stale PR #466 (closed 2026-04-22).
+- **HookedTransformer**: ⚠️ partial — `ActivationCache.logit_attrs` exists at [transformer_lens/ActivationCache.py:488-606](transformer_lens/ActivationCache.py#L488-L606) but no standalone tool that wraps the full DLA flow (residual decomposition → scaled attribution → display).
+- **TransformerBridge**: ⚠️ uses the same `ActivationCache.logit_attrs`, but no dedicated bridge-friendly tool. `transformer_lens/tools/` has only `model_registry/`; no `analysis/` subpackage exists yet.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: PR #466 was closed on 2026-04-22, the same day issue #1263 was opened as an explicit replacement for its scope. No PR yet.
+- **Bucket**: `not-addressed-simple`
+- **Labels**: enhancement / good first issue / help wanted / minor / complexity-moderate
+- **Next step**: create `transformer_lens/tools/analysis/direct_logit_attribution.py` wrapping `cache.logit_attrs` + residual-stack decomposition into a one-call API; ship with a demo notebook. Already labelled `good first issue` — invite contributor.
+
+<a id="issue-1280"></a>
+
+#### #1280 — Add support for `cpu`, `meta`, and `disk` to TransformerBridge `device_map`
+
+- **Issue**: Extend bridge `device_map` to allow `cpu` / `meta` / `disk` values. Currently rejected. Pairs with #872 (broader review) and #1270 (initial multi-device).
+- **HookedTransformer**: N/A — separate device-placement model.
+- **TransformerBridge**: 🐛 still rejected by design at [transformer_lens/utilities/multi_gpu.py:146-167](transformer_lens/utilities/multi_gpu.py#L146-L167): `_UNSUPPORTED_DEVICE_MAP_VALUES = {"cpu", "disk", "meta"}` validated in `_validate_device_map_values`, also blocked post-load at [transformer_lens/model_bridge/sources/transformers.py:559-566](transformer_lens/model_bridge/sources/transformers.py#L559-L566). Reporter's identified blocker is the dtype-cast loop.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: snakefood3232 volunteered with a 3-day PR estimate (skip meta-device params, use accelerate's `align_module_device`); jlarson4 assigned them on 2026-05-05.
+- **Bucket**: `partial-leave-open`
+- **Next step**: wait for snakefood3232's PR; a concrete fix plan is documented. Review checklist: relax `_UNSUPPORTED_DEVICE_MAP_VALUES`, gate the dtype-cast loop on `param.device.type != "meta"`, and exercise the change via an integration test that loads a small model with `device_map={"": "cpu"}`.
+
+<a id="issue-1291"></a>
+
+#### #1291 — CI HuggingFace Call Reduction
+
+- **Issue**: CI optimization — reduce HF Hub round-trips during test runs to avoid 429 rate-limit failures across concurrent CI runs.
+- **HookedTransformer**: ⚠️ partial — [.github/workflows/checks.yml:65-88,246-269](.github/workflows/checks.yml#L65-L88) caches ~14 model dirs across `compatibility-checks` and `coverage-test`, but no `concurrency` group is configured anywhere in the workflow; many tests still call `from_pretrained` per-test rather than via session fixtures.
+- **TransformerBridge**: ⚠️ same — bridge tests under `tests/integration/model_bridge/` and `tests/acceptance/model_bridge/` share the cache but each conftest re-loads HF models.
+- **Replication**: `[code-verified]`
+- **What changed since v3**: ak91456 volunteered on 2026-05-09; no PR yet. Cache key bumped to `huggingface-models-v4` recently but core fixture/concurrency work hasn't started.
+- **Bucket**: `partial-leave-open`
+- **Labels**: enhancement / good first issue / low-priority / complexity-moderate
+- **Next step**: wait for ak91456's PR. Suggested approach: (a) add `concurrency: { group: ${{ github.workflow }}-${{ github.ref }}, cancel-in-progress: true }` to `checks.yml` to dedupe stacked runs; (b) promote per-file `from_pretrained("gpt2")` calls in conftests to session-scoped fixtures.
+
+<a id="issue-1297"></a>
+
+#### #1297 — Gemma4 Architecture Adapter
+
+- **Issue**: Add a `Gemma4ArchitectureAdapter` for the new Gemma4 family. Currently surfaces in `architecture_gaps.json` with relevancy 88.0 (109 models on HF, 121k cumulative downloads).
+- **HookedTransformer**: N/A — bridge-only path going forward; no HT weight conversion expected.
+- **TransformerBridge**: ❌ no `Gemma4ArchitectureAdapter` in [transformer_lens/model_bridge/supported_architectures/](transformer_lens/model_bridge/supported_architectures/); `Gemma4ForConditionalGeneration` not registered in [factories/architecture_adapter_factory.py](transformer_lens/factories/architecture_adapter_factory.py) or `HF_SUPPORTED_ARCHITECTURES`.
+- **Replication**: `[code-verified]` — confirmed adapter and registration entries are absent.
+- **Bucket**: `not-addressed-simple`
+- **Next step**: copy `gemma3.py` adapter as starting template (Gemma4 is most likely a Gemma3 superset); register in factory + `HF_SUPPORTED_ARCHITECTURES` + `CANONICAL_AUTHORS_BY_ARCH` (`google`); follow `docs/source/content/adapter_development/adapter-creation-guide.md`. Then verify on the canonical Google models.
+
+<a id="issue-1298"></a>
+
+#### #1298 — External Architecture Registration
+
+- **Issue**: Let users register custom architecture adapters at runtime without modifying TransformerLens source. Currently `SUPPORTED_ARCHITECTURES` in `architecture_adapter_factory.py` is hardcoded.
+- **HookedTransformer**: N/A — bridge-only concept (HT loads via `OFFICIAL_MODEL_NAMES`, no plugin hook).
+- **TransformerBridge**: ❌ no public registration API. The `SUPPORTED_ARCHITECTURES` dict at [factories/architecture_adapter_factory.py:65](transformer_lens/factories/architecture_adapter_factory.py#L65) is module-level and not user-mutable through any documented mechanism.
+- **Replication**: `[code-verified]` — no `register_adapter` function or plugin entry-point hook.
+- **Bucket**: `not-addressed-simple`
+- **Next step**: design needed first: entry-point-based discovery vs. an explicit `register_adapter(arch_name, adapter_class)` function. The adapter-creation guide already exists, so the second half (publishing your adapter) is the remaining gap.
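+
+A minimal sketch of the explicit-function option, to make the design question concrete. Everything here is hypothetical: the `ArchitectureAdapter` base class is a stand-in, the registry is a local dict rather than the real `SUPPORTED_ARCHITECTURES`, and the `overwrite` flag is an assumed design choice.
+
+```python
+from typing import Dict, Type
+
+
+class ArchitectureAdapter:
+    """Hypothetical stand-in for the real adapter base class."""
+
+
+# Local registry for illustration only; not the library's module-level dict.
+SUPPORTED_ARCHITECTURES: Dict[str, Type[ArchitectureAdapter]] = {}
+
+
+def register_adapter(
+    arch_name: str,
+    adapter_class: Type[ArchitectureAdapter],
+    *,
+    overwrite: bool = False,
+) -> None:
+    """Register an adapter class under an architecture name."""
+    if not (isinstance(adapter_class, type) and issubclass(adapter_class, ArchitectureAdapter)):
+        raise TypeError("adapter_class must subclass ArchitectureAdapter")
+    if arch_name in SUPPORTED_ARCHITECTURES and not overwrite:
+        raise ValueError(f"{arch_name!r} is already registered")
+    SUPPORTED_ARCHITECTURES[arch_name] = adapter_class
+
+
+class MyCustomAdapter(ArchitectureAdapter):
+    """Hypothetical user-defined adapter."""
+
+
+register_adapter("MyCustomForCausalLM", MyCustomAdapter)
+```
+
+Refusing silent overwrites by default keeps a third-party package from shadowing a built-in adapter by accident. Entry-point discovery could later call the same function at import time.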
+
diff --git a/OPEN_ISSUES_TRIAGE_OLD.md b/OPEN_ISSUES_TRIAGE_OLD.md
new file mode 100644
index 000000000..082dc7a45
--- /dev/null
+++ b/OPEN_ISSUES_TRIAGE_OLD.md
@@ -0,0 +1,1148 @@
+# Open Issues Triage (v2)
+
+**Generated:** 2026-04-29 (v2 — code-level verification pass)
+**Repo:** TransformerLensOrg/TransformerLens
+**Branch reference:** `dev`
+**Open issues at v2 start:** 83 (down from 91 at v1; 9 closed during v1 cycle)
+**v1 archived at:** [OPEN_ISSUES_TRIAGE.v1.md](OPEN_ISSUES_TRIAGE.v1.md)
+
+## Why v2
+
+The v1 triage was based on issue text alone — bodies + comments — without verifying claims against current source. Multiple corrections during the post-v1 review (#671, #846, #867, #929, #657, #219, #264) revealed the same pattern: the issue's framing didn't match what the code actually does today, in either direction (false positives where bugs were already fixed, false negatives where the bug was still real but my reason was wrong).
+
+v2 corrects this by treating each issue as an investigation, not a reading-comprehension exercise.
+
+## Methodology per issue
+
+Every entry includes:
+
+1. **HookedTransformer side** — does the buggy/missing code path still exist? `grep`/`Read` for the actual file, function, line referenced (or implied by the issue). `git log --all -S '<key string>' -- <file>` for any commits that touched it. `git log --all --grep "Fixes #N\|Closes #N\|#N\b"` for landed fixes.
+2. **TransformerBridge side** — does the bug apply to the bridge's code path? The bridge wraps HF directly via `original_model`, has its own loading path (`sources/transformers.py`), uses HF's attention/PosEmbed/RMSNorm components, and has its own hook system. Many HT-specific bugs don't apply.
+3. **Replication evidence** — one of:
+   - `[empirically replicated]` — ran a minimal repro on this machine; bug observed
+   - `[empirically not reproduced]` — ran repro; bug does not occur
+   - `[code-verified]` — read the source; the buggy code path either exists or has been fixed/removed
+   - `[unverifiable on this machine]` — needs hardware/environment we don't have (multi-GPU, large models, MPS, Colab, 4bit, etc.)
+4. **Next step** — concrete action: close with reference, fix with file:line, migrate to bridge with recipe, ask reporter for repro, or defer with specific blocker.
+
+## Summary by bucket
+
+_Filled in incrementally as batches complete. Counts are over the 83 currently-open issues._
+
+### Batch 1 only (20 issues)
+
+| Bucket | Count |
+|---|---|
+| covered-close | 1 |
+| partial-leave-open | 5 |
+| not-addressed-simple | 5 |
+| not-addressed-difficult | 2 |
+| not-relevant-close | 0 |
+| bug-still-reproduces | 1 |
+| fixed-on-transformerbridge | 5 |
+| bug-likely-fixed-needs-verification | 1 |
+| **Total batch 1** | **20** |
+
+### Batch 2 only (20 issues)
+
+| Bucket | Count |
+|---|---|
+| covered-close | 0 |
+| partial-leave-open | 3 |
+| not-addressed-simple | 3 |
+| not-addressed-difficult | 2 |
+| not-relevant-close | 1 |
+| bug-still-reproduces | 4 |
+| fixed-on-transformerbridge | 3 |
+| bug-likely-fixed-needs-verification | 1 |
+| question-not-actionable | 3 |
+| **Total batch 2** | **20** |
+
+### Batch 3 only (20 issues)
+
+| Bucket | Count |
+|---|---|
+| covered-close | 0 |
+| partial-leave-open | 2 |
+| not-addressed-simple | 4 |
+| not-addressed-difficult | 4 |
+| not-relevant-close | 0 |
+| bug-still-reproduces | 0 |
+| fixed-on-transformerbridge | 7 |
+| bug-likely-fixed-needs-verification | 2 |
+| question-not-actionable | 1 |
+| **Total batch 3** | **20** |
+
+### Batch 4 only (20 issues)
+
+| Bucket | Count |
+|---|---|
+| covered-close | 2 |
+| partial-leave-open | 2 |
+| not-addressed-simple | 1 |
+| not-addressed-difficult | 1 |
+| not-relevant-close | 1 |
+| bug-still-reproduces | 1 |
+| fixed-on-transformerbridge | 8 |
+| bug-likely-fixed-needs-verification | 3 |
+| question-not-actionable | 1 |
+| **Total batch 4** | **20** |
+
+### Batch 5 only (1 issue)
+
+| Bucket | Count |
+|---|---|
+| question-not-actionable | 1 |
+| **Total batch 5** | **1** |
+
+### Cumulative (batches 1 + 2 + 3 + 4 + 5 = 81 issues)
+
+| Bucket | Count | Recommended action |
+|---|---|---|
+| covered-close | 3 | Close (3) |
+| partial-leave-open | 12 | Leave open with scope note (12) |
+| not-addressed-simple | 13 | Leave open / `good-first-issue` (13) |
+| not-addressed-difficult | 9 | Leave open (9) |
+| not-relevant-close | 2 | Close (2) |
+| bug-still-reproduces | 6 | Leave open (6) |
+| fixed-on-transformerbridge | 23 | Comment with bridge migration recipe (23) |
+| bug-likely-fixed-needs-verification | 7 | Ask for repro (7) |
+| question-not-actionable | 6 | Close with docs pointer (6) |
+| **Cumulative total** | **81 of 83** | **Close: 11 / Leave open: 70** |
+
+_(#1263 and #1264 are tracking issues opened by the maintainer; not triaged.)_
+
+### Per-issue summary table
+
+Status legend: ✅ resolved · ⚠️ partial · ❌ not addressed · 🐛 bug reproduces · N/A — feature/concept doesn't apply on that side.
+
+#### Batch 1 (20 issues)
+
+| # | Issue | Bucket | HookedTransformer | TransformerBridge | Replication |
+|---|---|---|---|---|---|
+| #97 | [Better docs for model_properties_table](#issue-97) | partial-leave-open | ⚠️ auto-table covers arch cols; training metadata missing | ⚠️ same auto-table | code-verified |
+| #99 | [Tests + docs for ActivationCache](#issue-99) | not-addressed-simple | ⚠️ tests exist; docstring order bug remains | ⚠️ same (shared class) | code-verified |
+| #100 | [Tests + docs for tokenization](#issue-100) | partial-leave-open | ⚠️ extensive tests; prepend_bos clarity gap | ⚠️ same | code-verified |
+| #104 | [Mixed precision (fp16/bf16)](#issue-104) | fixed-on-transformerbridge | ⚠️ per-arch precision quirks remain | ✅ HF-native; no TL-specific NaN paths | empirical |
+| #107 | [HF evals helper](#issue-107) | not-addressed-difficult | ❌ no lm-eval-harness adapter | ⚠️ users can extract `original_model` and pass to lm-eval-harness | code-verified |
+| #111 | [Direct path patching demo](#issue-111) | not-addressed-difficult | ❌ no first-class helper; ARENA notebook only | ❌ same | code-verified |
+| #112 | [Logit display helper](#issue-112) | not-addressed-simple | ⚠️ `test_prompt` print only; no DataFrame, no logit-lens heatmap | ⚠️ same | code-verified |
+| #207 | [Hook AssertionError in Attribution Patching demo](#issue-207) | covered-close | ⚠️ broad-pattern add_hook still asserts (separate UX issue) | ✅ demo rewritten (PR #1013) + smarter filter | empirical |
+| #210 | [`get_full_resid_decomposition` tensor arg](#issue-210) | not-addressed-simple | ❌ kwarg not added | ❌ same (shared ActivationCache) | code-verified |
+| #264 | [GatedMLP not in docs](#issue-264) | partial-leave-open | ❌ smoke tests only; no class docstring; no config field docstring | ⚠️ better class docs; indirect adapter tests; no parity test; no config field docstring | code-verified |
+| #277 | [BERT future work](#issue-277) | partial-leave-open | ⚠️ MaskedLM + 4 model sizes shipped; NSP/training/LN-fold missing | ⚠️ same architecture coverage | code-verified |
+| #290 | [GPU memory leak](#issue-290) | partial-leave-open | ⚠️ major fixes landed (PR #1229); residual retention plausible | ✅ delegates to HF; no TL-specific circular refs | unverifiable |
+| #297 | [Print attached hooks](#issue-297) | not-addressed-simple | ❌ no `list_hooks()` helper; `hook_dict` raw-accessible | ❌ same hook_points machinery | code-verified |
+| #335 | [LN1 hooks fire 3× per forward](#issue-335) | fixed-on-transformerbridge | 🐛 reproduces (`transformer_block.py:172,174,176`) | ✅ HF attention; fires once | empirical |
+| #341 | [`FactoredMatrix.svd` deprecated `torch.svd`](#issue-341) | not-addressed-simple | ❌ uses `torch.svd`, returns V not Vh | ❌ same `FactoredMatrix` | code-verified |
+| #378 | [Flash attention support](#issue-378) | fixed-on-transformerbridge | ❌ no SDPA flag; hand-rolled einsum | ✅ HF `attn_implementation="sdpa"`/`"flash_attention_2"` | code-verified |
+| #385 | [Pythia rotary mismatch vs HF](#issue-385) | fixed-on-transformerbridge | 🐛 NaN logits in fp32 baseline (possible regression) | ✅ uses HF rotary directly | empirical |
+| #448 | [`n_params` way off](#issue-448) | bug-still-reproduces | 🐛 gpt2-small reports 84M; actual 163M | 🐛 shares calculation path | empirical |
+| #453 | [`checkpoint_label` returns same weights](#issue-453) | bug-likely-fixed-needs-verification | N/A — `checkpoint_label` is not a parameter (user error; kwarg silently swallowed) | N/A — no checkpoint feature | code-verified |
+| #462 | [Mamba support](#issue-462) | fixed-on-transformerbridge | ❌ not supported (by-design) | ✅ `MambaArchitectureAdapter` + `Mamba2ArchitectureAdapter` registered | code-verified |
+
+#### Batch 2 (20 issues)
+
+| # | Issue | Bucket | HookedTransformer | TransformerBridge | Replication |
+|---|---|---|---|---|---|
+| #479 | [Memory-efficient causal mask](#issue-479) | partial-leave-open | 🐛 `(n_ctx, n_ctx)` buffer per layer; ~86 GB at Qwen 72B scale | ⚠️ architecture-dependent: GPT2 inherits same buffer; Llama/Pythia/Qwen/etc. dynamic | empirical |
+| #481 | [Tracr demo broken](#issue-481) | bug-still-reproduces | 🐛 `np.eye(d_model, d_vocab_out)` unembed assumption still in demo | 🐛 same — Tracr-specific, demo not bridge-ported | code-verified |
+| #483 | [`generate()` no-tokenizer fail](#issue-483) | bug-still-reproduces | 🐛 reproduces — assumes `self.tokenizer` exists | 🐛 same — bridge generate also reads `self.tokenizer` | empirical |
+| #502 | [VLM support question](#issue-502) | question-not-actionable | ❌ no native VLM | ⚠️ LLaVA family supported; BLIP-VQA not | code-verified |
+| #509 | [BERT LN folding](#issue-509) | not-addressed-difficult | ❌ post-LN architecture; Neel: not foldable cleanly | ❌ same architectural limit | code-verified |
+| #515 | [`IOIDataset` duplicate entries](#issue-515) | bug-still-reproduces | 🐛 `random.seed(42)` at top of `get_sample` (evals.py:387) | 🐛 same module | empirical |
+| #523 | [Residual stack not adding up](#issue-523) | question-not-actionable | N/A — user error (forgot LN gain/bias); resolved in thread | N/A — same | code-verified |
+| #543 | [Grokking demo broken](#issue-543) | bug-likely-fixed-needs-verification | ⚠️ multiple post-issue commits; needs fresh Colab repro | ⚠️ same | unverifiable |
+| #569 | [Llama-3-70B 4bit multi-GPU](#issue-569) | fixed-on-transformerbridge | 🐛 BnB-packed weights fail QKV reshape | ✅ skips state_dict reshape; structurally sound, end-to-end unverified | unverifiable |
+| #588 | [Tests for model configs](#issue-588) | partial-leave-open | ⚠️ tests exist for ~3 configs; not all 185 | ⚠️ structural-mapping tests for ~15 architectures | code-verified |
+| #595 | [Stopping Criteria support](#issue-595) | not-addressed-simple | ❌ only `stop_at_eos`; no callable | ❌ same | code-verified |
+| #615 | [HT ≠ HF for Llama 3](#issue-615) | fixed-on-transformerbridge | ⚠️ post-einsum-cleanup max diff ~2e-4 (degenfabian); residual reports persist | ⚠️ bridge has its own attention reconstruction; ~2.5e-3 max drift vs HF (argmax matches) | empirical |
+| #644 | [Map act names to transformer](#issue-644) | not-addressed-simple | ❌ no diagram in `model_structure.md` | ❌ same docs source | code-verified |
+| #661 | [Pythia split_qkv batch consistency](#issue-661) | bug-still-reproduces | 🐛 max diff `1.14e-02` between batch[:1] and batch[:2] | N/A — bridge has no `use_split_qkv_input` flag | empirical |
+| #684 | [Quantization beyond Llama](#issue-684) | fixed-on-transformerbridge | 🐛 hard-coded "Llama only" assertion | ✅ no architecture assertion; structurally sound for non-Llama quantized, end-to-end unverified | code-verified |
+| #696 | [Cached LN scale factors meaning](#issue-696) | question-not-actionable | N/A — conceptual Q; Neel answered | N/A — same | code-verified |
+| #697 | [Activation cache during generate](#issue-697) | not-addressed-simple | ❌ no `run_with_cache` integration in `generate()` | ❌ same | code-verified |
+| #704 | [TracrBench support](#issue-704) | not-relevant-close | ❌ not in core | ❌ not in core | code-verified |
+| #710 | [MVP per-modality support](#issue-710) | not-addressed-difficult | ❌ no non-text models | ⚠️ Hubert (audio), LLaVA family (VLM); no Whisper/ResNet/diffusion | code-verified |
+| #720 | [Matmul function audit](#issue-720) | partial-leave-open | ⚠️ partial — `F.linear` cleanup landed for some paths; full audit pending | ⚠️ Q/K/V projections use HF Linear (correct); attention-score `torch.matmul` and output `torch.matmul` are bridge code (own audit needed) | code-verified |
+
+#### Batch 3 (20 issues)
+
+| # | Issue | Bucket | HookedTransformer | TransformerBridge | Replication |
+|---|---|---|---|---|---|
+| #729 | [Guide to adding new models](#issue-729) | not-addressed-simple | ❌ no how-to-extend doc | ❌ same; bridge has cleaner extension primitive but no walkthrough | code-verified |
+| #737 | [Q reshape in 4bit](#issue-737) | partial-leave-open | 🐛 4bit + `use_split_qkv_input` shape mismatch | N/A — bridge has no `use_split_qkv_input` flag | unverifiable |
+| #754 | [Don't load HF when config passed](#issue-754) | fixed-on-transformerbridge | 🐛 `convert_hf_model_config` calls `AutoConfig.from_pretrained` unconditionally | ✅ `boot_transformers(hf_model=...)` skips AutoConfig | code-verified |
+| #773 | [BioGPT-style LN placement](#issue-773) | not-addressed-difficult | ❌ hard-coded GPT-2 LN placement | ⚠️ adapter framework supports custom LN layout; no BioGPT adapter exists | code-verified |
+| #778 | [Gemma2 attn order wrong](#issue-778) | fixed-on-transformerbridge | 🐛 `[global, local, ...]` (inverted from HF's `[local, global, ...]`) | ✅ uses HF `layer_types` directly | empirical |
+| #784 | [Smaller precision OOM](#issue-784) | fixed-on-transformerbridge | ⚠️ source state_dict + working copy duplicates memory at load | ✅ single allocation; bf16 fits on 6GB | unverifiable |
+| #796 | [`FactoredMatrix.svd` lru_cache GC](#issue-796) | not-addressed-simple | 🐛 `@lru_cache` holds instance refs | 🐛 same `FactoredMatrix` | code-verified |
+| #798 | [Remove `model_args`](#issue-798) | not-addressed-simple | ⚠️ `*model_args` + `**model_kwargs` in encoder/hook_points | ⚠️ same hook_points machinery | code-verified |
+| #800 | [Offline GPT2-xl load fails](#issue-800) | fixed-on-transformerbridge | 🐛 same root cause as #754 | ✅ `boot_transformers(hf_model=...)` works offline | code-verified |
+| #801 | [Padding side mismatch (Gemma 2)](#issue-801) | bug-likely-fixed-needs-verification | ⚠️ original repro was on TL 2.9.0; current dev shows `'left'` for both | ✅ inherits HF tokenizer | empirical |
+| #830 | [Type hint for `ActivationCache.model`](#issue-830) | not-addressed-simple | ❌ untyped to avoid circular import | ❌ same shared class | code-verified |
+| #837 | [Multi-GPU device ordinal off-by-one](#issue-837) | fixed-on-transformerbridge | 🐛 same family as #907/#911/#968 | ✅ pre-loaded `hf_model` w/ accelerate works on dev; first-class via PR #1270 | unverifiable |
+| #846 | [Local `hf_model.config` priority for Qwen](#issue-846) | fixed-on-transformerbridge | 🐛 same root cause as #754 | ✅ `boot_transformers(hf_model=...)` works | code-verified |
+| #858 | [gemma-7b-it OOM on 2× H100](#issue-858) | fixed-on-transformerbridge | ⚠️ duplicate-allocation pattern + multi-GPU placement bugs | ✅ pre-loaded `hf_model` with `device_map="auto"` should fit | unverifiable |
+| #867 | [Qwen2-VL support](#issue-867) | not-addressed-difficult | ❌ no VLM | ❌ no `Qwen2VLForConditionalGeneration` adapter — different from LLaVA family | code-verified |
+| #869 | [Custom video transformer](#issue-869) | not-addressed-difficult | ❌ not designed for diffusion/video | ❌ same | code-verified |
+| #872 | [Official `device_map` support](#issue-872) | partial-leave-open | 🐛 `n_devices` has placement bugs (#837 family) | ⚠️ pre-loaded `hf_model=` works on dev; PR #1270 makes it first-class but not yet merged | unverifiable |
+| #873 | [Llama2-7b-chat-hf load fail](#issue-873) | bug-likely-fixed-needs-verification | ⚠️ ambiguous error in body (only screenshots); many Llama issues fixed since | ⚠️ same — needs reporter retest | unverifiable |
+| #878 | [Layer-wise caching for OOM](#issue-878) | question-not-actionable | N/A — usage Q for attribution patching memory | N/A — same | unverifiable |
+| #888 | [Adapt to non-supported model (CLIP language)](#issue-888) | not-addressed-difficult | ❌ no extension mechanism | ⚠️ adapter framework supports it; no `CLIPTextModelArchitectureAdapter` exists | code-verified |
+
+#### Batch 4 (20 issues)
+
+| # | Issue | Bucket | HookedTransformer | TransformerBridge | Replication |
+|---|---|---|---|---|---|
+| #894 | [Implement LongRoPE](#issue-894) | fixed-on-transformerbridge | ❌ only `llama3`/`yarn` rope_type branches; no `longrope` | ✅ delegates rope to HF; LongRoPE works natively for Phi-3.5/Phi-4-mini | code-verified |
+| #902 | [NaN weights when initializing](#issue-902) | bug-likely-fixed-needs-verification | ⚠️ original repro on TL 2.15.0; current dev not yet retested | ✅ uses HF native init, not TL's `_init_weights_*` paths | unverifiable |
+| #903 | [gpt2-small `n_params` reports 85M](#issue-903) | bug-still-reproduces | 🐛 same calc as #448; embeddings/unembed excluded | 🐛 shares `HookedTransformerConfig` calc | code-verified |
+| #904 | [Gemma fold_value_biases device mix](#issue-904) | fixed-on-transformerbridge | 🐛 `b_O + (b_V * W_O).sum(...)` w/o `.to(device)` | ✅ no fold_value_biases by default; HF device_map respected | unverifiable |
+| #907 | [PR #864 device-selection breaks multi-GPU](#issue-907) | fixed-on-transformerbridge | 🐛 greedy memory placement scatters sequential blocks | ✅ HF accelerate via `hf_model=`; PR #1270 makes it first-class | unverifiable |
+| #909 | [Documentation for hookpoints](#issue-909) | covered-close | ✅ `model_structure.md` documents legacy aliases too | ✅ same doc covers canonical names | code-verified |
+| #911 | [PosEmbed device error with accelerate](#issue-911) | fixed-on-transformerbridge | 🐛 `W_pos[offset_position_ids]` cross-device under DDP | ✅ uses HF's `wpe` directly via `EmbeddingBridge` | unverifiable |
+| #912 | [Support mT5](#issue-912) | partial-leave-open | ❌ T5-only path | ⚠️ `MT5ForConditionalGeneration` in SUPPORTED_ARCHITECTURES; `model_type="mt5"` not in `model_type_mappings` | code-verified |
+| #923 | [Pythia missing `hook_resid_mid`](#issue-923) | not-relevant-close | N/A — by-design (parallel attn+MLP) | N/A — same | code-verified |
+| #929 | [Load custom small GPT-2 with hf_model](#issue-929) | fixed-on-transformerbridge | 🐛 same root cause as #754 (AutoConfig refetch) | ✅ `boot_transformers(hf_model=...)` reads user's config | code-verified |
+| #930 | [Quantized Llama 3.2 fails to load](#issue-930) | fixed-on-transformerbridge | 🐛 BnB-packed shape mismatch (same as #569) | ✅ skips state_dict reshape; structurally sound, end-to-end unverified | unverifiable |
+| #950 | [Support SimpleStories](#issue-950) | partial-leave-open | ❌ not registered | ⚠️ fine-tunes registered (1.25M, 35M); base SimpleStories not yet | code-verified |
+| #953 | [Gemma 3n (E2B & E4B)](#issue-953) | not-addressed-difficult | ❌ not supported | ❌ not registered; AltUp/LAuReL/PLE need dedicated component bridges | code-verified |
+| #962 | [Multiple GPU support question](#issue-962) | question-not-actionable | ⚠️ `n_devices=N` works (with multi-GPU bug cluster caveats) | ⚠️ `hf_model=` w/ `device_map="auto"` works; PR #1270 first-class | code-verified |
+| #968 | [unsloth/llama-3.2-3b 2× 3060 indices error](#issue-968) | bug-likely-fixed-needs-verification | 🐛 multi-GPU bug cluster (#837/#907/#911) | ⚠️ jlarson4 commented offering bridge + #1270 path | unverifiable |
+| #993 | [Load compressed Llama/Qwen](#issue-993) | fixed-on-transformerbridge | 🐛 hard-coded "Llama only" + reshape assumes unpacked weights | ✅ no architecture assertion; HF parameter passthrough | code-verified |
+| #1039 | [Loading models from local files](#issue-1039) | fixed-on-transformerbridge | 🐛 same root cause as #754/#800 (AutoConfig refetch) | ✅ `boot_transformers(hf_model=...)` works offline | code-verified |
+| #1080 | [Colab import fails (numpy ABI)](#issue-1080) | bug-likely-fixed-needs-verification | ⚠️ TL itself doesn't pin numpy<2; transitive ABI mismatch | ⚠️ same install path | unverifiable |
+| #1133 | [`tokenize_and_concatenate` cuts tokens](#issue-1133) | covered-close | ✅ PR #1273 (`ad8e123b`) replaces char-chunking with per-doc tokenization | ✅ shared utility | code-verified |
+| #1148 | [VSM Telemetry tutorial](#issue-1148) | not-addressed-simple | ❌ no tutorial | ❌ same; bridge hooks would work equivalently | code-verified |
+
+#### Batch 5 (1 issue)
+
+| # | Issue | Bucket | HookedTransformer | TransformerBridge | Replication |
+|---|---|---|---|---|---|
+| #1165 | [Yoruba tokenization fragmentation](#issue-1165) | question-not-actionable | ❌ no span-pooling helper | ❌ same | code-verified |
+
+## Investigation entries
+
+Entries are ordered by issue number (oldest first), batch by batch.
+
+---
+
+### Batch 1: oldest 20 still-open issues
+
+_(complete — sign-off requested before batch 2)_
+
+<a id="issue-97"></a>
+
+#### #97 — Better docs for model properties
+
+- **Issue**: Improve `model_properties_table` — better column key, parallel-attn flag, positional-embed type, training metadata (dataset, dropout, weight decay).
+- **HookedTransformer**: `docs/source/generated/model_properties_table.{csv,jsonl,md,html}` exists with comprehensive auto-generated content covering `parallel_attn_mlp`, `positional_embedding_type`, `original_architecture`, `normalization_type`, `gated_mlp`, `n_params`, etc. Auto-regen is wired up via `docs/make_docs.py`.
+- **TransformerBridge**: same auto-table covers bridge architectures via `transformer_lens/tools/model_registry/`.
+- **Replication**: `[code-verified]` — table exists and appears regenerated periodically; covers most architectural columns Neel listed.
+- **What's missing**: qualitative training metadata (dataset name, training tokens, dropout, weight decay) is not in the table. That info isn't on HF configs so it would need a hand-curated supplementary table.
+- **Bucket**: `partial-leave-open`
+- **Next step**: small docs task — add a `training_metadata.csv` (manually curated, ~50 rows) that the table-build joins against. Or close as practically-resolved by the auto-table and open a new issue for the training-metadata supplement.
+
+<a id="issue-99"></a>
+
+#### #99 — Add tests + better docs for ActivationCache
+
+- **Issue**: Add tests for `ActivationCache` methods. Comment-thread bug from `andylolu2`: `get_full_resid_decomposition` docstring says component order is `[embed, pos_embed, heads, neurons, biases]` but actual stacking order is `[*heads, *neurons, embed, pos_embed, biases]` — leading to silent unpacking errors.
+- **HookedTransformer**: `tests/acceptance/test_activation_cache.py` exists. Docstring at [ActivationCache.py:1105-1107](transformer_lens/ActivationCache.py#L1105-L1107) still says "embed, pos_embed, each head result, each neuron result, and the accumulated biases" — the misleading order persists.
+- **TransformerBridge**: `tests/acceptance/model_bridge/compatibility/test_activation_cache.py` exists for bridge parity.
+- **Replication**: `[code-verified]` — docstring text confirmed wrong vs. actual stacking. Empirical verification of stacking order would take ~10 lines but the docstring/code mismatch is clear from reading.
+- **Bucket**: `not-addressed-simple`
+- **Next step**: 3-line docstring fix at `ActivationCache.py:1105-1107` to match actual `[heads, neurons, embed, pos_embed, biases]` order. Tests already cover the main flows.
+
+<a id="issue-100"></a>
+
+#### #100 — Add tests + better docs for tokenization methods
+
+- **Issue**: Add tests for `to_tokens`, `to_string`, `to_str_tokens`, `get_token_position`. Clarify `prepend_bos` documentation.
+- **HookedTransformer**: Multiple test files cover this — `tests/integration/test_tokenization_methods.py`, `test_only_tokenizer.py`, `test_utils_tokens.py`, `tests/acceptance/test_hook_tokens.py`, `test_tokenizer_special_tokens.py`. Tokenize utils were extracted to `transformer_lens/utilities/tokenize_utils.py`.
+- **TransformerBridge**: bridge has its own `to_tokens` etc. — uses HF tokenizer directly.
+- **Replication**: `[code-verified]` — substantial test coverage already exists.
+- **Bucket**: `partial-leave-open`
+- **Next step**: docs side — `prepend_bos` is well-documented in current `HookedTransformerConfig` (lines 144-152) but a standalone explainer page in Sphinx docs would close the original ask. ~50-line docs task.
+
+<a id="issue-104"></a>
+
+#### #104 — Add mixed precision (fp16/bf16) inference incl. loading
+
+- **Issue**: Load models in fp16/bf16, esp. for large models. Comment thread surfaced two specific patches from `slavachalnev`: keep LayerNorm in fp32; apply attention scale before computing scores (avoid -inf in fp16).
+- **HookedTransformer**: dtype handling exists (dtype param on `from_pretrained`, `load_in_4bit`). Thread-described NaN issues partially fixed via PRs #366, #389. Per-architecture fp16/bf16 numerical divergence vs. HF persists (Pythia, Llama-2/3 — see #385 for one such regression I confirmed).
+- **TransformerBridge**: bridge wraps HF's modules directly, including HF's LayerNorm/attention/rotary. The TL-side numerical paths that caused fp16 NaN (manual LN cast, attention scale order) don't exist on the bridge — by construction matches HF in any precision HF supports. Original "load in fp16/bf16" ask shipped via `boot_transformers(..., dtype=torch.float16)`.
+- **Replication**: `[empirically replicated]` — see #385: pythia-70m via `from_pretrained_no_processing` produces NaN logits even in fp32 baseline. Bridge avoids by using HF rotary/attention directly.
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: bridge users get the feature + numerical correctness today. HT users with fp16 precision concerns should be pointed at the bridge migration recipe. The HT-side per-architecture audit (overlapping with #385) becomes a lower-priority "fix the legacy path" task rather than a blocker for the original feature ask.
+
+<a id="issue-107"></a>
+
+#### #107 — Helper to run HuggingFace evals on HookedTransformer
+
+- **Issue**: Run HF evals (PIQA, TriviaQA, LAMBADA) against `HookedTransformer`. Suggested pivoting to `lm-evaluation-harness`.
+- **HookedTransformer**: `transformer_lens/evals.py` exists but contains the OLD pre-HF eval set (e.g. `IOIDataset`). No `lm-eval-harness` adapter. No way to feed a `HookedTransformer` instance into `lm-eval-harness`'s LM interface.
+- **TransformerBridge**: bridge wraps HF model, so users can pull `bridge.original_model` and feed that to `lm-eval-harness` directly. Indirect coverage only.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: meaningful integration project — write an `LM`-conforming wrapper for `HookedTransformer` per `lm-eval-harness`'s API, expose as `pip install transformer-lens[evals]`. ~200 LoC + test infra. Or document the bridge passthrough as an interim recipe.
+
+<a id="issue-111"></a>
+
+#### #111 — Demo of direct path patching
+
+- **Issue**: Add a section to Exploratory Analysis Demo demonstrating direct path patching for all head pairs. PR #49 was an early attempt.
+- **HookedTransformer**: `demos/Activation_Patching_in_TL_Demo.ipynb`, `demos/Attribution_Patching_Demo.ipynb` exist. Neither covers direct path patching specifically. No first-class TL helper for direct path patching.
+- **TransformerBridge**: same — no path-patching primitive in either API.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: Callum McDougall pointed users at the [ARENA IOI notebook](https://colab.research.google.com/drive/1KgrEwvCKdX-8DQ1uSiIuxwIiwzJuQ3Gw), which covers path patching. Could either close with a docs pointer to ARENA, or implement a TL helper that wraps the pattern (~80 LoC).
+
+<a id="issue-112"></a>
+
+#### #112 — Helper to display vectors of logits nicely
+
+- **Issue**: Neel asked for two things: **MVP** — function mapping logit vector → pandas DataFrame `(token_index, token_string, logit, log_prob, probability)`, top-K or full vocab. **Bonus** — nostalgebraist-style `plot_logit_lens` heatmap (layer × position grid, top token per cell, colored by value). Comment thread later discussed an _additional_ interactive circular visualization (sheikheddy's proposal) that Neel + sheikheddy redirected to CircuitsVis — but that redirect is about the JS-heavy interactive vis, NOT about the original MVP/bonus asks.
+- **HookedTransformer**: `test_prompt` in [`transformer_lens/utilities/exploratory_utils.py`](transformer_lens/utilities/exploratory_utils.py) prints top-k tokens with logit/prob/rank for a prompt+answer — partly satisfies the spirit of the MVP but is print-only, not structured (no DataFrame return), and single-position only. No `plot_logit_lens` heatmap helper.
+- **TransformerBridge**: same — `test_prompt` works through the bridge too, no separate visualization helper.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~30 LoC for the DataFrame helper (`logits_to_df(logits, tokenizer, top_k=None) -> pd.DataFrame`), ~50 LoC for the heatmap (`plot_logit_lens` matplotlib version that takes `(n_layers, pos, d_vocab)` and renders the grid). Both are small, well-scoped library additions independent of CircuitsVis.
+
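The MVP half of #112 is small enough to sketch without any dependencies. The function name `logits_to_rows` and the `decode` callable below are hypothetical stand-ins, not existing TransformerLens APIs; the records they produce use the column names from the issue, so wrapping the result in `pd.DataFrame(rows)` would give the requested DataFrame.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence


def logits_to_rows(
    logits: Sequence[float],
    decode: Callable[[int], str],
    top_k: Optional[int] = None,
) -> List[Dict[str, object]]:
    """Turn one position's logit vector into per-token records, highest logit first."""
    # Numerically stable log-softmax: subtract the max before exponentiating.
    max_logit = max(logits)
    log_z = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))

    rows = [
        {
            "token_index": i,
            "token_string": decode(i),
            "logit": x,
            "log_prob": x - log_z,
            "probability": math.exp(x - log_z),
        }
        for i, x in enumerate(logits)
    ]
    # top_k=None keeps the full vocabulary.
    rows.sort(key=lambda r: r["logit"], reverse=True)
    return rows if top_k is None else rows[:top_k]


# Toy 4-token vocabulary standing in for a real tokenizer (assumption for the demo).
vocab = ["a", "b", "c", "d"]
rows = logits_to_rows([1.0, 3.0, 0.5, 2.0], decode=lambda i: vocab[i], top_k=2)
```

With a real model, `decode` would presumably be a thin wrapper around the tokenizer's single-token decode call, and `logits` would be one `[d_vocab]` slice of the model output.
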
+<a id="issue-207"></a>
+
+#### #207 — Can't add hook to pretrained model: AssertionError on `hook_q_input`
+
+- **Issue**: Attribution Patching Demo's `model.add_hook(lambda name: True, ...)` raised `AssertionError: Cannot add hook blocks.0.hook_q_input if use_split_qkv_input is False`. The reporter's specific complaint was that the published demo crashed.
+- **HookedTransformer**: the cfg-gated assertion still exists at [HookedTransformer.py:264-266](transformer_lens/HookedTransformer.py#L264-L266) — the broader UX question Neel raised in the thread ("warning vs. assertion") was not actioned.
+- **TransformerBridge**: the demo itself was rewritten in PR #1013 (commit `b4fc3754` "updated loading in attribution patching demo to use transformer bridge") to use `TransformerBridge.boot_transformers("gpt2", ...)` + `model.set_use_attn_result(True)` + a smarter filter (`lambda name: "_input" not in name` — the atlaie workaround from the thread, formalized). The reported crash scenario can no longer occur in the canonical demo.
+- **Replication**: `[empirically replicated]` — the underlying broad-pattern assertion still fires on `lambda name: True` (I verified during batch 1 investigation). But the _demo workflow_ runs cleanly.
+- **Bucket**: `covered-close`
+- **Next step**: close, pointing reporter at PR #1013 / the current `demos/Attribution_Patching_Demo.ipynb`. The latent UX concern (broad-pattern add_hook should warn-not-assert) is a separate, unfiled issue — file a new ticket if someone wants to revisit Neel's "warnings are annoying but silent bugs are worse" question.
+
+<a id="issue-210"></a>
+
+#### #210 — `get_full_resid_decomposition` accept tensor argument
+
+- **Issue**: Add a `project_output_onto: [d_model]` or `[d_model, num_outputs]` argument so neuron-decomposition doesn't blow GPU memory by materializing `[batch, pos, d_mlp, d_model]`.
+- **HookedTransformer**: [`ActivationCache.py:1091`](transformer_lens/ActivationCache.py#L1091) signature has no `project_output_onto`. Memory-blowing path still active.
+- **TransformerBridge**: same — bridge uses the same `ActivationCache` class.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: add `project_output_onto` kwarg + `(neurons * (W_out @ project_output_onto))` path. ~15 LoC + 1 test. Alan Cooney commented they'd take it; never landed.
+
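The memory saving in #210 comes from reordering one multiplication: neuron `i` writes `acts[i] * W_out[i, :]` to the residual stream, so its projection onto a direction `v` equals `acts[i] * (W_out @ v)[i]`. The toy sketch below (plain lists, hypothetical function names, a single position) checks that both orderings agree; it illustrates the identity and is not the proposed `ActivationCache` code.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def neuron_projections_naive(acts: Vector, W_out: Matrix, direction: Vector) -> Vector:
    """Materialize each neuron's full d_model output, then project it."""
    per_neuron_output = [[a * w for w in row] for a, row in zip(acts, W_out)]  # [d_mlp, d_model]
    return [dot(out, direction) for out in per_neuron_output]


def neuron_projections_fused(acts: Vector, W_out: Matrix, direction: Vector) -> Vector:
    """Project W_out onto the direction first, so only [d_mlp] values are ever held."""
    w_proj = [dot(row, direction) for row in W_out]  # W_out @ direction, shape [d_mlp]
    return [a * w for a, w in zip(acts, w_proj)]


acts = [0.5, -1.0, 2.0]                        # one position, d_mlp = 3
W_out = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # [d_mlp, d_model], d_model = 2
direction = [1.0, -1.0]                        # [d_model]

naive = neuron_projections_naive(acts, W_out, direction)
fused = neuron_projections_fused(acts, W_out, direction)
```

At real sizes the fused form holds `[batch, pos, d_mlp]` values instead of `[batch, pos, d_mlp, d_model]`, which is the blow-up the issue describes.
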
+<a id="issue-264"></a>
+
+#### #264 — GatedMLP not in docs
+
+- **Issue**: Three sub-tasks: (1) docstring for `gated_mlp` config arg; (2) tests including parity vs equivalent PyTorch impl + activation-cache verification; (3) optional tutorial.
+- **HookedTransformer**: `gated_mlp` config field at [HookedTransformerConfig.py:249](transformer_lens/config/HookedTransformerConfig.py#L249) has no docstring entry. `tests/unit/components/mlps/test_gated_mlp.py` is smoke tests only (init + output shape, 41 lines) — no parity test against an equivalent `nn.Module`, no cache-correctness test.
+- **TransformerBridge**: `transformer_lens/model_bridge/generalized_components/gated_mlp.py` is **substantially better documented** — class docstring spells out the gated MLP structure formula, hook semantics for compat-vs-raw modes, and method-level docstrings on `__init__`, `forward`, `set_processed_weights`. But `cfg.gated_mlp` in `TransformerBridgeConfig` is also undocumented. **Test coverage** is indirect: 4 adapter unit tests (Qwen3.5, InternLM2, Cohere, Baichuan) verify GatedMLPBridge is wired correctly into each adapter — structural mapping tests, not numerical parity. Existing `test_bridge_vs_hooked_comparison.py` uses distilgpt2 (no gated MLP); `test_weight_processing_perfect_match.py` uses gpt2 (no gated MLP). The numerical parity test `0amp` suggested doesn't exist on either side.
+- **Replication**: `[code-verified]`
+- **Bucket**: `partial-leave-open`
+- **Next step**:
+  - For sub-task 1 (config docstring): ~3-line addition in both `HookedTransformerConfig` and `TransformerBridgeConfig` for `gated_mlp` field. Trivial.
+  - For sub-task 2 (parity test): ~30 LoC test that loads a small gated-MLP model (e.g., `unsloth/llama-3.2-1b` or a tiny custom config) and checks `torch.allclose(bridge_output, hf_output)` end-to-end. This would close `0amp`'s ask AND provide regression coverage for the entire gated-MLP forward path on the bridge.
+  - Optional sub-task 3 (tutorial): defer to a separate scoped issue.
+
+<a id="issue-277"></a>
+
+#### #277 — BERT: Future work (multi-checklist tracking)
+
+- **Issue**: Tracking issue for BERT enhancements: expand demo, more BERT models, NSP support, weight processing incl. LN folding, training/finetuning support, convenience-property tests.
+- **HookedTransformer**: `BertForMaskedLM` adapter present. `BertForNextSentencePrediction` is NOT registered in `SUPPORTED_ARCHITECTURES` — only mentioned as an example for `model_class=` override.
+- **Models**: `bert-base-cased`, `bert-base-uncased`, `bert-large-cased`, `bert-large-uncased` all registered in `supported_models.py`.
+- **Demo**: `demos/BERT.ipynb` exists.
+- **LN folding for BERT**: still hard (post-norm) — covered by separate open issue #509.
+- **Training/dropout support**: not addressed.
+- **TransformerBridge**: same architecture coverage as HT.
+- **Replication**: `[code-verified]`
+- **Bucket**: `partial-leave-open`
+- **Next step**: split into separate scoped issues — close #277 as a stale tracking issue and open dedicated issues for (a) NSP adapter, (b) BERT LN folding (#509), (c) training/dropout support. The meta-checklist format obscures what's actually outstanding.
+
+<a id="issue-290"></a>
+
+#### #290 — GPU memory leak when HookedTransformer goes out of scope
+
+- **Issue**: `del model; gc.collect(); torch.cuda.empty_cache()` doesn't reclaim memory after loading multiple models in a loop.
+- **HookedTransformer**: substantial debugging in the thread by `rusheb`/`pranavgade20`. Identified two parts: (1) circular reference in `mod_dict` (the empty-name self-reference), (2) tensors in `state_dict[k] = v.to(device)` not detaching from compute graph. PR #1229 ("detach in load") landed at least one of the fixes.
+- **TransformerBridge**: bridge wraps HF model directly; HF's own memory hygiene applies. Bridge has its own concerns but the HT-specific circular-reference / non-detach issues don't apply.
+- **Replication**: `[unverifiable on this machine]` — needs GPU profiling tooling and ~10× model loads to see the leak. The line numbers referenced in the thread (HookedTransformer.py:860-870) no longer match — code has been refactored.
+- **Bucket**: `partial-leave-open`
+- **Next step**: re-run the `fil-profile` reproduction from the comment thread on current `dev` to confirm whether residual leak exists. If yes, the next-worst-offender per `rusheb` was `move_model_modules_to_device` — that overlaps with the multi-GPU bug cluster (#837/#907/#911/#968).
+
+<a id="issue-297"></a>
+
+#### #297 — Better print-outs for currently attached hooks
+
+- **Issue**: API for listing hooks attached to a model + HookPoint, with detail.
+- **HookedTransformer**: no first-class `model.list_hooks()` or `HookPoint.describe()` API. `model.hook_dict` is publicly accessible (`Dict[str, HookPoint]`); `hp.fwd_hooks` and `hp.bwd_hooks` are inspectable lists. Users can roll their own iteration but it's inconvenient.
+- **TransformerBridge**: same — uses the same `hook_points` machinery.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: add `HookedRootModule.list_active_hooks()` returning `Dict[str, List[hook_repr]]`. ~15 LoC + 1 test. PR #302 mentioned in the thread for sub-task (ii) was abandoned.
+
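A rough sketch of the listing helper for #297, assuming only what the entry above states: a name-to-hook-point mapping whose values expose `fwd_hooks` and `bwd_hooks` lists. The function is written as a free function over that mapping, and the `SimpleNamespace` objects are stand-ins so the example runs without loading a model; how real hook entries should be rendered is left to `repr` here.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Mapping


def list_active_hooks(hook_dict: Mapping[str, Any]) -> Dict[str, List[str]]:
    """Return {hook point name: [descriptions]} for points with hooks attached."""
    active: Dict[str, List[str]] = {}
    for name, hook_point in hook_dict.items():
        described = [f"fwd: {h!r}" for h in hook_point.fwd_hooks]
        described += [f"bwd: {h!r}" for h in hook_point.bwd_hooks]
        if described:  # skip hook points with nothing attached
            active[name] = described
    return active


# Stand-in objects exposing the same two attributes (assumption for the demo).
fake_hook_dict = {
    "blocks.0.hook_resid_pre": SimpleNamespace(fwd_hooks=["ablate_head"], bwd_hooks=[]),
    "blocks.0.hook_resid_post": SimpleNamespace(fwd_hooks=[], bwd_hooks=[]),
}
summary = list_active_hooks(fake_hook_dict)
```
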
+<a id="issue-335"></a>
+
+#### #335 — Improve LN1's hooks (LN1 hook fires 3 times per forward)
+
+- **Issue**: When `use_split_qkv_input=False` (default), `transformer_block.py` still calls `self.ln1(query_input/key_input/value_input)` three times. Hooks on `ln1` get called 3× and the cached tensor gets overwritten 3×.
+- **HookedTransformer**: confirmed at [transformer_block.py:172,174,176](transformer_lens/components/transformer_block.py#L172-L176) — three `self.ln1(...)` calls.
+- **TransformerBridge**: bridge uses HF's attention with HF's LayerNorm — fired once per forward.
+- **Replication**: `[empirically replicated]` — added a counting hook on `blocks[0].ln1.hook_normalized` in gpt2; called 3 times per forward.
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: Arthur tagged this `low-priority`. A fix on the HT side would either stack the three inputs into an extra leading dimension (`[3, batch, pos, d_model]`) and run `ln1` once, or cache the LN1 output once and reuse it. ~20 LoC. Alternatively, recommend bridge migration, since the bug doesn't apply there.
+
+<a id="issue-341"></a>
+
+#### #341 — Update FactoredMatrix.svd() (uses deprecated `torch.svd`, returns V not Vh)
+
+- **Issue**: TL uses deprecated `torch.svd` (which returns V, not Vh) inside `FactoredMatrix.svd`. Should switch to `torch.linalg.svd` and return Vh per modern convention.
+- **HookedTransformer/Bridge**: confirmed at [FactoredMatrix.py:230-233](transformer_lens/FactoredMatrix.py#L230-L233): the code still calls `torch.svd(...)`, and the value it returns under the name `Vh` is actually V.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~15-line fix: switch to `torch.linalg.svd(..., full_matrices=False)`, return `Vh` directly, and update the docstring to note the breaking change. `diego898` offered in the thread to send a PR. Because this is a breaking change, it should land with a deprecation warning + version bump.
+
+<a id="issue-378"></a>
+
+#### #378 — Optionally use flash attention
+
+- **Issue**: Flash attention / SDPA flag for performance. Particularly useful for Pythia-12B and SAE training. Cost: lose intermediate attention pattern hooks.
+- **HookedTransformer**: no `attn_implementation` flag, no `scaled_dot_product_attention` path in `transformer_lens/components/`. Attention is hand-rolled with `einsum`.
+- **TransformerBridge**: `boot_transformers` sets `attn_implementation="eager"` by default ([sources/transformers.py:399](transformer_lens/model_bridge/sources/transformers.py#L399)), and users can override it to `"sdpa"` or `"flash_attention_2"` via the adapter cfg. The bridge therefore supports flash attention wherever HF supports it.
+- **Replication**: `[code-verified]`
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: HT-side fix is real work (~100 LoC + cache surface decisions). Bridge users can already use flash via HF's native support — recommend bridge migration.
+
+<a id="issue-385"></a>
+
+#### #385 — Pythia / Rotary Embeddings don't match HuggingFace
+
+- **Issue**: Logit drift between `HookedTransformer` and HF for Pythia models. Llama-2-7b-chat reportedly catastrophic. Llama-3.2 rotary mismatch persists per chengjiali.
+- **HookedTransformer**: rotary code lives in `transformer_lens/components/abstract_attention.py`. Multiple fixes have landed (PRs #366, #389, #454 referenced in thread; recent rotary PR `2c41b6c9 Weight processing/position embeddings attention`).
+- **TransformerBridge**: bridge uses HF's rotary implementation directly via `RotaryEmbeddingBridge` delegating to HF's `model.rotary_emb`. By construction matches HF.
+- **Replication**: `[empirically replicated]` — pythia-70m via `from_pretrained_no_processing` returns NaN logits when compared to HF's `GPTNeoXForCausalLM`. argmax doesn't match. This is **worse** than the issue's original report (~1e-3 to 1e-4 drift) — current state appears to have a regression.
+- **Bucket**: `bug-still-reproduces` (and possibly regressed) + `fixed-on-transformerbridge` for bridge users
+- **Next step**: investigate the NaN regression — first verify whether this reproduces with full `from_pretrained` (with default processing) and current `dev` HEAD. If real regression, bisect against `2c41b6c9` and other recent rotary PRs. Bridge users avoid this entirely.
+
+<a id="issue-448"></a>
+
+#### #448 — `n_params` counts are wrong
+
+- **Issue**: TL's `n_params` ignores embeddings and uses an oversimplified MLP formula (the `2x` factor is wrong for SwiGLU/gated MLPs).
+- **HookedTransformer**: calculation at [HookedTransformerConfig.py:325-334](transformer_lens/config/HookedTransformerConfig.py#L325-L334) — only counts attention + MLP, ignores embed/unembed/LN/biases. The `gated_mlp` factor was added (line 329 uses `2 + self.gated_mlp`) — partial improvement.
+- **TransformerBridge**: same `n_params` calculation path (shared config).
+- **Replication**: `[empirically replicated]`; gpt2-small reports `n_params = 84,934,656` but actual is `163,049,041`. Embeddings alone account for ~78M (W_E + W_pos + W_U = 77,981,184 for `d_vocab=50257`, `n_ctx=1024`, `d_model=768`); the formula misses everything except attn+MLP weights.
+- **Bucket**: `bug-still-reproduces`
+- **Next step**: replace the manual formula with `sum(p.numel() for p in self.parameters() if p.requires_grad)` (post-load) for total count. ~5 LoC + maintain backward compat by keeping the old "trainable parameters in transformer blocks" interpretation under a different attr name. Discussion in thread: Neel preferred total params for alignment with Pythia suite naming.
+
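The two figures for #448 can be reconciled by hand. The sketch below uses GPT-2 small's published hyperparameters and a reconstruction of the config formula (attention plus MLP weight matrices only, with the `2 + gated_mlp` factor mentioned above). The bias accounting assumes LayerNorm parameters have been folded away and that `W_U` is stored untied from `W_E`; both are assumptions about how the `163,049,041` figure was measured.

```python
# GPT-2 small hyperparameters.
n_layers, d_model, n_heads, d_head = 12, 768, 12, 64
d_mlp, d_vocab, n_ctx = 3072, 50257, 1024
gated_mlp = False

# Reconstructed config formula: Q, K, V, O matrices plus MLP in/out matrices.
attn_per_layer = 4 * d_model * d_head * n_heads
mlp_per_layer = d_model * d_mlp * (2 + gated_mlp)
reported = n_layers * (attn_per_layer + mlp_per_layer)

# Embedding matrices the formula leaves out: W_E, W_pos and an untied W_U.
embeddings = d_vocab * d_model + n_ctx * d_model + d_model * d_vocab

# Biases: b_Q/b_K/b_V/b_O and b_in/b_out per layer, plus b_U (LayerNorm assumed folded).
biases_per_layer = 3 * n_heads * d_head + d_model + d_mlp + d_model
biases = n_layers * biases_per_layer + d_vocab

total = reported + embeddings + biases
print(f"reported:   {reported:,}")    # 84,934,656
print(f"embeddings: {embeddings:,}")  # 77,981,184
print(f"total:      {total:,}")       # 163,049,041
```

Under these assumptions the gap between the reported and actual counts is almost entirely the three embedding matrices, with biases contributing about 0.13M.
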
+<a id="issue-453"></a>
+
+#### #453 — `from_pretrained()` always downloads same weights with `checkpoint_label`
+
+- **Issue**: Reporter passes `checkpoint_label=...` and gets identical weights regardless of label. `checkpoint_index` works.
+- **HookedTransformer**: [HookedTransformer.py:1158-1159](transformer_lens/HookedTransformer.py#L1158-L1159) signature has `checkpoint_index` and `checkpoint_value` — **NOT `checkpoint_label`**. `checkpoint_label` is not a valid parameter; it gets silently absorbed into `**from_pretrained_kwargs` and discarded.
+- **TransformerBridge**: doesn't have a checkpoint feature — uses HF's native loading only.
+- **Replication**: `[code-verified]` — confirmed via `inspect.signature(HookedTransformer.from_pretrained)`: `checkpoint_label` not in parameters; `checkpoint_value` and `checkpoint_index` are.
+- **Bucket**: `bug-likely-fixed-needs-verification` (or arguably `question-not-actionable`): the original "bug" is user error with a non-existent kwarg name. The latent issue is that `**from_pretrained_kwargs` silently swallows unknown args.
+- **Next step**: respond to reporter that the parameter is `checkpoint_value`, not `checkpoint_label`. Optionally improve UX by validating unknown kwargs in `from_pretrained` and raising — small change but defensive. ~10 LoC.
+
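One possible shape for the defensive check in #453, as a standalone sketch. `from_pretrained_sketch` and the `FORWARDED_KWARGS` allowlist are hypothetical; the real method forwards extra kwargs downstream, so the actual allowlist would have to be derived from whatever it legitimately passes through.

```python
from typing import Any, Dict, Optional

# Hypothetical allowlist of extra keyword arguments that are forwarded downstream.
FORWARDED_KWARGS = {"torch_dtype", "cache_dir", "revision"}


def from_pretrained_sketch(
    model_name: str,
    checkpoint_index: Optional[int] = None,
    checkpoint_value: Optional[int] = None,
    **from_pretrained_kwargs: Any,
) -> Dict[str, Any]:
    """Stand-in loader that rejects unknown kwargs instead of discarding them."""
    unknown = sorted(set(from_pretrained_kwargs) - FORWARDED_KWARGS)
    if unknown:
        raise TypeError(f"from_pretrained got unexpected keyword argument(s): {unknown}")
    return {
        "model_name": model_name,
        "checkpoint_index": checkpoint_index,
        "checkpoint_value": checkpoint_value,
    }


# The reporter's call: a misspelled kwarg now fails loudly.
try:
    from_pretrained_sketch("pythia-70m", checkpoint_label=1000)
    error_message = None
except TypeError as err:
    error_message = str(err)
```

Raising `TypeError` mirrors what Python itself does for an unexpected keyword argument, so the misspelling would surface at the call site rather than as silently identical weights.
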
+<a id="issue-462"></a>
+
+#### #462 — Add support for Mamba
+
+- **Issue**: Add Mamba SSM architecture support.
+- **HookedTransformer**: not supported (Mamba is fundamentally different from attention transformers; Neel's design philosophy was to keep HT focused on attention models).
+- **TransformerBridge**: `MambaArchitectureAdapter` and `Mamba2ArchitectureAdapter` both registered in [SUPPORTED_ARCHITECTURES](transformer_lens/factories/architecture_adapter_factory.py). Both `MambaForCausalLM` and `Mamba2ForCausalLM` HF model classes mapped.
+- **Replication**: `[code-verified]`
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: close with comment pointing at `TransformerBridge.boot_transformers("state-spaces/mamba-130m-hf")` as the supported recipe. Bridge's Mamba support is shipped.
+
+---
+
+### Batch 2: next 20 issues (#479 – #720)
+
+_(complete — sign-off requested before batch 3)_
+
+<a id="issue-479"></a>
+
+#### #479 — Memory efficient causal mask implementation
+
+- **Issue**: Each `Attention` layer registers a `(n_ctx, n_ctx)` boolean `causal_mask` buffer. For models with large `n_ctx` (e.g. Qwen 72B at 32768 ctx × 80 layers), this is ~86 GB of overhead. Should compute the mask on the fly at the actual context length.
+- **HookedTransformer**: confirmed at [abstract_attention.py:120,123](transformer_lens/components/abstract_attention.py#L120) — `causal_mask = torch.tril(torch.ones((self.cfg.n_ctx, self.cfg.n_ctx)).bool())` and `register_buffer("mask", causal_mask)`. Bug as reported still present for ALL architectures via HT.
+- **TransformerBridge**: **architecture-dependent**. Bridge wraps HF's attention modules; HF's choice about static-vs-dynamic mask varies by architecture:
+  - **GPT2-family** (HF `GPT2Attention.__init__` does `register_buffer("bias", torch.tril(...))` of shape `(1, 1, max_pos, max_pos)`): bridge inherits the same overhead. Empirically: gpt2 small bridge has 12 × 1 MiB = 12 MiB (12,582,912 bytes) of `(1024, 1024)` bool buffers, identical to HT.
+  - **GPTNeoX / Pythia / Llama / Qwen / Mistral / Gemma** (modern HF attention impls use `_update_causal_mask` per forward, no static buffer in `__init__`): bridge has zero overhead. Empirically: Pythia attn `__init__` declares only Q/K/V/output linears + scaling; no buffers.
+  - **The issue's motivating example (Qwen 72B at 32K ctx × 80 layers ≈ 86 GB)**: Qwen uses Llama-family attention → resolved on the bridge. The user's actual blocker is fixed.
+  - **GPT2 use case**: the bridge gives no relief, but the absolute overhead at gpt2's 1024 ctx is only ~12 MiB, far less severe than for Qwen 72B.
+- **Replication**: `[empirically replicated]` — gpt2 bridge total mask-buffer bytes: `12,582,912` (12 layers × 1MB each); gpt2 HT total: `12,582,912` (identical). HF GPT2's `register_buffer("bias", ...)` is the source.
+- **Bucket**: `partial-leave-open`
+- **Next step**: bridge users on modern architectures (Llama, Qwen, Pythia, etc. — covering most large-context use cases) already have the memory profile the issue asks for. For GPT2-family on the bridge, the overhead is small in absolute terms but the architectural problem exists. HT-side fix (~30 LoC: replace pre-allocated buffer with on-the-fly construction in `apply_causal_mask`) would close it for the legacy path. Worth noting that fixing it on HT alone solves it for GPT2 use cases regardless of bridge migration.
+
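The sizes quoted for #479 follow directly from one `(n_ctx, n_ctx)` boolean buffer per layer at one byte per element. A quick check of the arithmetic (the helper name is illustrative only):

```python
def mask_buffer_bytes(n_ctx: int, n_layers: int, bytes_per_element: int = 1) -> int:
    """Bytes held by one (n_ctx, n_ctx) boolean mask buffer per layer."""
    return n_ctx * n_ctx * bytes_per_element * n_layers


qwen_72b = mask_buffer_bytes(n_ctx=32768, n_layers=80)
gpt2_small = mask_buffer_bytes(n_ctx=1024, n_layers=12)

print(f"Qwen 72B: {qwen_72b / 1e9:.1f} GB")  # 85.9 GB
print(f"gpt2:     {gpt2_small:,} bytes")     # 12,582,912 bytes
```

Because the cost grows with the square of `n_ctx`, a 32× longer context costs 1024× more per layer, which is why the same pattern is negligible for gpt2 and prohibitive for long-context models.
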
+<a id="issue-481"></a>
+
+#### #481 — Tracr to TransformerLens demo broken
+
+- **Issue**: Demo notebook assumes "the unembed is a projection onto the first few elements of the residual stream" — wrong because Tracr re-orders the residual stream alphabetically by RASP variable name. Demo silently fails on any RASP program where the output variable doesn't sort to the top. The fix needs Tracr to expose the unembed matrix in its model params.
+- **HookedTransformer**: demo at [`demos/Tracr_to_Transformer_Lens_Demo.ipynb`](demos/Tracr_to_Transformer_Lens_Demo.ipynb) — `grep` confirms `sd["unembed.W_U"] = np.eye(d_model, d_vocab_out)` line is still in the notebook. Demo NOT ported to TransformerBridge (no `boot_transformers` reference). Bug-as-described still reproduces.
+- **TransformerBridge**: same — Tracr-specific issue applies regardless of which TL API the demo uses; the bug is in the unembed-matrix derivation, not in TL's hook system.
+- **Replication**: `[code-verified]`
+- **Bucket**: `bug-still-reproduces`
+- **Next step**: needs Tracr upstream PR to expose `unembed_matrix` in `tracr.params`. Reporter (FlyingPumba) said they'd attempt the upstream change. Without that, the demo is fundamentally limited to RASP programs whose output variable sorts to the top of the residual stream.
+
+<a id="issue-483"></a>
+
+#### #483 — `HookedTransformer.generate()` `pad_token_id` error when tokenizer unset
+
+- **Issue**: `model.generate()` on a `HookedTransformer` with no tokenizer raises `AttributeError: 'NoneType' object has no attribute 'pad_token_id'`. Use case: training models on tokenizer-less domains (e.g., character-level integer addition).
+- **HookedTransformer**: confirmed at [HookedTransformer.py:772-773](transformer_lens/HookedTransformer.py#L772-L773) — `if self.tokenizer.pad_token is None: self.tokenizer.pad_token = self.tokenizer.eos_token` — assumes tokenizer exists. No `pad_token_id` parameter on `generate()`.
+- **TransformerBridge**: bridge's generate also relies on `self.tokenizer` for padding decisions (`bridge.py:2266`); same gap.
+- **Replication**: `[empirically replicated]` — minimal HookedTransformerConfig + no tokenizer + `generate(input, eos_token_id=0)` raises `AssertionError` (different surface error than originally reported, but the underlying gap is the same: generate path assumes a tokenizer).
+- **Bucket**: `bug-still-reproduces`
+- **Next step**: ~10 LoC fix — add optional `pad_token_id: Optional[int] = None` kwarg to `generate()`, threaded through to padding logic. Reporter offered to send a PR. Same fix should land on bridge `generate` for parity.
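+
+One possible shape for the padding logic, as a sketch only. `resolve_pad_token_id` is a hypothetical helper (it does not exist in TransformerLens), and the fallback order shown here is an assumption: explicit argument first, then the tokenizer's pad id, then its EOS id.
+
+```python
+from types import SimpleNamespace
+from typing import Any, Optional
+
+
+def resolve_pad_token_id(
+    tokenizer: Optional[Any], pad_token_id: Optional[int] = None
+) -> Optional[int]:
+    """Pick a pad id without assuming that a tokenizer exists."""
+    if pad_token_id is not None:
+        return pad_token_id  # an explicit caller override always wins
+    if tokenizer is None:
+        return None  # tokenizer-less model: no pad id is available
+    if getattr(tokenizer, "pad_token_id", None) is not None:
+        return tokenizer.pad_token_id
+    # Mirror the existing "pad falls back to EOS" behaviour
+    return getattr(tokenizer, "eos_token_id", None)
+
+
+# Stand-in tokenizer with no pad token, used only for this illustration
+fake_tokenizer = SimpleNamespace(pad_token_id=None, eos_token_id=2)
+
+print(resolve_pad_token_id(None, pad_token_id=0))  # 0
+print(resolve_pad_token_id(None))                  # None
+print(resolve_pad_token_id(fake_tokenizer))        # 2
+```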
+
+<a id="issue-502"></a>
+
+#### #502 — How to use TransformerLens with HF visual language models?
+
+- **Issue**: User asks how to use TL with `Salesforce/blip-vqa-capfilt-large` and `xtuner/llava-internlm2-7b`.
+- **HookedTransformer**: no native VLM support. zazamrykh's fork added LLaVA support (referenced in thread).
+- **TransformerBridge**: LLaVA family natively supported — `LlavaArchitectureAdapter`, `LlavaNextArchitectureAdapter`, `LlavaOnevisionArchitectureAdapter` registered. Demo at [`demos/LLaVA.ipynb`](demos/LLaVA.ipynb). BLIP-VQA not yet supported (different VLM architecture).
+- **Replication**: `[code-verified]`
+- **Bucket**: `question-not-actionable`
+- **Next step**: close with response — LLaVA family is now first-class on TransformerBridge (point at the demo). BLIP-VQA would need a new adapter; user can file a separate model-request issue if they want that specifically.
+
+<a id="issue-509"></a>
+
+#### #509 — LayerNorm folding not implemented for BertBlock
+
+- **Issue**: BertBlock uses post-norm (LN after attention/MLP, not before). `fold_ln=True` still folds LN into Q/K/V which is mathematically incorrect for post-norm.
+- **HookedTransformer**: `BertBlock` lives at [HookedEncoder.py:24,51,53](transformer_lens/HookedEncoder.py). Neel's reply in the thread is decisive: *"LayerNorm should not be folded at all. You cannot fold it into W_O, because that would change the norm of the output of the layer and thus the LayerNorm scale. I can't think of any way to do LayerNorm folding for Bert, unfortunately"*. Architectural limitation.
+- **TransformerBridge**: bridge's `BertArchitectureAdapter` exists but `enable_compatibility_mode()` would inherit the same fold-doesn't-work issue. Most users don't fold LN on BERT regardless.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: this is a fundamental property of post-LN architectures, not a fixable bug. Resolution: either close as wontfix (Neel's view) or document in `enable_compatibility_mode()` that BertBlock should use `fold_ln=False`. The latter is ~5 LoC + a one-line warning when fold_ln=True is passed for BertBlock.
+
+<a id="issue-515"></a>
+
+#### #515 — `evals.IOIDataset` all entries identical
+
+- **Issue**: All entries in IOIDataset are the same. Cause: `random.seed(42)` at the top of `get_sample`.
+- **HookedTransformer/Bridge**: shared `transformer_lens/evals.py`. Confirmed at line 387: `def get_sample(self, symmetric=False): random.seed(42); template: str = random.choice(self.templates); ...` — re-seeding to 42 at the top of every sample gives identical samples.
+- **Replication**: `[empirically replicated]` — first 3 dataset entries have identical `prompt` tensors (verified with `torch.equal`).
+- **Bucket**: `bug-still-reproduces`
+- **Next step**: ~1 LoC fix — remove the `random.seed(42)` line at evals.py:387 (or move it outside the loop, but the comment in the file suggests it shouldn't be there at all). Trivial PR.
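+
+The failure mode can be reproduced without TransformerLens. The sketch below uses a made-up template list and two hypothetical functions; it only illustrates why re-seeding inside the sampler collapses every sample to the same value, and is not the code in `evals.py`:
+
+```python
+import random
+
+TEMPLATES = ["template A", "template B", "template C", "template D", "template E"]
+
+
+def get_sample_reseeded() -> str:
+    """Buggy pattern: resets the global RNG to the same state on every call."""
+    random.seed(42)
+    return random.choice(TEMPLATES)
+
+
+def get_sample(rng: random.Random) -> str:
+    """Fixed pattern: the caller seeds once and passes the generator in."""
+    return rng.choice(TEMPLATES)
+
+
+reseeded = [get_sample_reseeded() for _ in range(50)]
+
+rng = random.Random(42)  # seeded once, outside the sampling loop
+seeded_once = [get_sample(rng) for _ in range(50)]
+
+print(len(set(reseeded)))     # 1: every draw is identical
+print(len(set(seeded_once)))  # more than one distinct template
+```
+
+Seeding once keeps the dataset reproducible across runs while still producing varied entries.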
+
+<a id="issue-523"></a>
+
+#### #523 — Residual stack not adding up (logit lens)
+
+- **Issue**: User loads gpt2-small with `fold_ln=False`, expects `accumulated_resid[-1] @ W_U` to match logits, doesn't.
+- **HookedTransformer**: Neel's reply in the thread is the answer — user wasn't applying LN gain/bias before unembedding. Correct formula: `(final_residual_post_ln * model.ln_final.w + model.ln_final.b) @ model.W_U + model.b_U`.
+- **TransformerBridge**: same — applies to either API; the issue is conceptual.
+- **Replication**: `[code-verified]`
+- **Bucket**: `question-not-actionable`
+- **Next step**: close with link to Neel's reply. The conceptual gap could be addressed by adding an `apply_ln=True` example to the `accumulated_resid` docstring or the model_structure.md page (overlaps with #644 / hook semantics).
+
+<a id="issue-543"></a>
+
+#### #543 — Grokking demo broken in Colab
+
+- **Issue**: `loss_fn(all_logits, labels)` raises `RuntimeError: Size does not match at dimension 0 expected index [12769, 1] to be smaller than self [113, 113]`.
+- **HookedTransformer**: demo at [`demos/Grokking_Demo.ipynb`](demos/Grokking_Demo.ipynb). Recent commits include `58b007f8 Fix type of HookedTransformerConfig.device (#1230)`, `98811df5 3.0 CI Bugs (#1261)`, `69326dad Updating notebooks` — multiple post-issue updates.
+- **TransformerBridge**: not directly relevant — this is a demo-specific shape bug.
+- **Replication**: `[unverifiable on this machine]` — would need to actually run the full notebook in a Colab-like environment.
+- **Bucket**: `bug-likely-fixed-needs-verification`
+- **Next step**: anthonyduong9 commented "I can work on this today" but no PR linked. Ask reporter (or a contributor) to re-run the notebook on current `dev` and confirm the original error still occurs.
+
+<a id="issue-569"></a>
+
+#### #569 — Cannot load Llama 3 70B on multigpu in 4bit
+
+- **Issue**: `HookedTransformer.from_pretrained(..., hf_model=base_model)` fails with `size mismatch for blocks.0.attn._W_K: copying a param with shape torch.Size([4194304, 1]) from checkpoint, the shape in current model is torch.Size([8, 8192, 128])`. Multiple users report the same on Llama-3-8B and Llama-2-70B. BnB packs weights as 1D blobs; HT's QKV reshape doesn't handle this packing.
+- **HookedTransformer**: HT's loading path (`load_and_process_state_dict` + `convert_llama_weights`) doesn't unpack BnB-quantized weights before reshape.
+- **TransformerBridge**: bridge's load path (`sources/transformers.py`) accepts pre-loaded `hf_model` and skips state_dict conversion entirely — the BnB-pack shape-mismatch error in HT's `load_state_dict` cannot occur because that step doesn't run. Forward pass on bridge: bridge does its own attention reconstruction (see #615 / #720 entries) but the q/k/v projections go through `LinearBridge` wrapping HF's quantized Linear modules, so quantization is transparently honored at the projection step. Bridge therefore avoids the loading-time error.
+- **Replication**: `[unverifiable on this machine]` — needs ≥2 GPUs + BnB. Structurally: bridge skips the broken state_dict conversion path, so the specific `RuntimeError` the issue reports cannot manifest. Whether 4bit + multigpu works end-to-end on bridge with a 70B model has not been empirically tested (no machine here has the hardware).
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: HT-side fix would require BnB-aware QKV reshape (~50 LoC, architecture-specific). Bridge users can use `TransformerBridge.boot_transformers(model_name, hf_model=quantized_hf_model)` to skip the broken path. Caveat: end-to-end 4bit+multigpu on bridge is structurally sound but unverified empirically; first user to try it on hardware should confirm and we close on that confirmation.
+
+<a id="issue-588"></a>
+
+#### #588 — Setup unit tests to cover model configurations
+
+- **Issue**: Add unit tests that load every supported model's config and verify it's parseable.
+- **HookedTransformer/Bridge**: per-architecture config tests now exist for several models — `tests/unit/test_gemma3_config.py`, `tests/unit/test_hooked_transformer_config.py`, `tests/unit/test_llava_config.py`. Plus structural-mapping tests for ~15 architectures under `tests/unit/model_bridge/supported_architectures/`. Not all 185 models systematically covered, but the foundation exists.
+- **Replication**: `[code-verified]`
+- **Bucket**: `partial-leave-open`
+- **Next step**: parametrize a single test over the full `SUPPORTED_ARCHITECTURES` keys (~30 LoC) — for each, load config-only via `boot_transformers(model_name, load_weights=False)` and assert it succeeds. Curt-tigges originally signed up for this in 2024 but no PR linked.
+
+<a id="issue-595"></a>
+
+#### #595 — Add Stopping Criteria support
+
+- **Issue**: HF offers `StoppingCriteria` class that can halt generation on custom conditions (regex match, max length per beam, etc.). HT's `generate()` only supports `stop_at_eos`.
+- **HookedTransformer**: confirmed — [HookedTransformer.py:1882,1918,2069](transformer_lens/HookedTransformer.py#L1882) only `stop_at_eos: bool` parameter; no callable-based stopping.
+- **TransformerBridge**: bridge `generate` at [bridge.py:2371](transformer_lens/model_bridge/bridge.py#L2371) — same; only `stop_at_eos`.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~30 LoC — add `stopping_criteria: Optional[Callable[[tokens, logits], bool]] = None` parameter to both `HookedTransformer.generate` and `TransformerBridge.generate`, evaluate after each sampled token, break if any returns True. srishti-git1110 volunteered in 2024.
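+
+A toy sketch of the proposed control flow, independent of TransformerLens. The `generate` function, the `StoppingCriterion` alias, and the fake model below are all hypothetical; the sketch assumes a list of callables and greedy sampling, which is only one way to realise the proposal:
+
+```python
+from typing import Callable, List, Optional
+
+# Assumed signature: (tokens so far, logits of the last step) -> stop?
+StoppingCriterion = Callable[[List[int], List[float]], bool]
+
+
+def generate(
+    prompt: List[int],
+    next_logits: Callable[[List[int]], List[float]],
+    max_new_tokens: int = 10,
+    stopping_criteria: Optional[List[StoppingCriterion]] = None,
+) -> List[int]:
+    tokens = list(prompt)
+    for _ in range(max_new_tokens):
+        logits = next_logits(tokens)
+        # Greedy choice: index of the largest logit
+        next_token = max(range(len(logits)), key=logits.__getitem__)
+        tokens.append(next_token)
+        # Evaluate the criteria after each sampled token; stop if any fires
+        if stopping_criteria and any(c(tokens, logits) for c in stopping_criteria):
+            break
+    return tokens
+
+
+def counting_model(tokens: List[int], vocab_size: int = 10) -> List[float]:
+    """Fake model that always predicts (last token + 1) mod vocab_size."""
+    logits = [0.0] * vocab_size
+    logits[(tokens[-1] + 1) % vocab_size] = 1.0
+    return logits
+
+
+stop_on_three = lambda tokens, logits: tokens[-1] == 3
+
+print(generate([0], counting_model, stopping_criteria=[stop_on_three]))  # [0, 1, 2, 3]
+```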
+
+<a id="issue-615"></a>
+
+#### #615 — HookedTransformer output not identical to HuggingFace for Llama 3
+
+- **Issue**: Greedy decoding diverges between HT and HF on Llama-3-8B-Instruct. Investigation in thread localized to MLP weight differences after einsum/Linear conversion.
+- **HookedTransformer**: substantial post-issue cleanup happened — most einsum calls in attention/MLP replaced with `F.linear` (visible at [abstract_attention.py:368,374](transformer_lens/components/abstract_attention.py#L368)). Latest collaborator update (degenfabian) reports max diff `0.0002` on Llama-3-8B-Instruct after einsum removals — likely close enough for production, but per-user reports continue (Gemma 2-2B, etc).
+- **TransformerBridge**: **NOT a passthrough to HF's attention** — `JointQKVAttentionBridge._reconstruct_attention` and `PositionEmbeddingsAttentionBridge.forward` both do their own attention math: `torch.matmul(q, k.transpose(-2,-1)) * scale`, then their own softmax+mask, then `torch.matmul(weights, v)`. They wrap HF's `q_proj`/`k_proj`/`v_proj` weights via `LinearBridge`, but the score computation, mask application, and output reshape are bridge code. Empirically (Pythia-70m fp32, see #385): bridge max diff vs HF is `2.56e-3` (mean `2.78e-4`, argmax matches across all positions). Materially better than HT's NaN, but not bit-exact.
+- **Replication**: `[empirically replicated]` — Pythia-70m bridge vs HF gives `2.56e-3` max drift, argmax matches.
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: bridge users get argmax/CE/generation parity with HF (the user-visible behavior the issue actually reports). Bit-exact match isn't achieved — bridge's attention reconstruction has small drift, likely from one of: (a) bridge forces `attn_implementation="eager"` while HF default may pick sdpa, (b) softmax dtype/order differences, (c) intermediate `.contiguous()` calls. For most interpretability uses this is fine; for analyses requiring strict numerical identity (e.g., bit-exact circuit reproduction), bridge is closer than HT but not perfect.
+
+<a id="issue-644"></a>
+
+#### #644 — Documentation: Map the Act Names to the Transformer
+
+- **Issue**: Add a labeled diagram mapping hook names to positions on a transformer architecture figure (Vaswani-style).
+- **HookedTransformer/Bridge**: [`docs/source/content/model_structure.md`](docs/source/content/model_structure.md) is 153 lines listing 51 hook names with descriptions, but no diagram. Two volunteers (juvogt, tjbai) said they'd contribute, no PR landed.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~1-day docs task — generate a diagram (matplotlib + manual layout, or Excalidraw + commit the SVG). Place at `docs/source/_static/hook_diagram.svg`, embed in `model_structure.md`. Overlaps with #657's `hook_normalized` semantics under fold_ln (good chance to add that note while editing the page).
+
+<a id="issue-661"></a>
+
+#### #661 — Pythia output inconsistent across batch sizes with `use_split_qkv_input=True`
+
+- **Issue**: `model(input[:2])[0]` and `model(input[:1])[0]` give different outputs when `use_split_qkv_input=True`.
+- **HookedTransformer**: `transformer_block.py:123,137` branch on `use_split_qkv_input`. Bug confirmed.
+- **TransformerBridge**: bridge has no `use_split_qkv_input` flag — feature doesn't exist on the bridge, so the bug doesn't apply, but bridge users can't replicate the workflow either.
+- **Replication**: `[empirically replicated]` — pythia-70m, the exact repro from the issue, gives `max diff: 1.14e-02` (over 10× the issue's 1e-3 tolerance).
+- **Bucket**: `bug-still-reproduces`
+- **Next step**: investigate why per-token splitting changes batch-vs-single output. Likely a stateful interaction in the LN1 path (related to #335 — LN1 firing 3× per forward). Fix is non-trivial; `use_split_qkv_input` is a research-only feature so priority is moderate.
+
+<a id="issue-684"></a>
+
+#### #684 — Expand quantization model support beyond Llama
+
+- **Issue**: HT raises `AssertionError: Quantization is only supported for Llama models` when loading a 4bit Mistral via `hf_model=`.
+- **HookedTransformer**: confirmed at [HookedTransformer.py:1341-1342](transformer_lens/HookedTransformer.py#L1341-L1342) — explicit hard-coded `"llama" not in model_name.lower()` assertion blocks any non-Llama 4bit model.
+- **TransformerBridge**: bridge's `boot_transformers(hf_model=...)` path has no architecture-specific assertion. Bridge does its own attention reconstruction but the q/k/v projections wrap HF's Linear modules (which are the BnB-quantized linears when quantization is active), so quantization passes through transparently at the projection step.
+- **Replication**: `[code-verified]` — the assertion is still in place at the cited HT line. Bridge load path was structurally verified (no architecture filter in `boot_transformers`); end-to-end forward of a 4bit-quantized Mistral via the bridge has not been tested on this machine (no GPU + BnB).
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: HT-side fix is to remove the assertion AND audit per-architecture state_dict load for BnB-packed weights (overlaps with #569 root cause). Bridge users can already pre-load via `AutoModelForCausalLM.from_pretrained(model, load_in_4bit=True)` and pass to `boot_transformers(model_name, hf_model=quantized_model)`. Caveat: bridge's manual attention reconstruction with BnB-quantized weights is structurally sound but unverified empirically; first user with hardware should confirm.
+
+<a id="issue-696"></a>
+
+#### #696 — About the cached layernorm scale factors
+
+- **Issue**: Conceptual question about why `apply_ln_to_stack` uses cached scale factors instead of recomputing LN per-component.
+- **HookedTransformer/Bridge**: `apply_ln_to_stack` at [ActivationCache.py:987](transformer_lens/ActivationCache.py#L987). Neel answered in thread: cached scale factors are needed because we want to apply the FINAL residual's LN to PARTIAL components, and you can't infer the final norm from partial components.
+- **Replication**: `[code-verified]`
+- **Bucket**: `question-not-actionable`
+- **Next step**: close with link to Neel's answer. Could add a 2-line note to the docstring clarifying the design rationale (overlaps with #523 in spirit — both are LN-application confusion).
+
+<a id="issue-697"></a>
+
+#### #697 — Activation cache during generate
+
+- **Issue**: User wants `run_with_cache` semantics during `model.generate()` — cache activations of generated tokens, not just the prompt.
+- **HookedTransformer**: confirmed via [HookedTransformer.py:1873,2255](transformer_lens/HookedTransformer.py#L1873) — `generate()` and `generate_stream()` exist but neither integrates `run_with_cache`. Bryce's reply: "no integration ... pretty low priority."
+- **TransformerBridge**: bridge's `generate` at [bridge.py:2371](transformer_lens/model_bridge/bridge.py#L2371) — same gap.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~50 LoC enhancement — wrap the per-token forward in `run_with_cache`'s hook-installation context, accumulate cache across iterations. Trickier than naive due to KV-cache interactions; needs care to avoid duplicate hook fires when cache grows. Both APIs need the same fix.
+
+<a id="issue-704"></a>
+
+#### #704 — Add support for TracrBench
+
+- **Issue**: TracrBench is a 121-model dataset of toy Tracr transformers for sanity-checking interp methods. Reporter has all models on HuggingFace; asks whether they should live in TransformerLens or a separate repo.
+- **HookedTransformer/Bridge**: not present in either side.
+- **Replication**: `[code-verified]` — `grep -i tracr_bench` returns nothing in `transformer_lens/`.
+- **Bucket**: `not-relevant-close`
+- **Next step**: Neel's reply was decisive: *"My personal inclination would be to just make this into another repo that builds on TransformerLens."* TracrBench should live in its own repo (HannesThurnherr's) with TL as a dependency. Close with a docs pointer to the external project, possibly link from `docs/source/content/gallery.md`.
+
+<a id="issue-710"></a>
+
+#### #710 — MVP Support For 1-2 Models Per-Modality
+
+- **Issue**: Add basic support for non-text models — TTS (Whisper), Vision (ResNet, ViT), Music Generation, etc. — to avoid scattered tooling.
+- **HookedTransformer**: not designed for non-text architectures.
+- **TransformerBridge**: partial coverage exists for some — `HubertForCTC` / `HubertModel` (audio) registered; LLaVA / LLaVA-Next / LLaVA-Onevision / Gemma3-Multimodal (vision-language) supported; CLIPVisionEncoderBridge as a sub-component. But no Whisper, no ResNet, no diffusion, no music gen. Bryce's thread response argues for a "platform" approach (programmatic hook points, plugin architecture) rather than baking each modality into core TL.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: per Bryce, the long-term fix is a plugin/extension architecture rather than per-model adapters. For now, vision-language is partially covered via LLaVA family; pure-vision (ViT, ResNet) needs new adapters. Worth treating this as an umbrella tracking issue and splitting into sub-issues per modality.
+
+<a id="issue-720"></a>
+
+#### #720 — Review current matmul function usages
+
+- **Issue**: `batch_addmm` is the right shape for GPT-2's `Conv1D`-style layers but inappropriate for Pythia/Llama which use plain `nn.Linear`. Need per-architecture matmul routing.
+- **HookedTransformer**: progress made — `transformer_lens/components/abstract_attention.py:368-374` now uses `F.linear` for the post-attention output (comment: "F.linear is a fused matmul+bias that matches HuggingFace exactly"). `batch_addmm` still in `utilities/addmm.py:22`. Full audit not done.
+- **TransformerBridge**: bridge does NOT delegate the full attention computation to HF — `JointQKVAttentionBridge._reconstruct_attention` and `PositionEmbeddingsAttentionBridge` both contain their own `torch.matmul(query_states, key_states.transpose(-2,-1))` calls for attention scores, and `torch.matmul(attn_weights, value_states)` for the output. The Q/K/V projections themselves go through `LinearBridge` which wraps HF's Linear (so projection matmul = HF's matmul = correct), but the attention-score and output-application matmuls are bridge code. The same audit concern (does our matmul match HF's, does it preserve precision under different dtypes, etc.) applies to those bridge calls.
+- **Replication**: `[code-verified]`
+- **Bucket**: `partial-leave-open`
+- **Next step**: three distinct audit needs — (1) HT's `batch_addmm` vs `F.linear` per-architecture routing, (2) bridge's `torch.matmul(q, k.T)` and `torch.matmul(weights, v)` vs HF's per-architecture attention impl (e.g., HF's `LlamaAttention.forward` may upcast or use different matmul variants under specific configs), (3) Q/K/V projections (already correct via HF Linear on bridge; per-architecture on HT). Bridge users get correct projection matmuls automatically but inherit the bridge's own attention-math audit gap. Recommend bridge migration as the strategic answer for projection-related precision issues; the bridge-side attention audit is its own future work.
+
+---
+
+### Batch 3: next 20 issues (#729 – #888)
+
+_(complete — sign-off requested before batch 4)_
+
+<a id="issue-729"></a>
+
+#### #729 — Guide to adding new models
+
+- **Issue**: Add a how-to-extend-TL guide for users adding new model support.
+- **HookedTransformer/Bridge**: `docs/source/content/` has `migrating_to_v3.md`, `model_structure.md`, `getting_started.md`, etc. — but no dedicated "add a new architecture" guide. The 3.0 bridge architecture made this easier (write an `ArchitectureAdapter` subclass), but there's no walkthrough doc.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~1-day docs task. Walk through what an `ArchitectureAdapter` does (`supported_architectures/llama.py` is a good template), document the `component_mapping` API, demonstrate adding a new entry to `SUPPORTED_ARCHITECTURES`. Overlaps with #888 (CLIP language model adapter Q) and #710 (modality scope).
+
+<a id="issue-737"></a>
+
+#### #737 — Q reshape with model loaded in 4bit
+
+- **Issue**: `model.cfg.use_split_qkv_input = True` + 4bit-loaded vicuna-7b → `RuntimeError: shape '[1, 6, 32, 128]' is invalid for input of size 786432` in `AbstractAttention.calculate_qkv_matrices`. The 4bit code path passes the BnB-packed `[d*d_head*n_heads/2, 1]` weight to `bnb.matmul_4bit`, which dequantizes incorrectly with split QKV input.
+- **HookedTransformer**: confirmed at [abstract_attention.py:58-59,338,342,378,381,454,458,473](transformer_lens/components/abstract_attention.py#L58) — multiple `if self.cfg.load_in_4bit:` branches that build `Params4bit` shaped `[nq, 1]`. Path interacts poorly with `use_split_qkv_input=True`.
+- **TransformerBridge**: bridge has no `use_split_qkv_input` flag. Quantized models load via `boot_transformers(hf_model=quantized_model)` — bridge's manual attention reconstruction (see #615 / #720 entries) operates on HF's quantized Linear modules, so 4bit + standard hooks should work, but split QKV is a TL-specific feature not on the bridge.
+- **Replication**: `[unverifiable on this machine]` — needs GPU + bnb 4bit.
+- **Bucket**: `partial-leave-open`
+- **Next step**: bridge users avoid this specific bug since the feature isn't there. HT-side fix requires reshape-aware logic in `calculate_qkv_matrices` for the 4bit + split path (~30 LoC). User needs a workaround on HT — currently must disable `use_split_qkv_input` for 4bit models.
+
+<a id="issue-754"></a>
+
+#### #754 — Don't load from HF when config is passed in
+
+- **Issue**: User passes a locally-loaded `hf_model` to `HookedTransformer.from_pretrained(..., hf_model=local_model)` but TL still tries to download config from HuggingFace. Same root cause as #846 and #800.
+- **HookedTransformer**: confirmed at [loading_from_pretrained.py:160](transformer_lens/loading_from_pretrained.py#L160) — `convert_hf_model_config` calls `AutoConfig.from_pretrained(official_model_name)` unconditionally to determine architecture. The `hf_cfg` parameter to `get_pretrained_model_config` is only used at line 1847+ for a few specific overrides (`load_in_4bit`, `d_vocab`, `rotary_base`); architecture detection still requires HF reachability. Workaround: pass a local **path** as `model_name` (line 134 checks for local `config.json`).
+- **TransformerBridge**: bridge's `boot_transformers(hf_model=...)` takes the architecture from the passed model directly via [`sources/transformers.py:512`](transformer_lens/model_bridge/sources/transformers.py#L512) — `if hf_model is not None: pass`. No AutoConfig fetch needed. Offline use works.
+- **Replication**: `[code-verified]`
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: bridge users can boot offline with `hf_model=`. HT-side fix is a small but careful edit — pass `hf_cfg.architectures[0]` to `convert_hf_model_config` when available, skip the AutoConfig fetch. ~10 LoC. Same fix closes #800 and #846. Saberlve's monkey-patch in the thread is the right shape.
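+
+The shape of the HT-side fix can be sketched as "prefer the config already in hand, fetch only as a last resort". Everything below is hypothetical: `resolve_architecture` is not a TransformerLens function, and `fetch_config` stands in for whatever performs the remote lookup (in real code this would presumably be `AutoConfig.from_pretrained`).
+
+```python
+from types import SimpleNamespace
+from typing import Any, Callable, Optional
+
+
+def resolve_architecture(
+    model_name: str,
+    hf_cfg: Optional[Any],
+    fetch_config: Callable[[str], Any],
+) -> str:
+    """Return the architecture name, avoiding the network when hf_cfg has it."""
+    architectures = getattr(hf_cfg, "architectures", None)
+    if architectures:
+        return architectures[0]  # local information is enough; no fetch
+    # Fall back to the remote lookup only when nothing local is available
+    return fetch_config(model_name).architectures[0]
+
+
+def offline_fetch(model_name: str) -> Any:
+    """Simulates an offline machine: any remote lookup fails."""
+    raise ConnectionError(f"cannot reach the hub for {model_name}")
+
+
+local_cfg = SimpleNamespace(architectures=["GPT2LMHeadModel"])
+
+print(resolve_architecture("gpt2-xl", local_cfg, offline_fetch))  # GPT2LMHeadModel
+```
+
+With no local config, the same call would still hit `fetch_config`, so behaviour for existing online users is unchanged.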
+
+<a id="issue-773"></a>
+
+#### #773 — TransformerLens on models with different layernorm placement (BioGPT)
+
+- **Issue**: BioGPT has only ONE layernorm per layer (post-MLP), unlike GPT-2's pre-LN1+pre-LN2 pattern. User asks if TL can adapt to this.
+- **HookedTransformer**: `BioGPT` not in supported models. `transformer_block.py` assumes the standard GPT-2 LN placement. Bryce's reply: this is "not possible without making modifications to the code itself" — would need an experimental branch for the user.
+- **TransformerBridge**: bridge's `BlockBridge` and architecture adapter pattern theoretically supports per-architecture LN placement (custom adapter could declare `ln1=None, ln2=NormalizationBridge(...)` etc.), but no BioGPT adapter exists. Bridge offers a structural hook for this; nobody's taken it.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: write a `BioGptArchitectureAdapter` for the bridge using the existing component pattern. ~80 LoC + tests. The bridge's component-map approach is exactly the right primitive; HT's hard-coded LN placement is the legacy-side problem.
+
+<a id="issue-778"></a>
+
+#### #778 — Gemma2 global/local attn order wrong
+
+- **Issue**: TL configures Gemma2 attention as `[global, local, global, local, ...]` but HF Gemma2 actually uses `[local, global, local, global, ...]` (verified via the HF source at `modeling_gemma2.py`). Sliding-window placement is inverted.
+- **HookedTransformer**: confirmed at [loading_from_pretrained.py:972,999,1027](transformer_lens/loading_from_pretrained.py#L972) — multiple Gemma2 configs hardcode `"attn_types": ["global", "local"] * 21`.
+- **TransformerBridge**: bridge uses HF's Gemma2 attention modules directly; HF's `attention_type = config.layer_types[layer_idx]` reads from HF config which has the correct order (`['sliding_attention', 'full_attention', ...]`).
+- **Replication**: `[empirically replicated]` — HF: `['sliding_attention', 'full_attention', 'sliding_attention', 'full_attention', 'sliding_attention']` (i.e., `[local, global, local, global, local, ...]`). TL: `['global', 'local', 'global', 'local', 'global', 'local']`. Inversion confirmed.
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: HT-side fix is to flip the order in the loading config (~5 lines per Gemma2 variant — multiple variants need updating). Bridge users get the correct order automatically.
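+
+The inversion is easy to see when both patterns are generated side by side. This is a sketch only: `gemma2_attn_types` is a hypothetical helper, and the "local on even layer indices" rule is taken from the HF order reported in the replication above rather than from TL code.
+
+```python
+from typing import List
+
+
+def gemma2_attn_types(n_layers: int) -> List[str]:
+    """Alternate local/global attention, starting with local at layer 0."""
+    return ["local" if i % 2 == 0 else "global" for i in range(n_layers)]
+
+
+n_layers = 42  # matches the `* 21` repetition count quoted above
+
+current_tl = ["global", "local"] * 21      # pattern currently hardcoded in TL
+expected = gemma2_attn_types(n_layers)     # pattern matching the HF order
+
+print(expected[:4])    # ['local', 'global', 'local', 'global']
+print(current_tl[:4])  # ['global', 'local', 'global', 'local']
+
+# Every layer disagrees, because the two patterns are offset by one
+mismatches = sum(a != b for a, b in zip(current_tl, expected))
+print(mismatches)      # 42
+```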
+
+<a id="issue-784"></a>
+
+#### #784 — How to load a model in smaller precision (Gemma 2 OOM on 3060)
+
+- **Issue**: User runs `HookedTransformer.from_pretrained_no_processing("google/gemma-2-2b-it", dtype=torch.bfloat16)` on RTX 3060 (laptop) and gets CUDA OOM, while plain HF load works.
+- **HookedTransformer**: dtype handling exists; the OOM is likely from TL holding both an FP32 reference state_dict and an FP16/BF16 working copy during conversion (a known leak family — see #290). 3060 laptop typically has 6GB; gemma-2-2b in bf16 is ~5GB so very tight.
+- **TransformerBridge**: bridge wraps the HF model directly; no separate FP32-ref + working-copy duplication. Should fit on the 3060.
+- **Replication**: `[unverifiable on this machine]` — no GPU here. Code-level: bridge's loading at `sources/transformers.py:451` does `hf_model = model_class.from_pretrained(model_name, **model_kwargs)` then optional `.to(device)` — single allocation, dtype as requested. HT loads HF model first then re-allocates TL params, doubling peak.
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: bridge users can `boot_transformers("google/gemma-2-2b-it", dtype=torch.bfloat16, device="cuda")` and not double-allocate. HT-side fix is the broader memory-leak audit (#290 family).
+
+<a id="issue-796"></a>
+
+#### #796 — `FactoredMatrix.svd()` `lru_cache` prevents GC
+
+- **Issue**: `FactoredMatrix.svd` is decorated with `@lru_cache(maxsize=None)`, which holds references to instances and prevents garbage collection.
+- **HookedTransformer/Bridge**: confirmed at [FactoredMatrix.py:9,217-218](transformer_lens/FactoredMatrix.py#L217) — `from functools import lru_cache` and `@lru_cache(maxsize=None) def svd(self): ...`. An `lru_cache` on an instance method stores `self` in the cache key, so the class-level cache keeps a strong reference to every instance that has called `svd()`.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~10 LoC fix per the issue's suggestion — replace `@lru_cache(maxsize=None)` with `@cached_property` (functools, stdlib). `cached_property` stores the result in the instance `__dict__`, so no class-level cache holds a reference to `self` and the instance stays GC-safe. Same fix for `eigenvalues` (the issue notes both). Breaking change since `.svd()` becomes `.svd` (property, no parens) — version-bump worthy. Worth coordinating with the broader `FactoredMatrix` cleanup (#341 also touches this file).
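+
+The difference between the two decorators can be observed with a weak reference. The classes below are stand-ins written for this illustration (they are not `FactoredMatrix`), and the "expensive" computation is a placeholder string:
+
+```python
+import gc
+import weakref
+from functools import cached_property, lru_cache
+
+
+class WithLruCache:
+    @lru_cache(maxsize=None)
+    def svd(self):
+        return "expensive result"  # placeholder for the real computation
+
+
+class WithCachedProperty:
+    @cached_property
+    def svd(self):
+        return "expensive result"  # placeholder for the real computation
+
+
+def is_collected_after_use(cls, use_property: bool) -> bool:
+    """Create an instance, trigger the cache, drop it, and check if it was freed."""
+    obj = cls()
+    _ = obj.svd if use_property else obj.svd()
+    ref = weakref.ref(obj)
+    del obj
+    gc.collect()
+    return ref() is None
+
+
+print(is_collected_after_use(WithLruCache, use_property=False))       # False
+print(is_collected_after_use(WithCachedProperty, use_property=True))  # True
+```
+
+The `lru_cache` version keeps the instance alive because the cache lives on the function object and uses `self` as part of its key; the `cached_property` version stores the value on the instance itself, so nothing outside the instance refers to it.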
+
+<a id="issue-798"></a>
+
+#### #798 — Remove `model_args` (use only `model_kwargs`)
+
+- **Issue**: Bryce's own proposal — clean up `model_args` + `model_kwargs` redundancy in functions that pass-through to other functions.
+- **HookedTransformer**: confirmed — `model_args` still present in [`HookedEncoderDecoder.py:489,495,501,513`](transformer_lens/HookedEncoderDecoder.py#L489) and `hook_points.py:629,723,737,779`. Both `*model_args` and `**model_kwargs` accepted as positional + keyword pairs.
+- **TransformerBridge**: bridge's hook_points share the same machinery. `bridge.run_with_cache` etc. inherit the same pattern.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: ~30 LoC across affected files; remove `*model_args`, keep only `**model_kwargs`. Breaking change for callers passing positional args, but Bryce filed it as a `breaking-change` already so acceptable.
+
+<a id="issue-800"></a>
+
+#### #800 — Load model fails (offline use, GPT2-xl local)
+
+- **Issue**: User has GPT2-xl downloaded locally; loading via `HookedTransformer.from_pretrained` works in one Jupyter notebook but fails in another with "couldn't connect to HF" — environment-specific symptom of the deeper issue (#846 / #754) where TL fetches HF config unconditionally.
+- **HookedTransformer**: same root cause as #754 / #846 — `convert_hf_model_config` does `AutoConfig.from_pretrained(...)`. If env is set up for offline (HF_HUB_OFFLINE, cached), it may work; otherwise fails.
+- **TransformerBridge**: same fix as #754 — use `boot_transformers(hf_model=local_loaded_model)` or pass a local path to skip AutoConfig.
+- **Replication**: `[code-verified]`
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: same as #754/#846 — HT-side fix to read architecture from `hf_cfg.architectures[0]` when available. Bridge users have the workaround today.
+
+<a id="issue-801"></a>
+
+#### #801 — Padding side inconsistency with HuggingFace (Gemma 2)
+
+- **Issue**: Reporter on TL 2.9.0 found `HookedTransformer.from_pretrained('google/gemma-2-2b').tokenizer.padding_side == 'right'` while HF AutoTokenizer reports `'left'`.
+- **HookedTransformer**: tested on current `dev` — both report `'left'`. Mismatch no longer reproduces. The fix likely came in via tokenizer-handling refactor between 2.9.0 and current.
+- **TransformerBridge**: bridge inherits HF tokenizer settings directly; no override. Reports `'left'`.
+- **Replication**: `[empirically not reproduced]` — current `dev`: TL `'left'`, HF `'left'`, no mismatch.
+- **Bucket**: `bug-likely-fixed-needs-verification`
+- **Next step**: comment on issue asking reporter to retest on current `dev`. If they confirm, close.
+
+<a id="issue-830"></a>
+
+#### #830 — Type hint support for `self.model` in `ActivationCache`
+
+- **Issue**: `ActivationCache.model` is untyped (would need `HookedTransformer` import → circular). Proposes a `HookedTransformerMixin` to break the cycle.
+- **HookedTransformer**: confirmed at [ActivationCache.py:118](transformer_lens/ActivationCache.py#L118) — `self.model = model` with no type annotation. Bryce in the thread: 4.0 work, possibly 3.0 if non-disruptive. Milestone tag: 3.0.
+- **TransformerBridge**: bridge uses the same `ActivationCache`; same untyped attribute.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: extract a `HookedRootModuleMixin` (or use `TYPE_CHECKING + Protocol`) to provide type hints without circular imports. ~50 LoC + careful refactor. Tagged 3.0 milestone but not done.
+
+<a id="issue-837"></a>
+
+#### #837 — Multi-GPU device ordinal issue (`n_devices=3` for llama2-7b)
+
+- **Issue**: With `n_devices=3`, `get_device_for_block_index` produces device indices that exceed the available range, throwing "device ordinal out of range." Same root cause family as #907, #911, #968.
+- **HookedTransformer**: bug still in [multi_gpu.py:142](transformer_lens/utilities/multi_gpu.py#L142) — `device_index = (device.index or 0) + (index // layers_per_device)` overshoots when `n_layers % n_devices != 0`. With 32 layers on 3 devices, 32 / 3 ≈ 10.67; assuming `layers_per_device` is floored to 10, blocks 30 and 31 land on device 3, which doesn't exist (valid indices are 0–2).
+- **TransformerBridge**: pre-loaded `hf_model=` with HF's `device_map="auto"` works on `dev`. The unmerged feature/multi-device-bridge PR #1270 provides first-class `n_devices=N` and `device_map=...` parameters that delegate to accelerate.
+- **Replication**: `[unverifiable on this machine]` — no multi-GPU.
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: HT-side fix is `device_index = (index // layers_per_device)` clamped to `n_devices - 1`. Bridge users today can use `hf_model=accelerate_dispatched_model` workaround; #1270 makes it first-class.
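+
+As a worked check of the overshoot described above, assuming `layers_per_device` is the floor division `n_layers // n_devices` and the base device index is 0 (both are assumptions; the surrounding source is not quoted here):
+
+$$
+\text{layers\_per\_device} = \left\lfloor \frac{32}{3} \right\rfloor = 10, \qquad
+\text{device\_index}(i) = \left\lfloor \frac{i}{10} \right\rfloor
+$$
+
+$$
+\text{device\_index}(29) = 2, \qquad
+\text{device\_index}(30) = \text{device\_index}(31) = 3 > n_{\text{devices}} - 1 = 2
+$$
+
+With the clamp from the next step, $\min(\lfloor i / 10 \rfloor,\ 2)$, blocks 30 and 31 stay on device 2, which then holds 12 blocks instead of 10.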
+
+<a id="issue-846"></a>
+
+#### #846 — Prioritize local `hf_model.config` for Qwen models
+
+- **Issue**: Loading a local Qwen via `HookedTransformer.from_pretrained_no_processing(model_name="Qwen/...", hf_model=local, tokenizer=tok)` still fetches HF config online, fails offline.
+- **HookedTransformer**: same root cause as #754 / #800 — `convert_hf_model_config` at [loading_from_pretrained.py:160](transformer_lens/loading_from_pretrained.py#L160) calls `AutoConfig.from_pretrained` unconditionally. `hf_cfg` only used for a few overrides at line 1847+, not architecture detection. kapedalex (contributor) commented "can not reproduce today" — likely they tested with a model that hits a name-based shortcut (Llama/Gemma branches at lines 145-157) that skips the AutoConfig fetch. Qwen has no such shortcut.
+- **TransformerBridge**: bridge's `boot_transformers(hf_model=...)` skips AutoConfig entirely.
+- **Replication**: `[code-verified]`
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: HT-side fix per #754. Bridge users have the workaround today via `boot_transformers(hf_model=...)`.
+
+<a id="issue-858"></a>
+
+#### #858 — Loading gemma-7b-it runs out of memory (2× H100)
+
+- **Issue**: `HookedTransformer.from_pretrained_no_processing("google/gemma-7b-it", n_devices=2)` on 2× H100 fails with OOM. Bryce: multi-GPU has known issues; suggested retry after planned overhaul.
+- **HookedTransformer**: gemma-7b has roughly 8.5B parameters, so its bf16 weights are ~17GB; 2× H100 (80GB each) should easily fit. OOM during loading suggests TL holds both source state_dict and target params concurrently (memory-leak family — #290). Plus the multi-GPU device-distribution bugs (#837 family) compound this.
+- **TransformerBridge**: bridge has no FP32-ref + working-copy duplication. Pre-loaded `hf_model=` with `device_map="auto"` should fit easily.
+- **Replication**: `[unverifiable on this machine]` — no GPU here.
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: bridge migration recipe (`boot_transformers("google/gemma-7b-it", hf_model=AutoModel.from_pretrained("google/gemma-7b-it", device_map="auto"))`). HT-side fix overlaps #290 (memory leak) + #837 (multi-GPU placement).
+
+<a id="issue-867"></a>
+
+#### #867 — Does TransformerLens support LVLM like Qwen2-VL?
+
+- **Issue**: User asks if Qwen2-VL / Qwen2.5-VL is supported.
+- **HookedTransformer**: no native VLM support.
+- **TransformerBridge**: LLaVA family supported (`LlavaArchitectureAdapter`, `LlavaNextArchitectureAdapter`, `LlavaOnevisionArchitectureAdapter`); Gemma3-Multimodal supported. **Qwen2-VL specifically is NOT in `SUPPORTED_ARCHITECTURES`** — `Qwen2VLForConditionalGeneration` / `Qwen2_5_VLForConditionalGeneration` not registered. Different vision tower + projector than LLaVA, so existing adapters don't transfer.
+- **Replication**: `[code-verified]` — grep confirms no Qwen2VL adapter.
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: add `Qwen2VLArchitectureAdapter`. Per Bryce's reply in the thread: the framework supports adding it now (vision support landed); ~150 LoC for an adapter following the LLaVA pattern. Could file as a focused model-request; ExplorerFreda's vlm-lens fork in the thread offers an alternative.
+
+<a id="issue-869"></a>
+
+#### #869 — Custom generative video transformer
+
+- **Issue**: User wants to do mech interp on a generative-video diffusion transformer (Sora-like, V2V).
+- **HookedTransformer/Bridge**: neither supports diffusion / video generation. Bryce's reply suggests a separate root module: "this sort of model is going to be so different from what we have."
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: this is genuinely architectural — would need a `HookedDiffusionTransformer` root module separate from `HookedTransformer`/`TransformerBridge`. Outside the current scope. Recommend using a dedicated diffusion-interp tool or building a custom hook layer; close as wontfix or leave as architectural roadmap item.
+
+<a id="issue-872"></a>
+
+#### #872 — Add official support for `device_map`
+
+- **Issue**: Bryce's own proposal — currently `device_map` is passed through to HF for loading but isn't a TL-supported parameter for distribution. notoookay's comment shows `n_devices=2` causes `RuntimeError: indices should be either on cpu or on the same device as the indexed tensor` on HookedTransformer with gemma-2-2b-it.
+- **HookedTransformer**: `n_devices` partially works via `move_model_modules_to_device` but has the placement bugs documented in #837/#907/#911/#968. notoookay's repro fails with the exact mid-forward device-mismatch error.
+- **TransformerBridge**: on `dev`, bridge accepts a pre-loaded `hf_model=` with HF's `device_map`. On unmerged `feature/multi-device-bridge` (PR #1270 — addresses this issue directly), bridge has first-class `n_devices=N` and `device_map="auto"` parameters that delegate to accelerate.
+- **Replication**: `[unverifiable on this machine]` for the multi-GPU repro; `[code-verified]` for the API surface.
+- **Bucket**: `partial-leave-open`
+- **Next step**: PR #1270 (currently `feature/multi-device-bridge`) brings first-class `device_map`/`n_devices` to the bridge — once merged, this issue closes for bridge users. HT-side `n_devices` rework is the multi-GPU bug cluster (#837 et al). Both paths address the user-impact concern raised in this issue.
+
+<a id="issue-873"></a>
+
+#### #873 — Load Llama2-7b-chat-hf fail
+
+- **Issue**: User screenshots show a load failure for `Llama-2-7b-chat-hf`. Body has only screenshots, no specific error text.
+- **HookedTransformer**: `LlamaForCausalLM` adapter present; the model is in `supported_models.py`. Without the actual error text, hard to diagnose. Commenter sg-sy suggested specific kwargs (`n_devices`, `cache_dir`, `center_writing_weights=False`) as a workaround — implies a multi-GPU or weight-processing related issue.
+- **TransformerBridge**: same architecture supported; bridge wouldn't have HT's loading-side weight-processing concerns.
+- **Replication**: `[unverifiable on this machine]` — large model + ambiguous error.
+- **Bucket**: `bug-likely-fixed-needs-verification`
+- **Next step**: ask reporter for the actual error text + their TL version. Many Llama loading bugs (#385 rotary, #569 4bit shape, weight processing) have been fixed since this was filed.
+
+<a id="issue-878"></a>
+
+#### #878 — Layer-wise caching for low GPU memory (Qwen 7B Instruct)
+
+- **Issue**: User runs attribution patching on Qwen 7B Instruct on 2× A6000 (48GB each) and gets OOM despite trying layer-wise caching. Asks for help.
+- **HookedTransformer**: this is a usage question; OOM at attribution-patching scale typically requires gradient checkpointing or smaller batches/sequences.
+- **TransformerBridge**: same — neither API has built-in attribution-patching memory optimization.
+- **Replication**: `[unverifiable on this machine]`
+- **Bucket**: `question-not-actionable`
+- **Next step**: close with a docs/recipe pointer. Helpful response: gradient checkpointing via `torch.utils.checkpoint`, or per-layer hook-based caching that releases activations between layers. Recipe could fit in the "extending TL" doc that #729 calls for.
+
+<a id="issue-888"></a>
+
+#### #888 — Adapt HookedTransformer to a non-supported model (CLIP language model)
+
+- **Issue**: User wants `HookedTransformer.from_pretrained` for the CLIP language model component.
+- **HookedTransformer**: not possible without code modifications, per Bryce's reply.
+- **TransformerBridge**: bridge has `CLIPVisionEncoderBridge` for the vision side (used by LLaVA family) but no text-side CLIP adapter. The bridge's adapter framework is the right primitive — user could write a `CLIPTextModelArchitectureAdapter` for the text encoder.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: write `CLIPTextModelArchitectureAdapter` for the bridge. The architecture is encoder-only with relatively standard transformer blocks (BERT-like attention, no causal mask). Overlaps with #729 (extending guide) — having both would let the user self-serve. ~120 LoC adapter + tests.
+
+### Batch 4: next 20 issues (#894 – #1148)
+
+_(complete — sign-off requested before batch 5)_
+
+<a id="issue-894"></a>
+
+#### #894 — Implement LongRoPE
+
+- **Issue**: Microsoft's LongRoPE rope-scaling variant (used by Phi-4-mini and Phi-3.5-mini) requires per-segment frequency tables and short/long-factor selection based on sequence position. Without it, TL inference silently diverges from HF for long contexts.
+- **HookedTransformer**: `loading_from_pretrained.py:893-906` handles `rope_type == "llama3"` and `rope_type == "yarn"` only. No `longrope` branch. LongRoPE configs are not parsed; rotary computation falls back to standard RoPE.
+- **TransformerBridge**: bridge delegates rope computation to HF's `apply_rotary_pos_emb`, so LongRoPE on Phi-3.5/Phi-4-mini works natively. `phi3.py:238-240` strips `rope_scaling.rope_type == "default"` to avoid HF's strict mode rejecting the field, but LongRoPE-typed configs flow through to HF unchanged.
+- **Replication**: `[code-verified]`
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: comment on the issue pointing to bridge support for Phi-3.5-mini-instruct / Phi-4-mini-instruct via `boot_transformers`. HookedTransformer-side LongRoPE would still be a non-trivial implementation (per-position frequency selection); leave the HT path as a known gap and prioritize bridge-side verification (e.g. add Phi-4-mini to integration tests).
+
+<a id="issue-902"></a>
+
+#### #902 — Some model weights are NaN when initializing
+
+- **Issue**: User reports that `HookedTransformer(HookedTransformerConfig.from_dict(...))` produces NaN entries in specific weight tensors (e.g. `blocks.1.attn.W_O` with 1808 NaNs at row indices 10–11) when initialized from gpt2-small's config. Original repro on TL 2.15.0.
+- **HookedTransformer**: `init_weights` at [HookedTransformer.py:1483](transformer_lens/HookedTransformer.py#L1483) iterates `named_parameters()` and runs `nn.init.normal_(param, std=initializer_range)`. With normal init from a non-degenerate `initializer_range`, NaNs would only arise from uninitialized memory or device-init bugs. Possible regression in a now-fixed init path; current dev not yet verified for this specific repro.
+- **TransformerBridge**: bridge does not run TL's init paths — uses HF's native init via `from_pretrained`. Random-init via bridge (`load_weights=False`) goes through HF's `_init_weights`, not `_init_weights_gpt2`.
+- **Replication**: `[unverifiable on this machine]` — would need to install TL 2.15.0 and rerun; user did not retest on later versions.
+- **Bucket**: `bug-likely-fixed-needs-verification`
+- **Next step**: ask reporter to retest on current dev (TL 3.x). If it reproduces, add a regression test that checks `state_dict()` for NaNs after `_init_weights_*` paths. Bridge users are unaffected.
+
+<a id="issue-903"></a>
+
+#### #903 — gpt2-small `n_params` reports 85M (actual 124M)
+
+- **Issue**: `model_properties_table.html` shows gpt2-small at 85M params; HF reports 124M. Reporter pinpoints `HookedTransformerConfig.n_params` calc which excludes embedding params (W_E ≈ 39M).
+- **HookedTransformer**: `config/HookedTransformerConfig.py:325-334` calculates `n_params = n_layers * (d_model * d_head * n_heads * 4) + MLP terms`. Embeddings, unembedding, biases, and LN params are all excluded. Same root cause as #448.
+- **TransformerBridge**: shares the same `HookedTransformerConfig` via `cfg`, so same calculation.
+- **Replication**: `[code-verified]` — calculation explicitly excludes embeddings; gpt2-small W_E = 50257 × 768 ≈ 38.6M matches the gap.
+- **Bucket**: `bug-still-reproduces`
+- **Next step**: same fix as #448. Either rename the current value to `n_attn_mlp_params` and add a new `n_total_params` that includes the embedding term, or change docs to clarify what `n_params` measures. Changing the value affects the downstream model properties table, so coordinate with #97.
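+
+The 85M vs. 124M gap can be checked by hand. The sketch below uses gpt2-small's published dimensions (12 layers, `d_model` = 768, 12 heads, `d_head` = 64, `d_mlp` = 3072) and assumes the "MLP terms" are `n_layers * d_model * d_mlp * 2`, which is an assumption about the formula rather than a quote from the source:
+
+$$
+\underbrace{12 \times (768 \times 64 \times 12 \times 4)}_{\text{attention}} = 28{,}311{,}552, \qquad
+\underbrace{12 \times (768 \times 3072 \times 2)}_{\text{MLP}} = 56{,}623{,}104
+$$
+
+$$
+n_{\text{params}} = 28{,}311{,}552 + 56{,}623{,}104 = 84{,}934{,}656 \approx 85\text{M}
+$$
+
+$$
+W_E = 50257 \times 768 = 38{,}597{,}376, \qquad
+84{,}934{,}656 + 38{,}597{,}376 = 123{,}532{,}032 \approx 124\text{M}
+$$
+
+Adding the token embedding alone recovers almost all of the reported gap; the small remainder would come from positional embeddings, biases, and LayerNorm parameters, which the calculation also excludes.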
+
+<a id="issue-904"></a>
+
+#### #904 — Gemma tensors initialized on CPU during state-dict conversion
+
+- **Issue**: When passing a CUDA-loaded `hf_model` to `HookedTransformer.from_pretrained` with Gemma-2-2b, `fold_value_biases` raises a device-mix error because some state-dict tensors are on CPU.
+- **HookedTransformer**: at [HookedTransformer.py:1875](transformer_lens/HookedTransformer.py#L1875), `fold_value_biases` does `b_O_original + (b_V[:, :, None] * W_O).sum([0, 1])` without an explicit `.to(device)`. If `convert_gemma_weights` returns a state_dict with mixed device placement (e.g. biases default-initialized on CPU when source had no biases), this fails. ZeqiangWangSurrey reports same problem on Qwen.
+- **TransformerBridge**: bridge does not run `fold_value_biases` by default. With `enable_compatibility_mode(fold_value_biases=True)`, it goes through bridge's own folding path which inherits HF's device placement directly. Pre-loaded `hf_model` retains its `device_map`; no CPU/GPU mix from converter.
+- **Replication**: `[unverifiable on this machine]` — would need a CUDA Gemma-2-2b load.
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: HT-side fix is one line — `b_V.to(W_O.device)` before the multiply, or move the whole operation through `_to_device` in `convert_gemma_weights`. joaoncardoso offered a PR. Bridge users sidestep by skipping fold_value_biases or by relying on bridge's HF-native device handling.
+
+<a id="issue-907"></a>
+
+#### #907 — PR #864 device-selection refactor breaks multi-GPU
+
+- **Issue**: PR #864 introduced greedy memory-based device allocation, replacing the previous architecture-aware sequential placement. Reporter claims `test_device_separation_and_cache` now fails. Also linked to #906 (loading on a specific device).
+- **HookedTransformer**: `move_model_modules_to_device` does memory-greedy placement that can scatter sequential blocks across devices, defeating the locality optimizations relevant for transformer forward passes. Same multi-GPU bug cluster as #837/#911/#968.
+- **TransformerBridge**: bridge does not use `move_model_modules_to_device`. On `dev`, accepts a pre-loaded `hf_model` with HF's `device_map="auto"` (proper architecture-aware placement via accelerate). PR #1270 (`feature/multi-device-bridge`) adds first-class `n_devices`/`device_map` parameters that delegate to accelerate.
+- **Replication**: `[unverifiable on this machine]`
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: HT-side fix would require reverting or guarding the greedy-allocation path. Bridge users get correct placement via accelerate today (manual `hf_model=`) and first-class once #1270 merges. Comment on issue with bridge migration recipe.
+
+<a id="issue-909"></a>
+
+#### #909 — Request for documentation of hookpoints
+
+- **Issue**: User finds it hard to map hookpoints to specific transformer architecture components; asks for a documentation enhancement explaining each hookpoint and its correspondence to architecture parts.
+- **HookedTransformer**: `docs/source/content/model_structure.md` documents legacy hookpoint names (`blocks.{i}.hook_resid_pre`, `hook_attn_out`, etc.) with shapes and meaning. The doc explicitly lists legacy aliases alongside canonical bridge names, so HT users get coverage too.
+- **TransformerBridge**: same doc — canonical `hook_in`/`hook_out` convention with shapes.
+- **Replication**: `[code-verified]` — doc exists and is comprehensive (160 lines covering embed, residual, attention, MLP, norm, unembed; legacy aliases mapped; shapes listed).
+- **Bucket**: `covered-close`
+- **Next step**: comment with link to `model_structure.md` and close. If #644's diagram lands too, that further closes the gap for visual learners.
+
+<a id="issue-911"></a>
+
+#### #911 — PosEmbed device error with `accelerate`
+
+- **Issue**: gpt2 + `accelerate launch` (DDP across 2 GPUs) fails inside `PosEmbed.forward` because `W_pos[offset_position_ids]` indexes a tensor that ends up on a different device than the index tensor.
+- **HookedTransformer**: `components/embeddings/pos_embed.py:47,59` does `pos_embed = self.W_pos[offset_position_ids]`. Under DDP, `accelerate` broadcasts the model to each rank but TL's per-component device placement (from `move_model_modules_to_device`) doesn't track `accelerate`'s rank-local device, so index/data device-mismatch occurs.
+- **TransformerBridge**: bridge does not have a `PosEmbed` component on most architectures — position embeddings are HF-native (RoPE inside attention, or HF's `Embedding` for absolute pos). For gpt2 specifically, the bridge uses HF's `GPT2Model.wpe` directly through `EmbeddingBridge`, which respects HF's device_map. No TL-specific cross-device indexing.
+- **Replication**: `[unverifiable on this machine]` — needs DDP setup.
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: comment with bridge migration recipe (`TransformerBridge.boot_transformers("gpt2")` plus `Accelerator().prepare(bridge.original_model)` from `accelerate` for training). HT-side fix would require all embed components to track DDP-rank device — same family as the multi-GPU cluster (#837/#907/#968).
+
+<a id="issue-912"></a>
+
+#### #912 — Support mT5 models
+
+- **Issue**: User requests `google/mt5-small` support for multilingual circuit discovery (Indonesian, Malay, Javanese). T5 is supported; mT5 has the same architecture.
+- **HookedTransformer**: not supported. T5-only path; no mT5 conversion logic.
+- **TransformerBridge**: `MT5ForConditionalGeneration` is in `utilities/architectures.py:12` (SUPPORTED_ARCHITECTURES) but the `model_type_mappings` in `model_bridge/sources/transformers.py:234` only maps `"t5"`, not `"mt5"`. mT5 reports `model_type="mt5"` in its config, so dispatch falls through and may fail to find the adapter. The T5ArchitectureAdapter itself would likely work for mT5 since the architecture is identical, but the routing isn't wired up.
+- **Replication**: `[code-verified]` — model_type mapping confirmed missing.
+- **Bucket**: `partial-leave-open`
+- **Next step**: add `"mt5": "MT5ForConditionalGeneration"` to `model_type_mappings` and route `MT5ForConditionalGeneration` to `T5ArchitectureAdapter`. ~5 LoC + an integration smoke test on `google/mt5-small`. Once landed, comment on issue confirming. HT side would need the full T5 conversion path extended; recommend bridge migration instead.
+
+<a id="issue-923"></a>
+
+#### #923 — Pythia missing `blocks.0.hook_resid_mid`
+
+- **Issue**: User runs a cache-name assertion test on Pythia and finds no `blocks.0.hook_resid_mid`. Asks if it's a bug or alternative cache name.
+- **HookedTransformer**: by-design. Pythia uses parallel attention + MLP (`parallel_attn_mlp=True`), so there is no "mid residual" — attn and MLP both read from `hook_resid_pre` and write directly into `hook_resid_post`. kapedalex confirmed this in the comment thread.
+- **TransformerBridge**: same — bridge maps Pythia's GPT-NeoX architecture to a parallel block layout, no mid hook generated.
+- **Replication**: `[code-verified]` — `cfg.parallel_attn_mlp` flag determines whether `hook_resid_mid` is registered.
+- **Bucket**: `not-relevant-close`
+- **Next step**: close with the answer kapedalex already gave. Could optionally add a friendlier error (or auto-skip mid-resid cache assertions) when the model is parallel-attn — minor UX improvement.
+
+<a id="issue-929"></a>
+
+#### #929 — Load custom small GPT-2 with hf_model and HF config
+
+- **Issue**: User trained a small GPT-2 architecture model and wants to use it with HookedTransformer. Currently does it by overwriting `HookedTransformerConfig` after `get_pretrained_model_config`. Asks for a clean API.
+- **HookedTransformer**: `convert_hf_model_config` calls `AutoConfig.from_pretrained` unconditionally, ignoring the user's `hf_model.config`. Same root cause as #754/#800/#846. The user's "hacky way" is the documented workaround.
+- **TransformerBridge**: `boot_transformers(hf_model=user_model)` reads config from the user's pre-loaded model directly. No AutoConfig refetch. This is exactly the use case bridge solves.
+- **Replication**: `[code-verified]` — same code path as #754 cluster.
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: comment with bridge migration recipe — `TransformerBridge.boot_transformers("gpt2", hf_model=user_model, tokenizer=user_tokenizer)`. Close once user confirms or after a reasonable wait.
+
+<a id="issue-930"></a>
+
+#### #930 — Quantized Llama 3.2 fails to load
+
+- **Issue**: `meta-llama/Llama-3.2-3B-Instruct` with `BitsAndBytesConfig(load_in_4bit=True)` fails state_dict load — `_W_K`/`_W_V` shapes `[1572864, 1]` (BnB-packed 4bit) don't match expected `[8, 3072, 128]` (TL's 3D layout).
+- **HookedTransformer**: same root cause as #569 — TL's state_dict reshape assumes unpacked weights, but BnB stores `Params4bit` as packed 1D tensors. The reshape `view(n_kv_heads, d_model, d_head)` fails on packed shapes.
+- **TransformerBridge**: bridge does not reshape state_dict from HF format. Q/K/V stay in HF's `Params4bit` form inside `LinearBridge`, and `JointQKVAttentionBridge` reads them via the HF Linear's forward (which handles BnB's `bnb.matmul_4bit` natively). Structurally sound; not yet end-to-end verified on quantized Llama 3.2.
+- **Replication**: `[unverifiable on this machine]`
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: comment with bridge migration recipe. End-to-end quantized verification depends on hardware availability — same gap as #569/#684.
+
+<a id="issue-950"></a>
+
+#### #950 — Support SimpleStories models
+
+- **Issue**: SimpleStories family (HF: `SimpleStories/*`) is an improved successor to TinyStories — useful as small interp targets and as low-resource debugging models.
+- **HookedTransformer**: not supported. No `simplestories` registry entries; no architecture mapping.
+- **TransformerBridge**: SimpleStories fine-tunes (e.g. `SimpleStories/SimpleStories-1.25M`, `SimpleStories-35M`) are in `supported_models.json` via auto-discovery. Base SimpleStories model not yet registered — jlarson4's comment notes this.
+- **Replication**: `[code-verified]` — registry contains SimpleStories fine-tunes; base model absent.
+- **Bucket**: `partial-leave-open`
+- **Next step**: add base SimpleStories to bridge's supported_models registry and verify with a smoke test. jlarson4 already volunteered to tackle before next release. HT-side support is unlikely to land — direct users to bridge.
+
+<a id="issue-953"></a>
+
+#### #953 — Add basic support for Gemma 3n (E2B & E4B)
+
+- **Issue**: Gemma 3n introduces nested sub-models (Matryoshka E2B inside E4B), AltUp sparse updates, LAuReL low-rank residuals, Per-Layer Embeddings (PLE) with CPU offload, and mixed local/global attention. Reporter asks for text-only support that bypasses vision/audio.
+- **HookedTransformer**: not supported. Architecture is too divergent from any existing HT path.
+- **TransformerBridge**: not registered in `SUPPORTED_ARCHITECTURES`. Bryce confirmed in-progress for the next major TransformerLens release. Bridge can structurally accommodate the text-decoder portion if HF's `Gemma3nForCausalLM` exposes blocks in a familiar shape, but AltUp/LAuReL/PLE require dedicated component bridges.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-difficult`
+- **Next step**: track for milestone 3.x. Per-block AltUp/LAuReL adapters + PLE handling are non-trivial — likely 200-500 LoC of adapter code plus testing. Mixed local/global attention pattern overlaps with Gemma2 work (#778). Defer until HF's Gemma3n forward is stable.
+
+<a id="issue-962"></a>
+
+#### #962 — Can multiple GPUs be used?
+
+- **Issue**: User asks if `HookedTransformer.from_pretrained("Meta-Llama-3-8B-Instruct", device="auto")` works with `CUDA_VISIBLE_DEVICES=0,1`. On TL 2.11.0 it does not.
+- **HookedTransformer**: `n_devices` parameter exists, but `device="auto"` is not a TL convention (it's HF's). User would need `n_devices=2`. Multi-GPU placement has the bug cluster (#837/#907/#911/#968).
+- **TransformerBridge**: on `dev`, accepts `hf_model=` with HF's `device_map="auto"`. PR #1270 adds first-class `n_devices` / `device_map` parameters.
+- **Replication**: `[code-verified]`
+- **Bucket**: `question-not-actionable`
+- **Next step**: comment with both options — for HT, use `n_devices=N`; for bridge today, pass a pre-loaded `hf_model` with `device_map="auto"`; once #1270 merges, use bridge's first-class `device_map`. Close after answer.
+
+<a id="issue-968"></a>
+
+#### #968 — `unsloth/llama-3.2-3b-instruct` with 2× 3060 device-mismatch
+
+- **Issue**: `from_pretrained(..., n_devices=2)` on 2× 3060 throws `RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cuda:1)`. Same multi-GPU bug cluster as #837/#907/#911.
+- **HookedTransformer**: `move_model_modules_to_device` placement bug — embed/pos_embed indexed by tokens on rank-0 device while the embedding parameter ends up on rank-1.
+- **TransformerBridge**: jlarson4 already commented offering bridge as the path forward. PR #1270 brings first-class multi-device.
+- **Replication**: `[unverifiable on this machine]`
+- **Bucket**: `bug-likely-fixed-needs-verification`
+- **Next step**: jlarson4's comment already points to bridge + #1270. Wait for reporter to retest on bridge / post-#1270, then close. HT-side fix is the multi-GPU rework.
+
+<a id="issue-993"></a>
+
+#### #993 — Load compressed Llama/Qwen via HookedTransformer
+
+- **Issue**: User loads `meta-llama/Llama-2-7b-chat-hf` fine but cannot load compressed (quantized/pruned) variants of the same architecture.
+- **HookedTransformer**: hard-coded "Llama only" assertion in quantization path (same as #684). Pruned variants (a subset of weights) would also fail TL's strict reshape since per-layer dims must match the registered config.
+- **TransformerBridge**: structurally accepts any compressed variant that HF can load. No "Llama only" assertion. Pruned variants with subset state dicts work because bridge holds HF parameters by reference, not via reshape. End-to-end verification on a specific compressed checkpoint not yet done.
+- **Replication**: `[code-verified]`
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: comment with bridge recipe. Verification on a known compressed checkpoint (e.g. `TheBloke/Llama-2-7B-Chat-GPTQ`) would confirm and allow closing.
+
+<a id="issue-1039"></a>
+
+#### #1039 — Loading models from local files in HookedTransformer
+
+- **Issue**: User gets `LocalEntryNotFoundError`/`OSError` from HookedTransformer when trying to load a local model offline. HF `AutoModelForCausalLM` works fine for the same path.
+- **HookedTransformer**: same root cause as #754/#800 — `convert_hf_model_config` calls `AutoConfig.from_pretrained` unconditionally, which tries to hit the Hub even when only a local path is provided.
+- **TransformerBridge**: `boot_transformers(hf_model=...)` reads everything from the pre-loaded HF model. No Hub access required.
+- **Replication**: `[code-verified]`
+- **Bucket**: `fixed-on-transformerbridge`
+- **Next step**: comment with bridge recipe — `TransformerBridge.boot_transformers(local_path, hf_model=hf_model, tokenizer=tokenizer)`. Close as duplicate of the #754 cluster after acknowledgment.
+
+<a id="issue-1080"></a>
+
+#### #1080 — Import fails by default in Colab (numpy ABI mismatch)
+
+- **Issue**: Fresh Colab notebook + `pip install transformer_lens` + `import transformer_lens` raises `numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject`. jakobhansen-blai notes a kernel restart works around it; suggests TL pin numpy v1 implicitly.
+- **HookedTransformer**: `pyproject.toml:12-13` has `numpy>=1.24` (py3.10/3.11) and `numpy>=1.26` (py3.12) — both lower bounds, no upper cap. Numpy 2.x is allowed by TL itself. The error is a transitive dep ABI mismatch (one of TL's dependencies built against numpy 1.x, runtime imported numpy 2.x or vice versa).
+- **TransformerBridge**: same install path; same numpy.
+- **Replication**: `[unverifiable on this machine]` — Colab-specific package versions.
+- **Bucket**: `bug-likely-fixed-needs-verification`
+- **Next step**: ask reporter to retest with a current Colab kernel (numpy 2.x default) and current TL. If it still fails, bisect transitive deps (likely `pandas`, `einops`, or `jaxtyping`). Pin a tested numpy if needed for Colab compat.
+
+<a id="issue-1133"></a>
+
+#### #1133 — `tokenize_and_concatenate` cuts tokens mid-document
+
+- **Issue**: `tokenize_and_concatenate` slices the joined corpus into 20 character-based chunks before tokenizing — this can split mid-token, producing token pairs that would never occur naturally.
+- **HookedTransformer**: PR #1273 (commit `ad8e123b`, "Improved Tokenize & Concatenate") replaces character-based chunking with per-document tokenization (`add_special_tokens=False`) and joins with token-level EOS. Tokens are no longer cut across chunk boundaries. Code comment at [tokenize_utils.py:68-70](transformer_lens/utilities/tokenize_utils.py#L68-L70) explicitly references #1133. Earlier PR #1201 (`4a5cc6f0`) also addressed the chunking issue partially.
+- **TransformerBridge**: same shared utility; same fix applies.
+- **Replication**: `[code-verified]` — current implementation tokenizes per-doc, concatenates with token-level EOS, then reshapes. No string-level chunking.
+- **Bucket**: `covered-close`
+- **Next step**: close as fixed (PR #1273). The original repro (`tokens[79848:79848+2] == [337, 346]`) cannot occur under the new implementation.
+
+<a id="issue-1148"></a>
+
+#### #1148 — Tutorial for "Real-Time Training Dynamics" (VSM Telemetry)
+
+- **Issue**: Reporter proposes a new demo notebook adding `VSMTelemetry` — a ~30-line bridge class that logs Attention Coherence (σ_p) and Head Specialization (σ_a) during training, useful for studying grokking / phase transitions.
+- **HookedTransformer**: no such tutorial exists. `demos/` has a Grokking demo (#543, currently broken on Colab) but nothing on real-time mechanistic training telemetry.
+- **TransformerBridge**: same — bridge has no training-dynamics telemetry tutorial. Would work equivalently against bridge's hook system.
+- **Replication**: `[code-verified]`
+- **Bucket**: `not-addressed-simple`
+- **Next step**: invite a contribution, since the reporter has a working prototype. The notebook should target `TransformerBridge` (per the migration guide), use `run_with_cache` or hooks for σ_p/σ_a extraction, and run on a tiny model so it executes quickly in Colab. Add it to `demos/` with a CI check that the notebook runs.
+
+### Batch 5: final triage entry (#1165)
+
+_(Issues #1263 and #1264 were opened by the maintainer for tracking and don't require triage.)_
+
+<a id="issue-1165"></a>
+
+#### #1165 — Strategy for high-fragmentation tokenization (Yoruba)
+
+- **Issue**: Yoruba tonal characters (e.g. `ọ` in `Atọwọda`) trigger extreme tokenizer fragmentation under GPT-2's BPE — 9 tokens for one word, mostly byte-level fallbacks. User asks whether TransformerLens offers a recommended heuristic for pooling activations across byte-token spans for activation patching, or if the standard practice is to ignore byte-level noise.
+- **HookedTransformer**: no built-in pooling utility for fragmented tokens. `to_tokens` is HF-tokenizer passthrough; cache shape is per-token. User would need to write their own span-mean / span-max reducer over the cache before patching.
+- **TransformerBridge**: same — bridge inherits the HF tokenizer and offers no span-pooling helper. Activation cache shape is identical.
+- **Replication**: `[code-verified]` — no `pool_token_span` / `aggregate_subword` utility exists in either codebase.
+- **Bucket**: `question-not-actionable`
+- **Next step**: close with a recipe-style answer pointing to two practical options: (1) use a tokenizer with better non-Latin coverage (e.g. mT5, NLLB, or a Yoruba-trained model), which also overlaps with #912's mT5 request; (2) implement span-pooling manually: build a token-to-word mapping from the tokenizer's `offset_mapping` (`tokenizer(text, return_offsets_mapping=True)`, which requires a fast tokenizer), then average activations within each word's token span before patching. Could grow into a small `transformer_lens.utils.pool_token_span` helper if demand exists.
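+
+Option (2) can be sketched roughly as below. This is an illustration only, not an existing TransformerLens API: `word_token_spans` and `pool_mean` are hypothetical helpers, the offsets are hand-written stand-ins for what a fast tokenizer's `offset_mapping` might return, and activations are plain Python lists so the example runs without a model.
+
+```python
+def word_token_spans(text, offsets):
+    """Group token indices by the whitespace-delimited word they overlap."""
+    # Character span [start, end) of every whitespace-delimited word.
+    words, start = [], None
+    for i, ch in enumerate(text + " "):
+        if ch.isspace():
+            if start is not None:
+                words.append((start, i))
+                start = None
+        elif start is None:
+            start = i
+
+    spans = [[] for _ in words]
+    for tok_idx, (tok_start, tok_end) in enumerate(offsets):
+        if tok_start == tok_end:
+            continue  # special tokens report an empty span
+        for word_idx, (w_start, w_end) in enumerate(words):
+            if tok_start < w_end and tok_end > w_start:
+                spans[word_idx].append(tok_idx)
+                break
+    return spans
+
+
+def pool_mean(token_acts, spans):
+    """Average per-token activation vectors within each word's token span."""
+    d_model = len(token_acts[0])
+    return [
+        [sum(token_acts[i][d] for i in idxs) / len(idxs) for d in range(d_model)]
+        for idxs in spans
+        if idxs
+    ]
+
+
+# Hypothetical offsets: seven byte-level pieces for the first word, one for the second.
+text = "At\u1ecdw\u1ecdda is"
+offsets = [(0, 2), (2, 3), (2, 3), (3, 4), (4, 5), (4, 5), (5, 7), (8, 10)]
+token_acts = [[float(i), 1.0] for i in range(len(offsets))]
+
+spans = word_token_spans(text, offsets)
+pooled = pool_mean(token_acts, spans)
+```
+
+With real activations, `token_acts` would be one batch row of a cached tensor and a tensor mean over each span's indices would replace the list arithmetic; span-max pooling only changes the reducer.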
+
diff --git a/SAE_INTEGRATION_PLAN.md b/SAE_INTEGRATION_PLAN.md
new file mode 100644
index 000000000..9684ca8b0
--- /dev/null
+++ b/SAE_INTEGRATION_PLAN.md
@@ -0,0 +1,130 @@
+# SAE Hook Integration — TransformerBridge
+
+**Status:** Design proposal (not yet greenlit)
+**Context:** Surfaced as a candidate feature for the 2026 community survey. This plan
+captures the recommended shape so the implementation isn't over-promised.
+
+## Core thesis
+
+**SAEs are just specialized hooks.** Don't build a parallel API surface — extend the
+existing one. A single class composes with every mechanism the bridge already exposes
+(`run_with_cache`, `run_with_hooks`, `add_perma_hook`, `list_hooks()`, `generate()`).
+
+## Primitive
+
+```python
+class SAEHook:
+    """A hook that runs an SAE at a hookpoint with configurable modes."""
+
+    MODES = ("observe", "reconstruct", "intervene")
+
+    def __init__(self, sae, mode="observe", intervene_fn=None):
+        # sae: anything with .encode(act) → latents and .decode(latents) → act
+        # mode='observe':    cache latents, pass original activation through (zero perturbation)
+        # mode='reconstruct': replace activation with sae.decode(sae.encode(act))
+        # mode='intervene':  pass latents through intervene_fn before decode
+        ...
+
+    def __call__(self, activation, hook):
+        latents = self.sae.encode(activation)
+        hook.ctx["sae_latents"] = latents
+        if self.mode == "observe":
+            return activation
+        if self.mode == "intervene":
+            latents = self.intervene_fn(latents, hook)
+        return self.sae.decode(latents)
+```
+
+Stashing latents in `hook.ctx` means the existing cache machinery picks them up for free.
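+
+A toy run of the three modes may make the contract concrete. Everything in this sketch
+is an assumption for illustration: `ToySAE` is a stand-in that doubles and halves values,
+`SketchSAEHook` shows one possible way to fill in the elided `__init__`, and
+`SimpleNamespace(ctx={})` mimics only the `ctx` dict of a hook point.
+
+```python
+from types import SimpleNamespace
+
+
+class ToySAE:
+    """Stand-in SAE: encode doubles each value, decode halves it."""
+
+    def encode(self, act):
+        return [2.0 * x for x in act]
+
+    def decode(self, latents):
+        return [x / 2.0 for x in latents]
+
+
+class SketchSAEHook:
+    MODES = ("observe", "reconstruct", "intervene")
+
+    def __init__(self, sae, mode="observe", intervene_fn=None):
+        if mode not in self.MODES:
+            raise ValueError(f"mode must be one of {self.MODES}, got {mode!r}")
+        if mode == "intervene" and intervene_fn is None:
+            raise ValueError("mode='intervene' requires intervene_fn")
+        self.sae, self.mode, self.intervene_fn = sae, mode, intervene_fn
+
+    def __call__(self, activation, hook):
+        latents = self.sae.encode(activation)
+        hook.ctx["sae_latents"] = latents  # stored before any intervention
+        if self.mode == "observe":
+            return activation
+        if self.mode == "intervene":
+            latents = self.intervene_fn(latents, hook)
+        return self.sae.decode(latents)
+
+
+def ablate_feature_0(latents, hook):
+    """Zero the first latent; the other latents pass through unchanged."""
+    return [0.0] + list(latents[1:])
+
+
+act = [1.0, 3.0]
+hook = SimpleNamespace(ctx={})
+observed = SketchSAEHook(ToySAE())(act, hook)
+reconstructed = SketchSAEHook(ToySAE(), "reconstruct")(act, hook)
+intervened = SketchSAEHook(ToySAE(), "intervene", ablate_feature_0)(act, hook)
+```
+
+Note that, as written in the primitive, `hook.ctx["sae_latents"]` holds the latents from
+before `intervene_fn` runs; caching the post-intervention latents would need a second key.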
+
+## Bridge convenience layer
+
+```python
+# Sugar over add_hook + SAEHook construction
+bridge.attach_sae("blocks.6.hook_resid_pre", sae, mode="reconstruct")
+
+# Scoped attach (cleans up on exit; mirrors run_with_hooks contract)
+with bridge.saes({
+    "blocks.6.hook_resid_pre":  (sae6,  "observe"),
+    "blocks.10.hook_resid_post": (sae10, "intervene", ablate_feature_42),
+}):
+    logits, cache = bridge.run_with_cache(tokens)
+# cache["blocks.6.hook_resid_pre"]              → activations (as today)
+# cache["blocks.6.hook_resid_pre.sae_latents"]  → SAE latents (new, via hook.ctx)
+```
+
+Two new methods (`attach_sae`, `saes()`), not four.
+
+## Why this shape fits the bridge
+
+1. **Reuses the hook lifecycle.** No new lifecycle to debug — existing `LensHandle` /
+   `add_hook` / `remove_hooks` flow handles attach/detach/scoping.
+2. **Generation-aware for free.** `bridge.generate()` already invokes hooks per-token;
+   SAE hooks fire correctly without extra plumbing.
+3. **Composes with `list_hooks()` / `HookPoint.__repr__`** (landed for #297) — researchers
+   can introspect which SAEs are currently attached without a separate API.
+
+## The hard part: KV cache + SAE
+
+If an SAE replaces residual-stream activations mid-generation, the cached K/V from
+earlier positions is now inconsistent. Two options:
+
+- **Strict** *(v1 default)*: invalidate KV cache when any SAE in `reconstruct` or
+  `intervene` mode is attached to a pre-attention hookpoint. Slower (recompute full
+  prefix per generation step) but correct.
+- **Lenient** *(v2 opt-in)*: leave KV cache alone, document that SAE-modified prefixes
+  are not retroactively re-propagated. Faster, but generation post-attach reflects
+  partial state.
+
+Surface as `bridge.attach_sae(..., kv_cache="strict"|"lenient")` once researchers
+complain about strict-mode speed.
+
+## Compatibility constraints
+
+- **SAE Lens interop.** The `sae` parameter is duck-typed on `.encode(act)` and
+  `.decode(latents)` — SAE Lens's `SAE` class satisfies this. Don't require a new SAE
+  base class.
+- **Hookpoint name flexibility.** Accept both bridge-native names and legacy HT names
+  (`blocks.6.hook_resid_pre`). Bridge's compatibility-mode aliases already handle this.
+- **Detach contract.** SAE parameters must be detached from the computation graph
+  during inference by default; opt-in `requires_grad=True` for joint training scenarios.
+
+## Implementation footprint
+
+| Piece | Approx. LoC |
+|---|---|
+| `SAEHook` class | ~80 |
+| `bridge.attach_sae` / `saes()` context manager | ~60 |
+| KV-cache invalidation handling | ~50 |
+| Tests (toy SAE on Pythia-70m + GPT-2) | ~250 |
+| Demo notebook | (separate) |
+| **Total** | **~440 (incl. ~250 of tests)** |
+
+## v1 acceptance checklist
+
+- [ ] `SAEHook` class with three modes
+- [ ] `bridge.attach_sae(hookpoint, sae, mode=...)` → returns `LensHandle`
+- [ ] `bridge.saes({...})` context manager
+- [ ] SAE latents flow into `run_with_cache` output (no separate `run_with_sae_cache`)
+- [ ] Strict KV-cache invalidation under `reconstruct` / `intervene`
+- [ ] Compatible with SAE Lens's `SAE.encode` / `SAE.decode` interface (no adapter required)
+- [ ] Tests: toy SAE on Pythia-70m exercising all three modes + generation + cache
+- [ ] Demo notebook showing observe / reconstruct / feature-ablation flows
+
+## Out of scope for v1
+
+- SAE *training* through the bridge — supported via existing backward hooks, but no
+  dedicated training API.
+- Activation steering as a separate primitive — overlaps with `intervene` mode; ship
+  as its own feature only if usage patterns diverge.
+- Disk-cached activation datasets for SAE training — better as a standalone tool that
+  consumes `run_with_cache` output.
+- Cross-layer SAE composition (transcoders) — same primitive works, but the
+  bookkeeping (which SAE feeds which) deserves a separate design pass.
+
+## Survey framing
+
+If "native SAE hook integration" appears as a survey option, this is the actual
+scope behind it: ~190 LoC of bridge plumbing (plus ~250 LoC of tests) that turns SAEs into composable hook
+primitives. Not a giant new subsystem. The honest uncertainty to disclose is the
+KV-cache mode choice; everything else is well-bounded.
diff --git a/docs/source/content/special_cases.md b/docs/source/content/special_cases.md
index eda0db828..d4b33c2cd 100644
--- a/docs/source/content/special_cases.md
+++ b/docs/source/content/special_cases.md
@@ -14,3 +14,28 @@ There are two main ways to mitigate this:
 
 1. **Skip weight preprocessing.** On the bridge, simply load with `TransformerBridge.boot_transformers(...)` and do not call `enable_compatibility_mode()` - the bridge preserves raw HF weights by default, so no additional flag is needed. On the legacy `HookedTransformer` path, use `HookedTransformer.from_pretrained_no_processing` instead of `HookedTransformer.from_pretrained`.
 2. **Increase the precision of the data type used in the model.**
+
+## Qwen3.5 text-only models
+
+Qwen3.5 support is available only through `TransformerBridge`, not the legacy
+`HookedTransformer.from_pretrained` path. Install a Transformers release that
+includes `Qwen3_5ForCausalLM` before loading these models:
+
+```bash
+pip install "transformers>=5.2.0"
+```
+
+Dense text-only checkpoints can then be loaded with:
+
+```python
+from transformer_lens.model_bridge import TransformerBridge
+
+bridge = TransformerBridge.boot_transformers("Qwen/Qwen3.5-0.8B")
+```
+
+Qwen3.5 uses a hybrid stack. Full-attention layers expose the usual hooks under
+`blocks.N.attn.*`; linear-attention layers expose GatedDeltaNet hooks under
+`blocks.N.linear_attn.*`, including `hook_q_pre_conv`, `hook_k_pre_conv`,
+`hook_v_pre_conv`, `hook_beta`, `hook_log_decay`, `hook_recurrence_out`, and
+`hook_out`. Full multimodal `Qwen3_5ForConditionalGeneration`, image/video
+inputs, and Qwen3.5 MoE checkpoints are not supported by this adapter.
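+
+Version strings do not sort lexically (`"5.10.0"` compares lower than `"5.2.0"` as a
+string), so a pre-flight check should compare parsed versions. The sketch below is a
+stdlib-only illustration; `release_tuple` and `meets_minimum` are hypothetical helpers,
+not part of TransformerLens, and they ignore pre-release tags. When `packaging` is
+installed, `packaging.version.Version` is the more robust choice.
+
+```python
+import re
+
+
+def release_tuple(version: str) -> tuple:
+    """Leading numeric components, e.g. '5.10.0.dev0' -> (5, 10, 0)."""
+    match = re.match(r"\d+(?:\.\d+)*", version)
+    return tuple(int(part) for part in match.group(0).split(".")) if match else ()
+
+
+def meets_minimum(installed: str, minimum: str = "5.2.0") -> bool:
+    """True if the installed release is at least the minimum release."""
+    return release_tuple(installed) >= release_tuple(minimum)
+
+
+checks = {v: meets_minimum(v) for v in ("4.57.3", "5.2.0", "5.10.0")}
+lexical_trap = "5.10.0" >= "5.2.0"  # plain string comparison gets this wrong
+```
+
+For example, `meets_minimum(transformers.__version__)` could guard the
+`boot_transformers` call above and produce a clearer message than a failed import.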
diff --git a/pyproject.toml b/pyproject.toml
index 4159b456f..0fd085292 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -36,6 +36,7 @@
         # whenever chardet>=6 is installed. Remove the pin when psf/requests bumps the cap.
         evals=["lm-eval>=0.4", "chardet<6"]
         lit=["lit-nlp>=1.3"]
+        qwen35=["packaging>=23.0", "transformers>=5.2.0"]
 
     [project.scripts]
         build-docs="docs.make_docs:build_docs"
diff --git a/tests/unit/model_bridge/supported_architectures/test_qwen3_5_adapter.py b/tests/unit/model_bridge/supported_architectures/test_qwen3_5_adapter.py
index f34fc0011..393e0620d 100644
--- a/tests/unit/model_bridge/supported_architectures/test_qwen3_5_adapter.py
+++ b/tests/unit/model_bridge/supported_architectures/test_qwen3_5_adapter.py
@@ -3,6 +3,8 @@
 Qwen3_5 is supported only via TransformerBridge, not HookedTransformer.
 """
 
+from types import SimpleNamespace
+
 import pytest
 
 from transformer_lens.factories.architecture_adapter_factory import (
@@ -19,10 +21,15 @@
     _QWEN3_5_AVAILABLE = False
 
 
-@pytest.mark.skipif(
-    not _QWEN3_5_AVAILABLE,
-    reason="Qwen3_5TextConfig / Qwen3_5ForCausalLM not available in installed transformers",
-)
+@pytest.fixture
+def qwen3_5_dependency_available(monkeypatch):
+    """Make adapter-only tests independent of the installed Transformers build."""
+    import transformers
+
+    monkeypatch.setattr(transformers, "__version__", "5.10.0")
+    monkeypatch.setattr(transformers, "Qwen3_5ForCausalLM", object(), raising=False)
+
+
 class TestQwen3_5Registration:
     """Adapter is registered in all lookup tables."""
 
@@ -47,6 +54,212 @@ def test_adapter_class_correct(self):
         assert SUPPORTED_ARCHITECTURES["Qwen3_5ForCausalLM"] is Qwen3_5ArchitectureAdapter
 
 
+class TestQwen3_5ArchitectureDetection:
+    """Tests that do not require a Transformers build with Qwen3.5 classes."""
+
+    def test_model_type_qwen3_5_routes_to_text_only_architecture(self):
+        from transformer_lens.model_bridge.sources.transformers import (
+            determine_architecture_from_hf_config,
+        )
+
+        cfg = SimpleNamespace(model_type="qwen3_5", architectures=[])
+        assert determine_architecture_from_hf_config(cfg) == "Qwen3_5ForCausalLM"
+
+    def test_model_type_qwen3_5_text_routes_to_text_only_architecture(self):
+        from transformer_lens.model_bridge.sources.transformers import (
+            determine_architecture_from_hf_config,
+        )
+
+        cfg = SimpleNamespace(model_type="qwen3_5_text", architectures=[])
+        assert determine_architecture_from_hf_config(cfg) == "Qwen3_5ForCausalLM"
+
+    def test_full_conditional_generation_architecture_is_not_registered(self):
+        assert "Qwen3_5ForConditionalGeneration" not in SUPPORTED_ARCHITECTURES
+        assert "Qwen3_5ForConditionalGeneration" not in HF_SUPPORTED_ARCHITECTURES
+
+
+class TestQwen3_5DependencyGate:
+    """Verify optional dependency errors are clear and use real version ordering."""
+
+    def test_old_transformers_version_raises_clear_import_error(self, monkeypatch):
+        import transformers
+
+        from transformer_lens.model_bridge.supported_architectures.qwen3_5 import (
+            Qwen3_5ArchitectureAdapter,
+        )
+
+        monkeypatch.setattr(transformers, "__version__", "4.57.3")
+        monkeypatch.setattr(transformers, "Qwen3_5ForCausalLM", object(), raising=False)
+
+        with pytest.raises(ImportError, match=r"requires transformers >= 5\.2\.0"):
+            Qwen3_5ArchitectureAdapter(_make_bridge_cfg())
+
+    def test_missing_qwen3_5_class_raises_clear_import_error(self, monkeypatch):
+        import transformers
+
+        from transformer_lens.model_bridge.supported_architectures.qwen3_5 import (
+            Qwen3_5ArchitectureAdapter,
+        )
+
+        monkeypatch.setattr(transformers, "__version__", "5.2.0")
+        monkeypatch.setattr(
+            Qwen3_5ArchitectureAdapter,
+            "_has_qwen3_5_causal_lm",
+            staticmethod(lambda _transformers_module: False),
+        )
+
+        with pytest.raises(ImportError, match="Qwen3_5ForCausalLM"):
+            Qwen3_5ArchitectureAdapter(_make_bridge_cfg())
+
+    def test_version_comparison_accepts_future_minor_versions(self, qwen3_5_dependency_available):
+        from transformer_lens.model_bridge.supported_architectures.qwen3_5 import (
+            Qwen3_5ArchitectureAdapter,
+        )
+
+        adapter = Qwen3_5ArchitectureAdapter(_make_bridge_cfg())
+        assert isinstance(adapter, Qwen3_5ArchitectureAdapter)
+
+
+class TestQwen3_5LoadingGuards:
+    """Text-only routing and preloaded-model guards."""
+
+    def test_prepare_loading_swaps_top_level_config_for_text_config(
+        self, qwen3_5_dependency_available
+    ):
+        from transformer_lens.model_bridge.supported_architectures.qwen3_5 import (
+            Qwen3_5ArchitectureAdapter,
+        )
+
+        adapter = Qwen3_5ArchitectureAdapter(_make_bridge_cfg())
+        text_config = SimpleNamespace(model_type="qwen3_5_text")
+        full_config = SimpleNamespace(model_type="qwen3_5", text_config=text_config)
+        model_kwargs = {"config": full_config}
+
+        adapter.prepare_loading("Qwen/Qwen3.5-0.8B", model_kwargs)
+
+        assert model_kwargs["config"] is text_config
+
+    def test_prepare_model_accepts_text_only_model(self, qwen3_5_dependency_available):
+        from transformer_lens.model_bridge.supported_architectures.qwen3_5 import (
+            Qwen3_5ArchitectureAdapter,
+        )
+
+        adapter = Qwen3_5ArchitectureAdapter(_make_bridge_cfg())
+        hf_model = SimpleNamespace(config=SimpleNamespace(architectures=["Qwen3_5ForCausalLM"]))
+
+        adapter.prepare_model(hf_model)
+
+    def test_prepare_model_rejects_conditional_generation_model(self, qwen3_5_dependency_available):
+        from transformer_lens.model_bridge.supported_architectures.qwen3_5 import (
+            Qwen3_5ArchitectureAdapter,
+        )
+
+        adapter = Qwen3_5ArchitectureAdapter(_make_bridge_cfg())
+        hf_model = SimpleNamespace(
+            config=SimpleNamespace(architectures=["Qwen3_5ForConditionalGeneration"])
+        )
+
+        with pytest.raises(ValueError, match="text-only"):
+            adapter.prepare_model(hf_model)
+
+    def test_prepare_model_rejects_unswapped_top_level_config(self, qwen3_5_dependency_available):
+        from transformer_lens.model_bridge.supported_architectures.qwen3_5 import (
+            Qwen3_5ArchitectureAdapter,
+        )
+
+        adapter = Qwen3_5ArchitectureAdapter(_make_bridge_cfg())
+        hf_model = SimpleNamespace(
+            config=SimpleNamespace(
+                architectures=["Qwen3_5ForCausalLM"],
+                text_config=SimpleNamespace(model_type="qwen3_5_text"),
+            )
+        )
+
+        with pytest.raises(ValueError, match="text-only"):
+            adapter.prepare_model(hf_model)
+
+    def test_load_weights_false_uses_prepared_text_config(
+        self, monkeypatch, qwen3_5_dependency_available
+    ):
+        from transformer_lens.model_bridge.bridge import TransformerBridge
+        from transformer_lens.model_bridge.sources import transformers as source
+
+        text_config = SimpleNamespace(
+            model_type="qwen3_5_text",
+            architectures=["Qwen3_5ForCausalLM"],
+            hidden_size=128,
+            head_dim=32,
+            num_attention_heads=4,
+            num_key_value_heads=2,
+            num_hidden_layers=2,
+            max_position_embeddings=64,
+            intermediate_size=256,
+            vocab_size=512,
+            hidden_act="silu",
+            rms_norm_eps=1e-6,
+            pad_token_id=0,
+            eos_token_id=1,
+        )
+        full_config = SimpleNamespace(
+            model_type="qwen3_5",
+            architectures=["Qwen3_5ForConditionalGeneration"],
+            text_config=text_config,
+            pad_token_id=0,
+            eos_token_id=1,
+        )
+
+        class DummyModel:
+            def __init__(self, config):
+                self.config = config
+
+            def parameters(self):
+                return iter(())
+
+        class DummyModelClass:
+            seen_config = None
+
+            @classmethod
+            def from_config(cls, config, **kwargs):
+                cls.seen_config = config
+                return DummyModel(config)
+
+        class DummyBridge(TransformerBridge):
+            def __init__(self, hf_model, adapter, tokenizer):
+                self.hf_model = hf_model
+                self.adapter = adapter
+                self.tokenizer = tokenizer
+
+        class DummyTokenizer:
+            bos_token_id = 2
+            eos_token_id = 1
+
+            def encode(self, text):
+                return [10]
+
+        monkeypatch.setattr(
+            source.AutoConfig,
+            "from_pretrained",
+            staticmethod(lambda *args, **kwargs: full_config),
+        )
+        monkeypatch.setattr(
+            source.AutoTokenizer,
+            "from_pretrained",
+            staticmethod(lambda *args, **kwargs: DummyTokenizer()),
+        )
+        monkeypatch.setattr(source, "TransformerBridge", DummyBridge)
+        monkeypatch.setattr(source, "setup_tokenizer", lambda tokenizer, **kwargs: tokenizer)
+
+        bridge = source.boot(
+            "Qwen/Qwen3.5-0.8B",
+            device="cpu",
+            load_weights=False,
+            model_class=DummyModelClass,
+        )
+
+        assert DummyModelClass.seen_config is text_config
+        assert bridge.hf_model.config is text_config
+
+
 def _make_bridge_cfg(**overrides):
     """Minimal TransformerBridgeConfig for Qwen3_5 adapter tests."""
     from transformer_lens.config.transformer_bridge_config import (
@@ -67,15 +280,11 @@ def _make_bridge_cfg(**overrides):
     return TransformerBridgeConfig(**defaults)
 
 
-@pytest.mark.skipif(
-    not _QWEN3_5_AVAILABLE,
-    reason="Qwen3_5TextConfig / Qwen3_5ForCausalLM not available in installed transformers",
-)
 class TestQwen3_5ComponentMapping:
     """self_attn is not a block submodule (absent on linear-attn layers); dense GatedMLP only."""
 
     @pytest.fixture
-    def adapter(self):
+    def adapter(self, qwen3_5_dependency_available):
         from transformer_lens.model_bridge.supported_architectures.qwen3_5 import (
             Qwen3_5ArchitectureAdapter,
         )
@@ -213,15 +422,11 @@ def test_weight_processing_conversions_empty(self, adapter):
         assert adapter.weight_processing_conversions == {}
 
 
-@pytest.mark.skipif(
-    not _QWEN3_5_AVAILABLE,
-    reason="Qwen3_5TextConfig / Qwen3_5ForCausalLM not available in installed transformers",
-)
 class TestQwen3_5ConfigAttributes:
     """cfg attributes set by the adapter."""
 
     @pytest.fixture
-    def adapter(self):
+    def adapter(self, qwen3_5_dependency_available):
         from transformer_lens.model_bridge.supported_architectures.qwen3_5 import (
             Qwen3_5ArchitectureAdapter,
         )
@@ -258,7 +463,7 @@ def test_attn_implementation_eager(self, adapter):
         """Eager attention required for output_attentions support."""
         assert adapter.cfg.attn_implementation == "eager"
 
-    def test_n_key_value_heads_set_when_gqa(self):
+    def test_n_key_value_heads_set_when_gqa(self, qwen3_5_dependency_available):
         from transformer_lens.model_bridge.supported_architectures.qwen3_5 import (
             Qwen3_5ArchitectureAdapter,
         )
@@ -267,7 +472,7 @@ def test_n_key_value_heads_set_when_gqa(self):
         adapter = Qwen3_5ArchitectureAdapter(cfg)
         assert adapter.cfg.n_key_value_heads == 2
 
-    def test_n_key_value_heads_not_set_when_absent(self):
+    def test_n_key_value_heads_not_set_when_absent(self, qwen3_5_dependency_available):
         from transformer_lens.config.transformer_bridge_config import (
             TransformerBridgeConfig,
         )
@@ -293,10 +498,6 @@ def test_n_key_value_heads_not_set_when_absent(self):
         )
 
 
-@pytest.mark.skipif(
-    not _QWEN3_5_AVAILABLE,
-    reason="Qwen3_5TextConfig / Qwen3_5ForCausalLM not available in installed transformers",
-)
 class TestQwen3_5PreprocessWeights:
     """q_proj rows are interleaved per-head (query, gate, query, gate, ...) — naive first-half slice is wrong."""
 
@@ -305,7 +506,7 @@ class TestQwen3_5PreprocessWeights:
     HIDDEN_SIZE = 32
 
     @pytest.fixture
-    def adapter(self):
+    def adapter(self, qwen3_5_dependency_available):
         from transformer_lens.model_bridge.supported_architectures.qwen3_5 import (
             Qwen3_5ArchitectureAdapter,
         )
@@ -667,23 +868,38 @@ def test_forward_pass_consistency(self, bridge, hf_model):
         ), f"Logit mismatch: max diff = {(hf_logits - bridge_logits).abs().max().item():.6f}"
 
     def test_hook_activation_shapes(self, bridge):
+        """MLP, full-attention, and linear-attention hooks must all fire."""
         import torch
 
-        captured: list[torch.Tensor] = []
+        hook_names = [
+            "blocks.0.mlp.hook_out",
+            "blocks.3.attn.hook_out",
+            "blocks.0.linear_attn.hook_out",
+        ]
+        captured: dict[str, list[torch.Tensor]] = {name: [] for name in hook_names}
 
-        def capture_hook(tensor: torch.Tensor, hook: object) -> torch.Tensor:
-            captured.append(tensor.detach().clone())
-            return tensor
+        def capture_hook(name: str):
+            def _capture(tensor: torch.Tensor, hook: object) -> torch.Tensor:
+                captured[name].append(tensor.detach().clone())
+                return tensor
+
+            return _capture
 
         tokens = torch.randint(0, 512, (1, 4))
         with torch.no_grad():
-            bridge.run_with_hooks(tokens, fwd_hooks=[("blocks.0.mlp.hook_out", capture_hook)])
+            bridge.run_with_hooks(
+                tokens,
+                fwd_hooks=[(name, capture_hook(name)) for name in hook_names],
+            )
 
-        assert len(captured) == 1, "Hook must fire exactly once per forward pass"
-        output = captured[0]
         batch, seq, d_model = 1, 4, 128
-        assert output.shape == (
-            batch,
-            seq,
-            d_model,
-        ), f"Expected MLP output shape ({batch}, {seq}, {d_model}), got {output.shape}"
+        for hook_name, activations in captured.items():
+            assert len(activations) == 1, f"{hook_name} must fire exactly once"
+            assert activations[0].shape == (
+                batch,
+                seq,
+                d_model,
+            ), (
+                f"Expected {hook_name} shape ({batch}, {seq}, {d_model}), "
+                f"got {activations[0].shape}"
+            )
diff --git a/transformer_lens/benchmarks/component_outputs.py b/transformer_lens/benchmarks/component_outputs.py
index ef3751ead..1efe59f82 100644
--- a/transformer_lens/benchmarks/component_outputs.py
+++ b/transformer_lens/benchmarks/component_outputs.py
@@ -574,6 +574,10 @@ def _test_component(
                     device=test_input.device,
                     dtype=test_input.dtype,
                 )
+                if "attn" in component_path:
+                    self._add_direct_attention_mask_if_needed(
+                        shared_inputs, hf_component, batch_size, seq_len
+                    )
 
                 # Override position_embeddings with correct values from HF model's rotary_emb
                 # This is needed for models with partial RoPE or non-standard rotary dims
@@ -663,6 +667,37 @@ def _test_component(
                 error_message=str(e),
             )
 
+    @staticmethod
+    def _add_direct_attention_mask_if_needed(
+        shared_inputs: Dict[str, Any],
+        hf_component: Any,
+        batch_size: int,
+        seq_len: int,
+    ) -> None:
+        """Add a causal mask for direct HF attention calls that need parent context."""
+        if "attention_mask" in shared_inputs:
+            return
+        hidden_states = shared_inputs.get("hidden_states")
+        if not isinstance(hidden_states, torch.Tensor):
+            return
+        if not getattr(hf_component, "is_causal", False):
+            return
+        if getattr(hf_component, "is_cross_attention", False):
+            return
+
+        min_dtype = torch.finfo(hidden_states.dtype).min
+        causal_mask = torch.ones(seq_len, seq_len, device=hidden_states.device, dtype=torch.bool)
+        causal_mask = torch.tril(causal_mask).view(1, 1, seq_len, seq_len)
+        attention_mask = torch.zeros(
+            batch_size,
+            1,
+            seq_len,
+            seq_len,
+            device=hidden_states.device,
+            dtype=hidden_states.dtype,
+        )
+        shared_inputs["attention_mask"] = attention_mask.masked_fill(~causal_mask, min_dtype)
+
     def _run_component(
         self,
         component: nn.Module,
diff --git a/transformer_lens/model_bridge/generalized_components/block.py b/transformer_lens/model_bridge/generalized_components/block.py
index b99c6a667..02845998e 100644
--- a/transformer_lens/model_bridge/generalized_components/block.py
+++ b/transformer_lens/model_bridge/generalized_components/block.py
@@ -132,9 +132,17 @@ def forward(self, *args: Any, **kwargs: Any) -> Any:
         filtered_kwargs = self._filter_kwargs_for_forward(kwargs, len(args))
 
         output = self.original_component(*args, **filtered_kwargs)
-        return self._apply_output_hook(output)
+        force_tuple_for_bare_tensor = self._is_standalone_hidden_state_call(args, filtered_kwargs)
+        return self._apply_output_hook(
+            output, force_tuple_for_bare_tensor=force_tuple_for_bare_tensor
+        )
 
-    def _apply_output_hook(self, output: Any, wrap_single_element: bool = True) -> Any:
+    def _apply_output_hook(
+        self,
+        output: Any,
+        wrap_single_element: bool = True,
+        force_tuple_for_bare_tensor: bool = False,
+    ) -> Any:
         """Hook the primary tensor in the output and return the result.
 
         Args:
@@ -142,6 +150,10 @@ def _apply_output_hook(self, output: Any, wrap_single_element: bool = True) -> A
             wrap_single_element: If True, single-element tuples stay as tuples after
                 hooking (default, required by most HF models). If False, single-element
                 tuples are unwrapped to a bare tensor (Bloom convention).
+            force_tuple_for_bare_tensor: If True, bare tensor outputs are wrapped into
+                a one-element tuple after hooking. This keeps standalone BlockBridge
+                calls compatible with HF block APIs that expose tuple-like block outputs,
+                while preserving tensor outputs during newer HF parent-model execution.
         """
         if isinstance(output, tuple) and len(output) > 0:
             first = output[0]
@@ -153,8 +165,28 @@ def _apply_output_hook(self, output: Any, wrap_single_element: bool = True) -> A
             return output
         if isinstance(output, torch.Tensor):
             output = self.hook_out(output)
+            if force_tuple_for_bare_tensor and wrap_single_element:
+                return (output,)
+            return output
         return output
 
+    @staticmethod
+    def _is_standalone_hidden_state_call(args: tuple, kwargs: dict) -> bool:
+        """Return True for direct block(hidden_states) style calls.
+
+        Transformers versions differ on whether parent model loops expect block
+        outputs as tuples or tensors. We preserve the original tensor return during
+        full-model execution, but expose tuple-like output for standalone component
+        calls so `output[0]` does not accidentally drop the batch dimension.
+        """
+        if len(args) == 1 and isinstance(args[0], torch.Tensor) and not kwargs:
+            return True
+        return (
+            len(args) == 0
+            and set(kwargs.keys()) == {"hidden_states"}
+            and isinstance(kwargs["hidden_states"], torch.Tensor)
+        )
+
     def _check_stop_at_layer(self, *args: Any, **kwargs: Any) -> None:
         """Check if execution should stop before this block. Raises StopAtLayerException.
 
diff --git a/transformer_lens/model_bridge/sources/transformers.py b/transformer_lens/model_bridge/sources/transformers.py
index be2659e89..e30a022f4 100644
--- a/transformer_lens/model_bridge/sources/transformers.py
+++ b/transformer_lens/model_bridge/sources/transformers.py
@@ -592,8 +592,9 @@ def boot(
         from_config_kwargs = {}
         if trust_remote_code:
             from_config_kwargs["trust_remote_code"] = True
+        prepared_config = model_kwargs.get("config", hf_config)
         with contextlib.redirect_stdout(None):
-            hf_model = model_class.from_config(hf_config, **from_config_kwargs)
+            hf_model = model_class.from_config(prepared_config, **from_config_kwargs)
     else:
         try:
             hf_model = model_class.from_pretrained(model_name, **model_kwargs)
diff --git a/transformer_lens/model_bridge/supported_architectures/mamba.py b/transformer_lens/model_bridge/supported_architectures/mamba.py
index e4620749f..0023bbb5c 100644
--- a/transformer_lens/model_bridge/supported_architectures/mamba.py
+++ b/transformer_lens/model_bridge/supported_architectures/mamba.py
@@ -74,7 +74,12 @@ def create_stateful_cache(
         device: Any,
         dtype: torch.dtype,
     ) -> Any:
-        """Build a MambaCache for the stateful generation loop."""
-        from transformers.models.mamba.modeling_mamba import MambaCache
+        """Build a MambaCache (or DynamicCache fallback) for the stateful generation loop."""
+        from transformers.cache_utils import DynamicCache
+        from transformers.models.mamba import modeling_mamba
 
-        return MambaCache(hf_model.config, batch_size, device=device, dtype=dtype)
+        cache_cls = getattr(modeling_mamba, "MambaCache", None)
+        if cache_cls is not None:
+            return cache_cls(hf_model.config, batch_size, device=device, dtype=dtype)
+
+        return DynamicCache(config=hf_model.config)
diff --git a/transformer_lens/model_bridge/supported_architectures/mamba2.py b/transformer_lens/model_bridge/supported_architectures/mamba2.py
index d6dc5f785..922c5512a 100644
--- a/transformer_lens/model_bridge/supported_architectures/mamba2.py
+++ b/transformer_lens/model_bridge/supported_architectures/mamba2.py
@@ -98,10 +98,15 @@ def create_stateful_cache(
         device: Any,
         dtype: torch.dtype,
     ) -> Any:
-        """Build a Mamba2Cache for the stateful generation loop."""
-        from transformers.models.mamba2.modeling_mamba2 import Mamba2Cache
+        """Build a Mamba2Cache (or DynamicCache fallback) for the stateful generation loop."""
+        from transformers.cache_utils import DynamicCache
+        from transformers.models.mamba2 import modeling_mamba2
 
-        return Mamba2Cache(hf_model.config, batch_size, device=device, dtype=dtype)
+        cache_cls = getattr(modeling_mamba2, "Mamba2Cache", None)
+        if cache_cls is not None:
+            return cache_cls(hf_model.config, batch_size, device=device, dtype=dtype)
+
+        return DynamicCache(config=hf_model.config)
 
 
 def compute_effective_attention(
diff --git a/transformer_lens/model_bridge/supported_architectures/qwen3_5.py b/transformer_lens/model_bridge/supported_architectures/qwen3_5.py
index 1ef0913bf..971cc4ed2 100644
--- a/transformer_lens/model_bridge/supported_architectures/qwen3_5.py
+++ b/transformer_lens/model_bridge/supported_architectures/qwen3_5.py
@@ -8,6 +8,7 @@
 from typing import Any
 
 import torch
+from packaging.version import InvalidVersion, Version
 
 from transformer_lens.model_bridge.supported_architectures.qwen3 import (
     Qwen3ArchitectureAdapter,
@@ -22,19 +23,40 @@ class Qwen3_5ArchitectureAdapter(Qwen3ArchitectureAdapter):
     - Gated q_proj (2x wide) sliced by preprocess_weights for weight analysis
     """
 
-    _MIN_TRANSFORMERS_VERSION = "5.2.0"
+    _MIN_TRANSFORMERS_VERSION = Version("5.2.0")
 
     def __init__(self, cfg: Any) -> None:
+        self._validate_transformers_support()
+        setattr(cfg, "gated_q_proj", True)
+        super().__init__(cfg, hybrid=True)
+
+    @classmethod
+    def _validate_transformers_support(cls) -> None:
+        """Fail clearly when the optional Qwen3.5 Transformers support is unavailable."""
         import transformers
 
-        if transformers.__version__ < self._MIN_TRANSFORMERS_VERSION:
+        try:
+            installed_version = Version(transformers.__version__)
+        except InvalidVersion:
+            installed_version = Version("0")
+
+        if installed_version < cls._MIN_TRANSFORMERS_VERSION:
             raise ImportError(
-                f"Qwen3.5 requires transformers >= {self._MIN_TRANSFORMERS_VERSION} "
+                f"Qwen3.5 requires transformers >= {cls._MIN_TRANSFORMERS_VERSION} "
                 f"(installed: {transformers.__version__}). "
-                f"Upgrade with: pip install 'transformers>={self._MIN_TRANSFORMERS_VERSION}'"
+                f"Upgrade with: pip install 'transformers>={cls._MIN_TRANSFORMERS_VERSION}'"
             )
-        setattr(cfg, "gated_q_proj", True)
-        super().__init__(cfg, hybrid=True)
+
+        if not cls._has_qwen3_5_causal_lm(transformers):
+            raise ImportError(
+                "Qwen3.5 requires a Transformers build that exposes "
+                "Qwen3_5ForCausalLM. Install the Qwen3.5 optional dependency "
+                f"with: pip install 'transformers>={cls._MIN_TRANSFORMERS_VERSION}'"
+            )
+
+    @staticmethod
+    def _has_qwen3_5_causal_lm(transformers_module: Any) -> bool:
+        return hasattr(transformers_module, "Qwen3_5ForCausalLM")
 
     def prepare_loading(self, model_name: str, model_kwargs: dict) -> None:
         """Swap multimodal Qwen3_5Config for text-only Qwen3_5TextConfig.
@@ -47,6 +69,26 @@ def prepare_loading(self, model_name: str, model_kwargs: dict) -> None:
         if config is not None and hasattr(config, "text_config"):
             model_kwargs["config"] = config.text_config
 
+    def prepare_model(self, hf_model: Any) -> None:
+        """Reject full multimodal Qwen3.5 models on this text-only adapter."""
+        config = getattr(hf_model, "config", None)
+        architectures = getattr(config, "architectures", []) or []
+        class_name = type(hf_model).__name__
+
+        is_conditional_generation = (
+            class_name == "Qwen3_5ForConditionalGeneration"
+            or "Qwen3_5ForConditionalGeneration" in architectures
+        )
+        still_has_top_level_multimodal_config = hasattr(config, "text_config")
+        if is_conditional_generation or still_has_top_level_multimodal_config:
+            raise ValueError(
+                "Qwen3.5 support in TransformerLens is text-only. Pass a "
+                "Qwen3_5ForCausalLM / Qwen3_5TextConfig model, or load by model id "
+                "with TransformerBridge.boot_transformers(...) so the text_config is "
+                "selected automatically. Qwen3_5ForConditionalGeneration, image/video "
+                "inputs, and Qwen3.5 MoE are not supported by this adapter."
+            )
+
     def preprocess_weights(self, state_dict: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
         """Slice query half from gated q_proj.weight for weight-space analysis.
 
diff --git a/uv.lock b/uv.lock
index 3c58c2f7d..1fd85adf4 100644
--- a/uv.lock
+++ b/uv.lock
@@ -1777,31 +1777,34 @@ wheels = [
 
 [[package]]
 name = "hf-xet"
-version = "1.2.0"
+version = "1.5.0"
 source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/5e/6e/0f11bacf08a67f7fb5ee09740f2ca54163863b07b70d579356e9222ce5d8/hf_xet-1.2.0.tar.gz", hash = "sha256:a8c27070ca547293b6890c4bf389f713f80e8c478631432962bb7f4bc0bd7d7f", size = 506020, upload-time = "2025-10-24T19:04:32.129Z" }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/9e/a5/85ef910a0aa034a2abcfadc360ab5ac6f6bc4e9112349bd40ca97551cff0/hf_xet-1.2.0-cp313-cp313t-macosx_10_12_x86_64.whl", hash = "sha256:ceeefcd1b7aed4956ae8499e2199607765fbd1c60510752003b6cc0b8413b649", size = 2861870, upload-time = "2025-10-24T19:04:11.422Z" },
-    { url = "https://files.pythonhosted.org/packages/ea/40/e2e0a7eb9a51fe8828ba2d47fe22a7e74914ea8a0db68a18c3aa7449c767/hf_xet-1.2.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:b70218dd548e9840224df5638fdc94bd033552963cfa97f9170829381179c813", size = 2717584, upload-time = "2025-10-24T19:04:09.586Z" },
-    { url = "https://files.pythonhosted.org/packages/a5/7d/daf7f8bc4594fdd59a8a596f9e3886133fdc68e675292218a5e4c1b7e834/hf_xet-1.2.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:7d40b18769bb9a8bc82a9ede575ce1a44c75eb80e7375a01d76259089529b5dc", size = 3315004, upload-time = "2025-10-24T19:04:00.314Z" },
-    { url = "https://files.pythonhosted.org/packages/b1/ba/45ea2f605fbf6d81c8b21e4d970b168b18a53515923010c312c06cd83164/hf_xet-1.2.0-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:cd3a6027d59cfb60177c12d6424e31f4b5ff13d8e3a1247b3a584bf8977e6df5", size = 3222636, upload-time = "2025-10-24T19:03:58.111Z" },
-    { url = "https://files.pythonhosted.org/packages/4a/1d/04513e3cab8f29ab8c109d309ddd21a2705afab9d52f2ba1151e0c14f086/hf_xet-1.2.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:6de1fc44f58f6dd937956c8d304d8c2dea264c80680bcfa61ca4a15e7b76780f", size = 3408448, upload-time = "2025-10-24T19:04:20.951Z" },
-    { url = "https://files.pythonhosted.org/packages/f0/7c/60a2756d7feec7387db3a1176c632357632fbe7849fce576c5559d4520c7/hf_xet-1.2.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:f182f264ed2acd566c514e45da9f2119110e48a87a327ca271027904c70c5832", size = 3503401, upload-time = "2025-10-24T19:04:22.549Z" },
-    { url = "https://files.pythonhosted.org/packages/4e/64/48fffbd67fb418ab07451e4ce641a70de1c40c10a13e25325e24858ebe5a/hf_xet-1.2.0-cp313-cp313t-win_amd64.whl", hash = "sha256:293a7a3787e5c95d7be1857358a9130694a9c6021de3f27fa233f37267174382", size = 2900866, upload-time = "2025-10-24T19:04:33.461Z" },
-    { url = "https://files.pythonhosted.org/packages/e2/51/f7e2caae42f80af886db414d4e9885fac959330509089f97cccb339c6b87/hf_xet-1.2.0-cp314-cp314t-macosx_10_12_x86_64.whl", hash = "sha256:10bfab528b968c70e062607f663e21e34e2bba349e8038db546646875495179e", size = 2861861, upload-time = "2025-10-24T19:04:19.01Z" },
-    { url = "https://files.pythonhosted.org/packages/6e/1d/a641a88b69994f9371bd347f1dd35e5d1e2e2460a2e350c8d5165fc62005/hf_xet-1.2.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:2a212e842647b02eb6a911187dc878e79c4aa0aa397e88dd3b26761676e8c1f8", size = 2717699, upload-time = "2025-10-24T19:04:17.306Z" },
-    { url = "https://files.pythonhosted.org/packages/df/e0/e5e9bba7d15f0318955f7ec3f4af13f92e773fbb368c0b8008a5acbcb12f/hf_xet-1.2.0-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:30e06daccb3a7d4c065f34fc26c14c74f4653069bb2b194e7f18f17cbe9939c0", size = 3314885, upload-time = "2025-10-24T19:04:07.642Z" },
-    { url = "https://files.pythonhosted.org/packages/21/90/b7fe5ff6f2b7b8cbdf1bd56145f863c90a5807d9758a549bf3d916aa4dec/hf_xet-1.2.0-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:29c8fc913a529ec0a91867ce3d119ac1aac966e098cf49501800c870328cc090", size = 3221550, upload-time = "2025-10-24T19:04:05.55Z" },
-    { url = "https://files.pythonhosted.org/packages/6f/cb/73f276f0a7ce46cc6a6ec7d6c7d61cbfe5f2e107123d9bbd0193c355f106/hf_xet-1.2.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:66e159cbfcfbb29f920db2c09ed8b660eb894640d284f102ada929b6e3dc410a", size = 3408010, upload-time = "2025-10-24T19:04:28.598Z" },
-    { url = "https://files.pythonhosted.org/packages/b8/1e/d642a12caa78171f4be64f7cd9c40e3ca5279d055d0873188a58c0f5fbb9/hf_xet-1.2.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:9c91d5ae931510107f148874e9e2de8a16052b6f1b3ca3c1b12f15ccb491390f", size = 3503264, upload-time = "2025-10-24T19:04:30.397Z" },
-    { url = "https://files.pythonhosted.org/packages/17/b5/33764714923fa1ff922770f7ed18c2daae034d21ae6e10dbf4347c854154/hf_xet-1.2.0-cp314-cp314t-win_amd64.whl", hash = "sha256:210d577732b519ac6ede149d2f2f34049d44e8622bf14eb3d63bbcd2d4b332dc", size = 2901071, upload-time = "2025-10-24T19:04:37.463Z" },
-    { url = "https://files.pythonhosted.org/packages/96/2d/22338486473df5923a9ab7107d375dbef9173c338ebef5098ef593d2b560/hf_xet-1.2.0-cp37-abi3-macosx_10_12_x86_64.whl", hash = "sha256:46740d4ac024a7ca9b22bebf77460ff43332868b661186a8e46c227fdae01848", size = 2866099, upload-time = "2025-10-24T19:04:15.366Z" },
-    { url = "https://files.pythonhosted.org/packages/7f/8c/c5becfa53234299bc2210ba314eaaae36c2875e0045809b82e40a9544f0c/hf_xet-1.2.0-cp37-abi3-macosx_11_0_arm64.whl", hash = "sha256:27df617a076420d8845bea087f59303da8be17ed7ec0cd7ee3b9b9f579dff0e4", size = 2722178, upload-time = "2025-10-24T19:04:13.695Z" },
-    { url = "https://files.pythonhosted.org/packages/9a/92/cf3ab0b652b082e66876d08da57fcc6fa2f0e6c70dfbbafbd470bb73eb47/hf_xet-1.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3651fd5bfe0281951b988c0facbe726aa5e347b103a675f49a3fa8144c7968fd", size = 3320214, upload-time = "2025-10-24T19:04:03.596Z" },
-    { url = "https://files.pythonhosted.org/packages/46/92/3f7ec4a1b6a65bf45b059b6d4a5d38988f63e193056de2f420137e3c3244/hf_xet-1.2.0-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:d06fa97c8562fb3ee7a378dd9b51e343bc5bc8190254202c9771029152f5e08c", size = 3229054, upload-time = "2025-10-24T19:04:01.949Z" },
-    { url = "https://files.pythonhosted.org/packages/0b/dd/7ac658d54b9fb7999a0ccb07ad863b413cbaf5cf172f48ebcd9497ec7263/hf_xet-1.2.0-cp37-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:4c1428c9ae73ec0939410ec73023c4f842927f39db09b063b9482dac5a3bb737", size = 3413812, upload-time = "2025-10-24T19:04:24.585Z" },
-    { url = "https://files.pythonhosted.org/packages/92/68/89ac4e5b12a9ff6286a12174c8538a5930e2ed662091dd2572bbe0a18c8a/hf_xet-1.2.0-cp37-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:a55558084c16b09b5ed32ab9ed38421e2d87cf3f1f89815764d1177081b99865", size = 3508920, upload-time = "2025-10-24T19:04:26.927Z" },
-    { url = "https://files.pythonhosted.org/packages/cb/44/870d44b30e1dcfb6a65932e3e1506c103a8a5aea9103c337e7a53180322c/hf_xet-1.2.0-cp37-abi3-win_amd64.whl", hash = "sha256:e6584a52253f72c9f52f9e549d5895ca7a471608495c4ecaa6cc73dba2b24d69", size = 2905735, upload-time = "2025-10-24T19:04:35.928Z" },
+sdist = { url = "https://files.pythonhosted.org/packages/74/d8/5c06fc76461418326a7decf8367480c35be11a41fd938633929c60a9ec6b/hf_xet-1.5.0.tar.gz", hash = "sha256:e0fb0a34d9f406eed88233e829a67ec016bec5af19e480eac65a233ea289a948", size = 837196, upload-time = "2026-05-06T06:18:15.583Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/68/9b/6912c99070915a4f28119e3c5b52a9abd1eec0ad5cb293b8c967a0c6f5a2/hf_xet-1.5.0-cp313-cp313t-macosx_10_12_x86_64.whl", hash = "sha256:7d70fe2ce97b9db73b9c9b9c81fe3693640aec83416a966c446afea54acfae3c", size = 4023383, upload-time = "2026-05-06T06:17:53.947Z" },
+    { url = "https://files.pythonhosted.org/packages/0f/6d/9563cfde59b5d8128a9c7ec972a087f4c782e4f7bac5a85234edfd5d5e49/hf_xet-1.5.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:73a0dae8c71de3b0633a45c73f4a4a5ed09e94b43441d82981a781d4f12baa42", size = 3792751, upload-time = "2026-05-06T06:17:51.791Z" },
+    { url = "https://files.pythonhosted.org/packages/07/a5/ed5a0cf35b49a0571af5a8f53416dad1877a718c021c9937c3a53cb45781/hf_xet-1.5.0-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:a60290ec57e9b71767fba7c3645ddafdd0759974b540441510c629c6db6db24a", size = 4456058, upload-time = "2026-05-06T06:17:40.735Z" },
+    { url = "https://files.pythonhosted.org/packages/60/fb/3ae8bf2a7a37a4197d0195d7247fd25b3952e15cb8a599e285dfaa6f52b3/hf_xet-1.5.0-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:e5de0f6deada0dada870bb376a11bcd1f08abf3a968a6d118f33e72d1b1eb480", size = 4250783, upload-time = "2026-05-06T06:17:38.412Z" },
+    { url = "https://files.pythonhosted.org/packages/a2/9b/8bae40d4d91525085137196e84eb0ed49cf65b5e96e5c3ecdadd8bd0fac2/hf_xet-1.5.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:c799d49f1a5544a0ef7591c0ee75e0d6b93d6f56dc7a4979f59f7518d2872216", size = 4445594, upload-time = "2026-05-06T06:18:04.219Z" },
+    { url = "https://files.pythonhosted.org/packages/13/59/c74efbbd4e8728172b2cc72a2bc014d2947a4b7bdced932fbd3f5da1a4e5/hf_xet-1.5.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:2baea1b0b989e5c152fe81425f7745ddc8901280ba3d97c98d8cdece7b706c60", size = 4663995, upload-time = "2026-05-06T06:18:06.1Z" },
+    { url = "https://files.pythonhosted.org/packages/73/32/8e1e0410af64cda9b139d1dcebdc993a8ff9c8c7c0e2696ae356d75ccc0d/hf_xet-1.5.0-cp313-cp313t-win_amd64.whl", hash = "sha256:526345b3ed45f374f6317349df489167606736c876241ba984105afe7fd4839d", size = 3966608, upload-time = "2026-05-06T06:18:19.74Z" },
+    { url = "https://files.pythonhosted.org/packages/fc/34/a8febc8f4edbea8b3e21b02ebc8b628679b84ba7e45cde624a7736b51500/hf_xet-1.5.0-cp313-cp313t-win_arm64.whl", hash = "sha256:786d28e2eb8315d5035544b9d137b4a842d600c434bb91bf7d0d953cce906ad4", size = 3796946, upload-time = "2026-05-06T06:18:17.568Z" },
+    { url = "https://files.pythonhosted.org/packages/2a/20/8fc8996afe5815fa1a6be8e9e5c02f24500f409d599e905800d498a4e14d/hf_xet-1.5.0-cp314-cp314t-macosx_10_12_x86_64.whl", hash = "sha256:872d5601e6deea30d15865ede55d29eac6daf5a534ab417b99b6ef6b076dd96c", size = 4023495, upload-time = "2026-05-06T06:18:01.94Z" },
+    { url = "https://files.pythonhosted.org/packages/32/6a/93d84463c00cecb561a7508aa6303e35ee2894294eac14245526924415fe/hf_xet-1.5.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:9929561f5abf4581c8ea79587881dfef6b8abb2a0d8a51915936fc2a614f4e73", size = 3792731, upload-time = "2026-05-06T06:18:00.021Z" },
+    { url = "https://files.pythonhosted.org/packages/9d/5a/8ec8e0c863b382d00b3c2e2af6ded6b06371be617144a625903a6d562f4b/hf_xet-1.5.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:f7b7bbae318e583a86fb21e5a4a175d6721d628a2874f4bd022d0e660c32a682", size = 4456738, upload-time = "2026-05-06T06:17:49.574Z" },
+    { url = "https://files.pythonhosted.org/packages/c5/ca/f7effa1a67717da2bcc6b6c28f71c6ca648c77acaec4e2c32f40cbe16d85/hf_xet-1.5.0-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:cf7b2dc6f31a4ea754bb50f74cde482dcf5d366d184076d8530b9872787f3761", size = 4251622, upload-time = "2026-05-06T06:17:47.096Z" },
+    { url = "https://files.pythonhosted.org/packages/65/f2/19247dba3e231cf77dec59ddfb878f00057635ff773d099c9b59d37812c3/hf_xet-1.5.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:8dbcbab554c9ef158ef2c991545c3e970ddd8cc7acdcd0a78c5a41095dab4ded", size = 4445667, upload-time = "2026-05-06T06:18:11.983Z" },
+    { url = "https://files.pythonhosted.org/packages/7f/64/6f116801a3bcfb6f59f5c251f48cadc47ea54026441c4a385079286a94fa/hf_xet-1.5.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:5906bf7718d3636dc13402914736abe723492cb730f744834f5f5b67d3a12702", size = 4664619, upload-time = "2026-05-06T06:18:13.771Z" },
+    { url = "https://files.pythonhosted.org/packages/5c/e8/069542d37946ed08669b127e1496fa99e78196d71de8d41eda5e9f1b7a58/hf_xet-1.5.0-cp314-cp314t-win_amd64.whl", hash = "sha256:5f3dc2248fc01cc0a00cd392ab497f1ca373fcbc7e3f2da1f452480b384e839e", size = 3966802, upload-time = "2026-05-06T06:18:28.162Z" },
+    { url = "https://files.pythonhosted.org/packages/f9/91/fc6fdec27b14d04e88c386ac0a0129732b53fa23f7c4a78f4b83a039c567/hf_xet-1.5.0-cp314-cp314t-win_arm64.whl", hash = "sha256:b285cea1b5bab46b758772716ba8d6854a1a0310fed1c249d678a8b38601e5a0", size = 3797168, upload-time = "2026-05-06T06:18:26.287Z" },
+    { url = "https://files.pythonhosted.org/packages/3d/fb/69ff198a82cae7eb1a69fb84d93b3a3e4816564d76817fe541ddc96874eb/hf_xet-1.5.0-cp37-abi3-macosx_10_12_x86_64.whl", hash = "sha256:dad0dc84e941b8ba3c860659fe1fdc35c049d47cce293f003287757e971a8f56", size = 4030814, upload-time = "2026-05-06T06:17:57.933Z" },
+    { url = "https://files.pythonhosted.org/packages/9b/ff/edcc2b40162bef3ff78e14ab637e5f3b89243d6aee72f5949d3bb6a5af83/hf_xet-1.5.0-cp37-abi3-macosx_11_0_arm64.whl", hash = "sha256:fd6e5a9b0fdac4ed03ed45ef79254a655b1aaab514a02202617fbf643f5fdf7a", size = 3798444, upload-time = "2026-05-06T06:17:55.79Z" },
+    { url = "https://files.pythonhosted.org/packages/49/4d/103f76b04310e5e57656696cc184690d20c466af0bca3ca88f8c8ea5d4f3/hf_xet-1.5.0-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:3531b1823a0e6d77d80f9ed15ca0e00f0d115094f8ac033d5cae88f4564cc949", size = 4465986, upload-time = "2026-05-06T06:17:44.886Z" },
+    { url = "https://files.pythonhosted.org/packages/c4/a2/546f47f464737b3edbab6f8ddb57f2599b93d2cbb66f06abb475ccb48651/hf_xet-1.5.0-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:9a0ee58cd18d5ea799f7ed11290bbccbe56bdd8b1d97ca74b9cc49a3945d7a3b", size = 4259865, upload-time = "2026-05-06T06:17:42.639Z" },
+    { url = "https://files.pythonhosted.org/packages/95/7f/1be593c1f28613be2e196473481cd81bfc5910795e30a34e8f744f6cac4f/hf_xet-1.5.0-cp37-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:1e60df5a42e9bed8628b6416af2cba4cba57ae9f02de226a06b020d98e1aab18", size = 4459835, upload-time = "2026-05-06T06:18:08.026Z" },
+    { url = "https://files.pythonhosted.org/packages/aa/b2/703569fc881f3284487e68cda7b42179978480da3c438042a6bbbb4a671c/hf_xet-1.5.0-cp37-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:4b35549ce62601b84da4ff9b24d970032ace3d4430f52d91bcbb26c901d6c690", size = 4672414, upload-time = "2026-05-06T06:18:09.864Z" },
+    { url = "https://files.pythonhosted.org/packages/af/37/1b6def445c567286b50aa3b33828158e135b1be44938dde59f11382a500c/hf_xet-1.5.0-cp37-abi3-win_amd64.whl", hash = "sha256:2806c7c17b4d23f8d88f7c4814f838c3b6150773fe339c20af23e1cfaf2797e4", size = 3977238, upload-time = "2026-05-06T06:18:23.621Z" },
+    { url = "https://files.pythonhosted.org/packages/62/94/3b66b148778ee100dcfd69c2ca22b57b41b44d3063ceec934f209e9184ce/hf_xet-1.5.0-cp37-abi3-win_arm64.whl", hash = "sha256:b6c9df403040248c76d808d3e047d64db2d923bae593eb244c41e425cf6cd7be", size = 3806916, upload-time = "2026-05-06T06:18:21.7Z" },
 ]
 
 [[package]]
@@ -1834,7 +1837,7 @@ wheels = [
 
 [[package]]
 name = "huggingface-hub"
-version = "1.3.4"
+version = "1.15.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "filelock" },
@@ -1843,14 +1846,13 @@ dependencies = [
     { name = "httpx" },
     { name = "packaging" },
     { name = "pyyaml" },
-    { name = "shellingham" },
     { name = "tqdm" },
-    { name = "typer-slim" },
+    { name = "typer" },
     { name = "typing-extensions" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/af/25/74af9d16cd59ae15b12467a79a84aa0fe24be4aba68fc4da0c1864d49c17/huggingface_hub-1.3.4.tar.gz", hash = "sha256:c20d5484a611b7b7891d272e8fc9f77d5de025b0480bdacfa858efb3780b455f", size = 627683, upload-time = "2026-01-26T14:05:10.656Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/bb/b6/e22bd20a25299c34b8c5922c1545a6320825b13906eb0f7298edfd034a0b/huggingface_hub-1.15.0.tar.gz", hash = "sha256:28abfdddda3927fd4de6a63cf26ab012498a2c24dae52baf150c5c6edf98a1d5", size = 784100, upload-time = "2026-05-15T11:42:52.149Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/55/07/3d0c34c345043c6a398a5882e196b2220dc5861adfa18322448b90908f26/huggingface_hub-1.3.4-py3-none-any.whl", hash = "sha256:a0c526e76eb316e96a91e8a1a7a93cf66b0dd210be1a17bd5fc5ae53cba76bfd", size = 536611, upload-time = "2026-01-26T14:05:08.549Z" },
+    { url = "https://files.pythonhosted.org/packages/6e/11/0b64cc9024329b76d7547c19a67604a61d21d3ba678a69d1b220c29d5112/huggingface_hub-1.15.0-py3-none-any.whl", hash = "sha256:a4a59af04cbc41a3fe3fec429b171ef994ef8c971eda10136746f408dd4e3744", size = 663602, upload-time = "2026-05-15T11:42:50.487Z" },
 ]
 
 [[package]]
@@ -4987,71 +4989,123 @@ wheels = [
 
 [[package]]
 name = "regex"
-version = "2024.11.6"
-source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/8e/5f/bd69653fbfb76cf8604468d3b4ec4c403197144c7bfe0e6a5fc9e02a07cb/regex-2024.11.6.tar.gz", hash = "sha256:7ab159b063c52a0333c884e4679f8d7a85112ee3078fe3d9004b2dd875585519", size = 399494, upload-time = "2024-11-06T20:12:31.635Z" }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/95/3c/4651f6b130c6842a8f3df82461a8950f923925db8b6961063e82744bddcc/regex-2024.11.6-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:ff590880083d60acc0433f9c3f713c51f7ac6ebb9adf889c79a261ecf541aa91", size = 482674, upload-time = "2024-11-06T20:08:57.575Z" },
-    { url = "https://files.pythonhosted.org/packages/15/51/9f35d12da8434b489c7b7bffc205c474a0a9432a889457026e9bc06a297a/regex-2024.11.6-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:658f90550f38270639e83ce492f27d2c8d2cd63805c65a13a14d36ca126753f0", size = 287684, upload-time = "2024-11-06T20:08:59.787Z" },
-    { url = "https://files.pythonhosted.org/packages/bd/18/b731f5510d1b8fb63c6b6d3484bfa9a59b84cc578ac8b5172970e05ae07c/regex-2024.11.6-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:164d8b7b3b4bcb2068b97428060b2a53be050085ef94eca7f240e7947f1b080e", size = 284589, upload-time = "2024-11-06T20:09:01.896Z" },
-    { url = "https://files.pythonhosted.org/packages/78/a2/6dd36e16341ab95e4c6073426561b9bfdeb1a9c9b63ab1b579c2e96cb105/regex-2024.11.6-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d3660c82f209655a06b587d55e723f0b813d3a7db2e32e5e7dc64ac2a9e86fde", size = 782511, upload-time = "2024-11-06T20:09:04.062Z" },
-    { url = "https://files.pythonhosted.org/packages/1b/2b/323e72d5d2fd8de0d9baa443e1ed70363ed7e7b2fb526f5950c5cb99c364/regex-2024.11.6-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:d22326fcdef5e08c154280b71163ced384b428343ae16a5ab2b3354aed12436e", size = 821149, upload-time = "2024-11-06T20:09:06.237Z" },
-    { url = "https://files.pythonhosted.org/packages/90/30/63373b9ea468fbef8a907fd273e5c329b8c9535fee36fc8dba5fecac475d/regex-2024.11.6-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:f1ac758ef6aebfc8943560194e9fd0fa18bcb34d89fd8bd2af18183afd8da3a2", size = 809707, upload-time = "2024-11-06T20:09:07.715Z" },
-    { url = "https://files.pythonhosted.org/packages/f2/98/26d3830875b53071f1f0ae6d547f1d98e964dd29ad35cbf94439120bb67a/regex-2024.11.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:997d6a487ff00807ba810e0f8332c18b4eb8d29463cfb7c820dc4b6e7562d0cf", size = 781702, upload-time = "2024-11-06T20:09:10.101Z" },
-    { url = "https://files.pythonhosted.org/packages/87/55/eb2a068334274db86208ab9d5599ffa63631b9f0f67ed70ea7c82a69bbc8/regex-2024.11.6-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:02a02d2bb04fec86ad61f3ea7f49c015a0681bf76abb9857f945d26159d2968c", size = 771976, upload-time = "2024-11-06T20:09:11.566Z" },
-    { url = "https://files.pythonhosted.org/packages/74/c0/be707bcfe98254d8f9d2cff55d216e946f4ea48ad2fd8cf1428f8c5332ba/regex-2024.11.6-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:f02f93b92358ee3f78660e43b4b0091229260c5d5c408d17d60bf26b6c900e86", size = 697397, upload-time = "2024-11-06T20:09:13.119Z" },
-    { url = "https://files.pythonhosted.org/packages/49/dc/bb45572ceb49e0f6509f7596e4ba7031f6819ecb26bc7610979af5a77f45/regex-2024.11.6-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:06eb1be98df10e81ebaded73fcd51989dcf534e3c753466e4b60c4697a003b67", size = 768726, upload-time = "2024-11-06T20:09:14.85Z" },
-    { url = "https://files.pythonhosted.org/packages/5a/db/f43fd75dc4c0c2d96d0881967897926942e935d700863666f3c844a72ce6/regex-2024.11.6-cp310-cp310-musllinux_1_2_i686.whl", hash = "sha256:040df6fe1a5504eb0f04f048e6d09cd7c7110fef851d7c567a6b6e09942feb7d", size = 775098, upload-time = "2024-11-06T20:09:16.504Z" },
-    { url = "https://files.pythonhosted.org/packages/99/d7/f94154db29ab5a89d69ff893159b19ada89e76b915c1293e98603d39838c/regex-2024.11.6-cp310-cp310-musllinux_1_2_ppc64le.whl", hash = "sha256:fdabbfc59f2c6edba2a6622c647b716e34e8e3867e0ab975412c5c2f79b82da2", size = 839325, upload-time = "2024-11-06T20:09:18.698Z" },
-    { url = "https://files.pythonhosted.org/packages/f7/17/3cbfab1f23356fbbf07708220ab438a7efa1e0f34195bf857433f79f1788/regex-2024.11.6-cp310-cp310-musllinux_1_2_s390x.whl", hash = "sha256:8447d2d39b5abe381419319f942de20b7ecd60ce86f16a23b0698f22e1b70008", size = 843277, upload-time = "2024-11-06T20:09:21.725Z" },
-    { url = "https://files.pythonhosted.org/packages/7e/f2/48b393b51900456155de3ad001900f94298965e1cad1c772b87f9cfea011/regex-2024.11.6-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:da8f5fc57d1933de22a9e23eec290a0d8a5927a5370d24bda9a6abe50683fe62", size = 773197, upload-time = "2024-11-06T20:09:24.092Z" },
-    { url = "https://files.pythonhosted.org/packages/45/3f/ef9589aba93e084cd3f8471fded352826dcae8489b650d0b9b27bc5bba8a/regex-2024.11.6-cp310-cp310-win32.whl", hash = "sha256:b489578720afb782f6ccf2840920f3a32e31ba28a4b162e13900c3e6bd3f930e", size = 261714, upload-time = "2024-11-06T20:09:26.36Z" },
-    { url = "https://files.pythonhosted.org/packages/42/7e/5f1b92c8468290c465fd50c5318da64319133231415a8aa6ea5ab995a815/regex-2024.11.6-cp310-cp310-win_amd64.whl", hash = "sha256:5071b2093e793357c9d8b2929dfc13ac5f0a6c650559503bb81189d0a3814519", size = 274042, upload-time = "2024-11-06T20:09:28.762Z" },
-    { url = "https://files.pythonhosted.org/packages/58/58/7e4d9493a66c88a7da6d205768119f51af0f684fe7be7bac8328e217a52c/regex-2024.11.6-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:5478c6962ad548b54a591778e93cd7c456a7a29f8eca9c49e4f9a806dcc5d638", size = 482669, upload-time = "2024-11-06T20:09:31.064Z" },
-    { url = "https://files.pythonhosted.org/packages/34/4c/8f8e631fcdc2ff978609eaeef1d6994bf2f028b59d9ac67640ed051f1218/regex-2024.11.6-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:2c89a8cc122b25ce6945f0423dc1352cb9593c68abd19223eebbd4e56612c5b7", size = 287684, upload-time = "2024-11-06T20:09:32.915Z" },
-    { url = "https://files.pythonhosted.org/packages/c5/1b/f0e4d13e6adf866ce9b069e191f303a30ab1277e037037a365c3aad5cc9c/regex-2024.11.6-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:94d87b689cdd831934fa3ce16cc15cd65748e6d689f5d2b8f4f4df2065c9fa20", size = 284589, upload-time = "2024-11-06T20:09:35.504Z" },
-    { url = "https://files.pythonhosted.org/packages/25/4d/ab21047f446693887f25510887e6820b93f791992994f6498b0318904d4a/regex-2024.11.6-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:1062b39a0a2b75a9c694f7a08e7183a80c63c0d62b301418ffd9c35f55aaa114", size = 792121, upload-time = "2024-11-06T20:09:37.701Z" },
-    { url = "https://files.pythonhosted.org/packages/45/ee/c867e15cd894985cb32b731d89576c41a4642a57850c162490ea34b78c3b/regex-2024.11.6-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:167ed4852351d8a750da48712c3930b031f6efdaa0f22fa1933716bfcd6bf4a3", size = 831275, upload-time = "2024-11-06T20:09:40.371Z" },
-    { url = "https://files.pythonhosted.org/packages/b3/12/b0f480726cf1c60f6536fa5e1c95275a77624f3ac8fdccf79e6727499e28/regex-2024.11.6-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:2d548dafee61f06ebdb584080621f3e0c23fff312f0de1afc776e2a2ba99a74f", size = 818257, upload-time = "2024-11-06T20:09:43.059Z" },
-    { url = "https://files.pythonhosted.org/packages/bf/ce/0d0e61429f603bac433910d99ef1a02ce45a8967ffbe3cbee48599e62d88/regex-2024.11.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f2a19f302cd1ce5dd01a9099aaa19cae6173306d1302a43b627f62e21cf18ac0", size = 792727, upload-time = "2024-11-06T20:09:48.19Z" },
-    { url = "https://files.pythonhosted.org/packages/e4/c1/243c83c53d4a419c1556f43777ccb552bccdf79d08fda3980e4e77dd9137/regex-2024.11.6-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:bec9931dfb61ddd8ef2ebc05646293812cb6b16b60cf7c9511a832b6f1854b55", size = 780667, upload-time = "2024-11-06T20:09:49.828Z" },
-    { url = "https://files.pythonhosted.org/packages/c5/f4/75eb0dd4ce4b37f04928987f1d22547ddaf6c4bae697623c1b05da67a8aa/regex-2024.11.6-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:9714398225f299aa85267fd222f7142fcb5c769e73d7733344efc46f2ef5cf89", size = 776963, upload-time = "2024-11-06T20:09:51.819Z" },
-    { url = "https://files.pythonhosted.org/packages/16/5d/95c568574e630e141a69ff8a254c2f188b4398e813c40d49228c9bbd9875/regex-2024.11.6-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:202eb32e89f60fc147a41e55cb086db2a3f8cb82f9a9a88440dcfc5d37faae8d", size = 784700, upload-time = "2024-11-06T20:09:53.982Z" },
-    { url = "https://files.pythonhosted.org/packages/8e/b5/f8495c7917f15cc6fee1e7f395e324ec3e00ab3c665a7dc9d27562fd5290/regex-2024.11.6-cp311-cp311-musllinux_1_2_ppc64le.whl", hash = "sha256:4181b814e56078e9b00427ca358ec44333765f5ca1b45597ec7446d3a1ef6e34", size = 848592, upload-time = "2024-11-06T20:09:56.222Z" },
-    { url = "https://files.pythonhosted.org/packages/1c/80/6dd7118e8cb212c3c60b191b932dc57db93fb2e36fb9e0e92f72a5909af9/regex-2024.11.6-cp311-cp311-musllinux_1_2_s390x.whl", hash = "sha256:068376da5a7e4da51968ce4c122a7cd31afaaec4fccc7856c92f63876e57b51d", size = 852929, upload-time = "2024-11-06T20:09:58.642Z" },
-    { url = "https://files.pythonhosted.org/packages/11/9b/5a05d2040297d2d254baf95eeeb6df83554e5e1df03bc1a6687fc4ba1f66/regex-2024.11.6-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:ac10f2c4184420d881a3475fb2c6f4d95d53a8d50209a2500723d831036f7c45", size = 781213, upload-time = "2024-11-06T20:10:00.867Z" },
-    { url = "https://files.pythonhosted.org/packages/26/b7/b14e2440156ab39e0177506c08c18accaf2b8932e39fb092074de733d868/regex-2024.11.6-cp311-cp311-win32.whl", hash = "sha256:c36f9b6f5f8649bb251a5f3f66564438977b7ef8386a52460ae77e6070d309d9", size = 261734, upload-time = "2024-11-06T20:10:03.361Z" },
-    { url = "https://files.pythonhosted.org/packages/80/32/763a6cc01d21fb3819227a1cc3f60fd251c13c37c27a73b8ff4315433a8e/regex-2024.11.6-cp311-cp311-win_amd64.whl", hash = "sha256:02e28184be537f0e75c1f9b2f8847dc51e08e6e171c6bde130b2687e0c33cf60", size = 274052, upload-time = "2024-11-06T20:10:05.179Z" },
-    { url = "https://files.pythonhosted.org/packages/ba/30/9a87ce8336b172cc232a0db89a3af97929d06c11ceaa19d97d84fa90a8f8/regex-2024.11.6-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:52fb28f528778f184f870b7cf8f225f5eef0a8f6e3778529bdd40c7b3920796a", size = 483781, upload-time = "2024-11-06T20:10:07.07Z" },
-    { url = "https://files.pythonhosted.org/packages/01/e8/00008ad4ff4be8b1844786ba6636035f7ef926db5686e4c0f98093612add/regex-2024.11.6-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:fdd6028445d2460f33136c55eeb1f601ab06d74cb3347132e1c24250187500d9", size = 288455, upload-time = "2024-11-06T20:10:09.117Z" },
-    { url = "https://files.pythonhosted.org/packages/60/85/cebcc0aff603ea0a201667b203f13ba75d9fc8668fab917ac5b2de3967bc/regex-2024.11.6-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:805e6b60c54bf766b251e94526ebad60b7de0c70f70a4e6210ee2891acb70bf2", size = 284759, upload-time = "2024-11-06T20:10:11.155Z" },
-    { url = "https://files.pythonhosted.org/packages/94/2b/701a4b0585cb05472a4da28ee28fdfe155f3638f5e1ec92306d924e5faf0/regex-2024.11.6-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:b85c2530be953a890eaffde05485238f07029600e8f098cdf1848d414a8b45e4", size = 794976, upload-time = "2024-11-06T20:10:13.24Z" },
-    { url = "https://files.pythonhosted.org/packages/4b/bf/fa87e563bf5fee75db8915f7352e1887b1249126a1be4813837f5dbec965/regex-2024.11.6-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:bb26437975da7dc36b7efad18aa9dd4ea569d2357ae6b783bf1118dabd9ea577", size = 833077, upload-time = "2024-11-06T20:10:15.37Z" },
-    { url = "https://files.pythonhosted.org/packages/a1/56/7295e6bad94b047f4d0834e4779491b81216583c00c288252ef625c01d23/regex-2024.11.6-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:abfa5080c374a76a251ba60683242bc17eeb2c9818d0d30117b4486be10c59d3", size = 823160, upload-time = "2024-11-06T20:10:19.027Z" },
-    { url = "https://files.pythonhosted.org/packages/fb/13/e3b075031a738c9598c51cfbc4c7879e26729c53aa9cca59211c44235314/regex-2024.11.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:70b7fa6606c2881c1db9479b0eaa11ed5dfa11c8d60a474ff0e095099f39d98e", size = 796896, upload-time = "2024-11-06T20:10:21.85Z" },
-    { url = "https://files.pythonhosted.org/packages/24/56/0b3f1b66d592be6efec23a795b37732682520b47c53da5a32c33ed7d84e3/regex-2024.11.6-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:0c32f75920cf99fe6b6c539c399a4a128452eaf1af27f39bce8909c9a3fd8cbe", size = 783997, upload-time = "2024-11-06T20:10:24.329Z" },
-    { url = "https://files.pythonhosted.org/packages/f9/a1/eb378dada8b91c0e4c5f08ffb56f25fcae47bf52ad18f9b2f33b83e6d498/regex-2024.11.6-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:982e6d21414e78e1f51cf595d7f321dcd14de1f2881c5dc6a6e23bbbbd68435e", size = 781725, upload-time = "2024-11-06T20:10:28.067Z" },
-    { url = "https://files.pythonhosted.org/packages/83/f2/033e7dec0cfd6dda93390089864732a3409246ffe8b042e9554afa9bff4e/regex-2024.11.6-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:a7c2155f790e2fb448faed6dd241386719802296ec588a8b9051c1f5c481bc29", size = 789481, upload-time = "2024-11-06T20:10:31.612Z" },
-    { url = "https://files.pythonhosted.org/packages/83/23/15d4552ea28990a74e7696780c438aadd73a20318c47e527b47a4a5a596d/regex-2024.11.6-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:149f5008d286636e48cd0b1dd65018548944e495b0265b45e1bffecce1ef7f39", size = 852896, upload-time = "2024-11-06T20:10:34.054Z" },
-    { url = "https://files.pythonhosted.org/packages/e3/39/ed4416bc90deedbfdada2568b2cb0bc1fdb98efe11f5378d9892b2a88f8f/regex-2024.11.6-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:e5364a4502efca094731680e80009632ad6624084aff9a23ce8c8c6820de3e51", size = 860138, upload-time = "2024-11-06T20:10:36.142Z" },
-    { url = "https://files.pythonhosted.org/packages/93/2d/dd56bb76bd8e95bbce684326302f287455b56242a4f9c61f1bc76e28360e/regex-2024.11.6-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:0a86e7eeca091c09e021db8eb72d54751e527fa47b8d5787caf96d9831bd02ad", size = 787692, upload-time = "2024-11-06T20:10:38.394Z" },
-    { url = "https://files.pythonhosted.org/packages/0b/55/31877a249ab7a5156758246b9c59539abbeba22461b7d8adc9e8475ff73e/regex-2024.11.6-cp312-cp312-win32.whl", hash = "sha256:32f9a4c643baad4efa81d549c2aadefaeba12249b2adc5af541759237eee1c54", size = 262135, upload-time = "2024-11-06T20:10:40.367Z" },
-    { url = "https://files.pythonhosted.org/packages/38/ec/ad2d7de49a600cdb8dd78434a1aeffe28b9d6fc42eb36afab4a27ad23384/regex-2024.11.6-cp312-cp312-win_amd64.whl", hash = "sha256:a93c194e2df18f7d264092dc8539b8ffb86b45b899ab976aa15d48214138e81b", size = 273567, upload-time = "2024-11-06T20:10:43.467Z" },
-    { url = "https://files.pythonhosted.org/packages/90/73/bcb0e36614601016552fa9344544a3a2ae1809dc1401b100eab02e772e1f/regex-2024.11.6-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:a6ba92c0bcdf96cbf43a12c717eae4bc98325ca3730f6b130ffa2e3c3c723d84", size = 483525, upload-time = "2024-11-06T20:10:45.19Z" },
-    { url = "https://files.pythonhosted.org/packages/0f/3f/f1a082a46b31e25291d830b369b6b0c5576a6f7fb89d3053a354c24b8a83/regex-2024.11.6-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:525eab0b789891ac3be914d36893bdf972d483fe66551f79d3e27146191a37d4", size = 288324, upload-time = "2024-11-06T20:10:47.177Z" },
-    { url = "https://files.pythonhosted.org/packages/09/c9/4e68181a4a652fb3ef5099e077faf4fd2a694ea6e0f806a7737aff9e758a/regex-2024.11.6-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:086a27a0b4ca227941700e0b31425e7a28ef1ae8e5e05a33826e17e47fbfdba0", size = 284617, upload-time = "2024-11-06T20:10:49.312Z" },
-    { url = "https://files.pythonhosted.org/packages/fc/fd/37868b75eaf63843165f1d2122ca6cb94bfc0271e4428cf58c0616786dce/regex-2024.11.6-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:bde01f35767c4a7899b7eb6e823b125a64de314a8ee9791367c9a34d56af18d0", size = 795023, upload-time = "2024-11-06T20:10:51.102Z" },
-    { url = "https://files.pythonhosted.org/packages/c4/7c/d4cd9c528502a3dedb5c13c146e7a7a539a3853dc20209c8e75d9ba9d1b2/regex-2024.11.6-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:b583904576650166b3d920d2bcce13971f6f9e9a396c673187f49811b2769dc7", size = 833072, upload-time = "2024-11-06T20:10:52.926Z" },
-    { url = "https://files.pythonhosted.org/packages/4f/db/46f563a08f969159c5a0f0e722260568425363bea43bb7ae370becb66a67/regex-2024.11.6-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:1c4de13f06a0d54fa0d5ab1b7138bfa0d883220965a29616e3ea61b35d5f5fc7", size = 823130, upload-time = "2024-11-06T20:10:54.828Z" },
-    { url = "https://files.pythonhosted.org/packages/db/60/1eeca2074f5b87df394fccaa432ae3fc06c9c9bfa97c5051aed70e6e00c2/regex-2024.11.6-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3cde6e9f2580eb1665965ce9bf17ff4952f34f5b126beb509fee8f4e994f143c", size = 796857, upload-time = "2024-11-06T20:10:56.634Z" },
-    { url = "https://files.pythonhosted.org/packages/10/db/ac718a08fcee981554d2f7bb8402f1faa7e868c1345c16ab1ebec54b0d7b/regex-2024.11.6-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:0d7f453dca13f40a02b79636a339c5b62b670141e63efd511d3f8f73fba162b3", size = 784006, upload-time = "2024-11-06T20:10:59.369Z" },
-    { url = "https://files.pythonhosted.org/packages/c2/41/7da3fe70216cea93144bf12da2b87367590bcf07db97604edeea55dac9ad/regex-2024.11.6-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:59dfe1ed21aea057a65c6b586afd2a945de04fc7db3de0a6e3ed5397ad491b07", size = 781650, upload-time = "2024-11-06T20:11:02.042Z" },
-    { url = "https://files.pythonhosted.org/packages/a7/d5/880921ee4eec393a4752e6ab9f0fe28009435417c3102fc413f3fe81c4e5/regex-2024.11.6-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:b97c1e0bd37c5cd7902e65f410779d39eeda155800b65fc4d04cc432efa9bc6e", size = 789545, upload-time = "2024-11-06T20:11:03.933Z" },
-    { url = "https://files.pythonhosted.org/packages/dc/96/53770115e507081122beca8899ab7f5ae28ae790bfcc82b5e38976df6a77/regex-2024.11.6-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:f9d1e379028e0fc2ae3654bac3cbbef81bf3fd571272a42d56c24007979bafb6", size = 853045, upload-time = "2024-11-06T20:11:06.497Z" },
-    { url = "https://files.pythonhosted.org/packages/31/d3/1372add5251cc2d44b451bd94f43b2ec78e15a6e82bff6a290ef9fd8f00a/regex-2024.11.6-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:13291b39131e2d002a7940fb176e120bec5145f3aeb7621be6534e46251912c4", size = 860182, upload-time = "2024-11-06T20:11:09.06Z" },
-    { url = "https://files.pythonhosted.org/packages/ed/e3/c446a64984ea9f69982ba1a69d4658d5014bc7a0ea468a07e1a1265db6e2/regex-2024.11.6-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:4f51f88c126370dcec4908576c5a627220da6c09d0bff31cfa89f2523843316d", size = 787733, upload-time = "2024-11-06T20:11:11.256Z" },
-    { url = "https://files.pythonhosted.org/packages/2b/f1/e40c8373e3480e4f29f2692bd21b3e05f296d3afebc7e5dcf21b9756ca1c/regex-2024.11.6-cp313-cp313-win32.whl", hash = "sha256:63b13cfd72e9601125027202cad74995ab26921d8cd935c25f09c630436348ff", size = 262122, upload-time = "2024-11-06T20:11:13.161Z" },
-    { url = "https://files.pythonhosted.org/packages/45/94/bc295babb3062a731f52621cdc992d123111282e291abaf23faa413443ea/regex-2024.11.6-cp313-cp313-win_amd64.whl", hash = "sha256:2b3361af3198667e99927da8b84c1b010752fa4b1115ee30beaa332cabc3ef1a", size = 273545, upload-time = "2024-11-06T20:11:15Z" },
+version = "2026.5.9"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/dc/0e/49aee608ad09480e7fd276898c99ec6192985fa331abe4eb3a986094490b/regex-2026.5.9.tar.gz", hash = "sha256:a8234aa23ec39894bfe4a3f1b85616a7032481964a13ac6fc9f10de4f6fca270", size = 416074, upload-time = "2026-05-09T23:15:19.37Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/fe/ed/0ad2c8edf634918eb4484365d3819fa7bd7f58daf807fe7fb21812c316e5/regex-2026.5.9-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:a9e1328e17c84c1a5d22ec9f785ecef4a967fab9a42b6a8dc3bcbebd0a0c9e44", size = 489438, upload-time = "2026-05-09T23:11:29.374Z" },
+    { url = "https://files.pythonhosted.org/packages/89/a9/4ed972ad263963b860b7c3e86e0e1bcc791def47b43b8c8efe57e710f139/regex-2026.5.9-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:bfe1ce50cbfb569d74e1e4337da6468961f31dbea55fd85aa5de59c0947a805a", size = 291270, upload-time = "2026-05-09T23:11:33.254Z" },
+    { url = "https://files.pythonhosted.org/packages/16/81/075930d9fa28c4ea1f53398dd015ee7c882f623539759113cda1257f4b82/regex-2026.5.9-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:15ee42209947f4ca045412eae98416317238163618ace2a8e54f99586a466733", size = 289198, upload-time = "2026-05-09T23:11:35.769Z" },
+    { url = "https://files.pythonhosted.org/packages/d4/c8/5cdfbf0b5dc6599e1b6131eff43262e5275d4ec3469ce10216061659aadb/regex-2026.5.9-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b4bb445ff3f725f59df8f6014edb547ee928ec7023a774f6a39a3f953038cbb2", size = 784765, upload-time = "2026-05-09T23:11:37.689Z" },
+    { url = "https://files.pythonhosted.org/packages/cd/ca/ae5fd6edc59b7f84b904b31d6ec39a860cbcecd10f64bd5a062ca83a4864/regex-2026.5.9-cp310-cp310-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:446ddd671e43ab535810c4b21cff7104945c701d4a14d1e6d1cd6f4e445a8bea", size = 852115, upload-time = "2026-05-09T23:11:39.973Z" },
+    { url = "https://files.pythonhosted.org/packages/f6/ce/a91cf555afb51f3b74a182e24ba073b91ea7bb64592fc4b315c111bb19fd/regex-2026.5.9-cp310-cp310-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:7b92817338591505f282cf3864c145244b1edcf5381d237038df955001091538", size = 899503, upload-time = "2026-05-09T23:11:42.48Z" },
+    { url = "https://files.pythonhosted.org/packages/55/7f/725a0a2b245a4cf0c4bab29d0e97c74285d94136a65d1b55a6459a583502/regex-2026.5.9-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:d6b8a143aca6c39b446ea8092cde25cc8fe9304d4f5fecfbc1a9dbb0282703c2", size = 794093, upload-time = "2026-05-09T23:11:44.681Z" },
+    { url = "https://files.pythonhosted.org/packages/e3/2a/996efbd59ce6b5d4a09e3af6180ceb62af171f4a9a6fb557d2f0ae0d462b/regex-2026.5.9-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:0f03aa6898aaaac4592479821df16e68e8d0e29e903e65d8f2dfb2f19028a989", size = 786234, upload-time = "2026-05-09T23:11:46.882Z" },
+    { url = "https://files.pythonhosted.org/packages/4b/0a/8731e8b8806174c9cdd5903f80a14990331c1f42fc4209b540952e9e010d/regex-2026.5.9-cp310-cp310-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:ed457d8e98ae812ed7732bef7bf78de78e834eae0372a74e23ca90ef21d910f9", size = 769895, upload-time = "2026-05-09T23:11:49.324Z" },
+    { url = "https://files.pythonhosted.org/packages/9a/0b/932473194bd563f342a412ae2ffbbd6da608306a2bc4e99249a41c2b0b92/regex-2026.5.9-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:71b61c5bfe1c806332defc42ad6c780b3c55f661986d7f40283a3a88274b4c00", size = 774991, upload-time = "2026-05-09T23:11:51.261Z" },
+    { url = "https://files.pythonhosted.org/packages/98/80/9523d196010031df25f7177ee0a467efbee436324038e5d99def17a57515/regex-2026.5.9-cp310-cp310-musllinux_1_2_ppc64le.whl", hash = "sha256:3b1e39888c5e0c7d92cea4fc777396c4a90363b05de75d02eb459a4752200808", size = 848790, upload-time = "2026-05-09T23:11:53.232Z" },
+    { url = "https://files.pythonhosted.org/packages/3c/07/56987b35e89edf47e4a38cf2845aeee476bfa688a6bdbd3e820cda461dc1/regex-2026.5.9-cp310-cp310-musllinux_1_2_riscv64.whl", hash = "sha256:6ba42b2e7e7f46cf68cc6a5ca36fa07959f9bbd9c6bdcc47b6ee76549a590248", size = 757679, upload-time = "2026-05-09T23:11:55.82Z" },
+    { url = "https://files.pythonhosted.org/packages/04/2a/ff713fff0c566507c06a4ce2dc0ae8e7eeebc88811a95fc81cf1e7d534dd/regex-2026.5.9-cp310-cp310-musllinux_1_2_s390x.whl", hash = "sha256:c010eb8caca74bdb40c07498d7ece26b4428fd3f04aa8a72c9ac6f79e8faaac6", size = 837116, upload-time = "2026-05-09T23:11:57.934Z" },
+    { url = "https://files.pythonhosted.org/packages/77/90/df6d982b03e3614785c6937ba51b57f6733d97d2ee1c9bc7531dbfab3a54/regex-2026.5.9-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:a6a563446a41adc451393dc6b8e6ad87979efaee3c8738690a8d1b08ebead1b4", size = 782081, upload-time = "2026-05-09T23:11:59.607Z" },
+    { url = "https://files.pythonhosted.org/packages/c7/8a/4e88a5f7c3e98489aac4dd23142723d907b2a595b4a6abcbacabefeded09/regex-2026.5.9-cp310-cp310-win32.whl", hash = "sha256:954cc214c04663ee6d266fc61739cad83054683048de65c5bd1d640ad28098ac", size = 266247, upload-time = "2026-05-09T23:12:01.116Z" },
+    { url = "https://files.pythonhosted.org/packages/6a/40/4b224cb0582b2dca1786726e6cdabe26abbf757d7f6718332f186da155d2/regex-2026.5.9-cp310-cp310-win_amd64.whl", hash = "sha256:b310768746dd314ea6e2ff4cc89ef215426813396ff4e94ee8e6f7096c8b6e03", size = 278416, upload-time = "2026-05-09T23:12:03.2Z" },
+    { url = "https://files.pythonhosted.org/packages/12/4d/014fbe803204cab0947ee428f09f658a29632053dde1d3c6176bb4f0fd4c/regex-2026.5.9-cp310-cp310-win_arm64.whl", hash = "sha256:19c16ceb4a267a8789e25733e583983eeab9f0f8664e66b0bd1c5d21f14c2d4b", size = 270413, upload-time = "2026-05-09T23:12:04.649Z" },
+    { url = "https://files.pythonhosted.org/packages/c2/dc/c1f2df4027e82fc54b5a473e4b250f5139faca49a0fbe29a48668d228f34/regex-2026.5.9-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:ccf5249114cc3e772ecdd88a98a86eca0fd74c61ce32a94743758c083fc05d48", size = 489445, upload-time = "2026-05-09T23:12:06.111Z" },
+    { url = "https://files.pythonhosted.org/packages/03/d2/59f01110660081cce9c0bc30ebd0b5ee250dacf658e3248ed92f01e0e8ee/regex-2026.5.9-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:46f1326ca6e65b0879d23ca302c0f2415aad42ff0309b9c818e7949fe19a41d8", size = 291271, upload-time = "2026-05-09T23:12:07.731Z" },
+    { url = "https://files.pythonhosted.org/packages/58/b6/14b2c84ff90ddb370c81d27503f4a0fcf071496416f4855f6cc8c5d81c35/regex-2026.5.9-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:ef31cbfe458e21c6122ba8150ff060e0c7789ed0d26eb423f25472584920b555", size = 289212, upload-time = "2026-05-09T23:12:09.266Z" },
+    { url = "https://files.pythonhosted.org/packages/03/d0/4db86529117320de0c84afd90e70bb47434625875e34fcef9d8c127c5b16/regex-2026.5.9-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:992604d02e6d9c6d786c24a706a71ecffe1020fc1ef264044474cd81fa2c3919", size = 792310, upload-time = "2026-05-09T23:12:11.416Z" },
+    { url = "https://files.pythonhosted.org/packages/07/78/fe4800cd322f862ecffd2d553409b20d80650e5ed71b9d178f853d020b82/regex-2026.5.9-cp311-cp311-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:c9411dd64ca95477225734a93dfc8583b51916b8d5942f99d6cac21e09965451", size = 861721, upload-time = "2026-05-09T23:12:13.681Z" },
+    { url = "https://files.pythonhosted.org/packages/b5/d0/b3618a895dd8feb897c61bb2954edd265e1767d82a01d53065d5871127a3/regex-2026.5.9-cp311-cp311-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:3dd4a3ff360dfb836fecdb93a4598f9d6e2ac81e3e397125145c6221bf58cf4c", size = 906460, upload-time = "2026-05-09T23:12:15.443Z" },
+    { url = "https://files.pythonhosted.org/packages/33/6f/1481597e859ef19508b345eec4afd1416ed6e6b459c75a64026ef193aecf/regex-2026.5.9-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:2a661a7d270a61f7cf460caee8b9fa2d5ef9e5c681234bcb9e0fe14f488e7dfc", size = 799843, upload-time = "2026-05-09T23:12:16.892Z" },
+    { url = "https://files.pythonhosted.org/packages/73/59/955734c803f59108deccba3597ae440c76b62a652733c0006e6243758420/regex-2026.5.9-cp311-cp311-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:f079e50a0d3cc3cd5091fa9ff45869a2e6b2cd35895731edafb0327901a8d86d", size = 773610, upload-time = "2026-05-09T23:12:19.127Z" },
+    { url = "https://files.pythonhosted.org/packages/68/8f/70c04a236d651c81881dac42ef8538bddda6121434509d0a22d9e601503b/regex-2026.5.9-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:4ebe8f0b5ec5a5024dc4a4c59f444c4e9afc5f2abdbb8962065b75d27fb971f9", size = 781645, upload-time = "2026-05-09T23:12:20.806Z" },
+    { url = "https://files.pythonhosted.org/packages/1d/96/05c7434d88185e5d27fe54aeb74df86bd77cd79f52f0b4eae54faa8fea70/regex-2026.5.9-cp311-cp311-musllinux_1_2_ppc64le.whl", hash = "sha256:97cf3bc1b7d7d2306772ec07366c80d9df00ff79e79cea32898883a646d2fae2", size = 854473, upload-time = "2026-05-09T23:12:22.465Z" },
+    { url = "https://files.pythonhosted.org/packages/4e/c1/6e3d8202d981f3117004bf341ee74893ba4ba8a9fbaf4b94615846550a08/regex-2026.5.9-cp311-cp311-musllinux_1_2_riscv64.whl", hash = "sha256:0f9eede6a5cbdc02d4978090186390936e1776a7d1359b21e41014c609880bcf", size = 763311, upload-time = "2026-05-09T23:12:24.351Z" },
+    { url = "https://files.pythonhosted.org/packages/93/c7/e7737f1526b3fb32bd4c337fd6c71c3ebb5c8296fc34d11197e0955d2e35/regex-2026.5.9-cp311-cp311-musllinux_1_2_s390x.whl", hash = "sha256:01f0f5f55f4b64dacec85dc116d3c05fd23ad3ff037bbc73a2085775953c2611", size = 844593, upload-time = "2026-05-09T23:12:26.341Z" },
+    { url = "https://files.pythonhosted.org/packages/a5/27/0daffb1a535bb39f422c3d200f4ab023c71110ad66a32b366bee708baba0/regex-2026.5.9-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:1268eddd8486dc561d08eee1156e40aa3a8fe10f4bdec8fa653b455fcbffd12c", size = 789167, upload-time = "2026-05-09T23:12:27.975Z" },
+    { url = "https://files.pythonhosted.org/packages/ce/fc/294fe4fac4f2ed67207b17471815870c1c45b3a489e08e0ac96daea16ef6/regex-2026.5.9-cp311-cp311-win32.whl", hash = "sha256:8676474c07469d6f33dd1085ca2cd45f65785f32518f2b20e36d9953ca07f994", size = 266249, upload-time = "2026-05-09T23:12:30.141Z" },
+    { url = "https://files.pythonhosted.org/packages/d0/b0/8dce459f6245bcf8f6e9f23ac9569f1a0f15c131cc0745e82b43226204cf/regex-2026.5.9-cp311-cp311-win_amd64.whl", hash = "sha256:246de9d60aa3f8538b519834dd95cbf276ea263d6a7bd5a3666dc3fa0230505b", size = 278423, upload-time = "2026-05-09T23:12:31.676Z" },
+    { url = "https://files.pythonhosted.org/packages/db/8d/f9aeff6ad63a3ef720386f2907e6d34a35a510a6e498ebad28b0fb3f6ab6/regex-2026.5.9-cp311-cp311-win_arm64.whl", hash = "sha256:d726ca3f0d76969bf1e8e477d160d3d666bbf999f6860bd314889e5345782046", size = 270420, upload-time = "2026-05-09T23:12:33.194Z" },
+    { url = "https://files.pythonhosted.org/packages/50/9b/6550044bc44e17c84d312c031c2ec42fbdb6a4ec4e29093be3a172d08772/regex-2026.5.9-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:57eeeb05db7979413dec5438f2db21d7ecbba787cde7a711df1a6f6df672aa06", size = 490451, upload-time = "2026-05-09T23:12:34.72Z" },
+    { url = "https://files.pythonhosted.org/packages/1e/95/fc7ba4303b5a0f92446a12ee6778ef2c6c799233f5060042a31bf390cfe9/regex-2026.5.9-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:398c521292f4c7fb807001dcd54694d3a1fcafc179a36ad9cc56f98df85930b6", size = 292112, upload-time = "2026-05-09T23:12:36.285Z" },
+    { url = "https://files.pythonhosted.org/packages/54/4b/ee27938d1b2c443e89a9a10e00d2d19aa5ee300cd3d61140644e93bb083e/regex-2026.5.9-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:f7a7c26137296beba7784de6eba69c6a93a63ccebc385e4962fe67e267a91225", size = 289599, upload-time = "2026-05-09T23:12:38.089Z" },
+    { url = "https://files.pythonhosted.org/packages/d8/dd/ba103dc19614e25f3880800ca67ce093d6e21b325d72b8383c7bf906e9fa/regex-2026.5.9-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6441cc660d76107934a09c22167200839a0e89604a6297f78a974e66e931d2c0", size = 796732, upload-time = "2026-05-09T23:12:40.062Z" },
+    { url = "https://files.pythonhosted.org/packages/cf/e7/f035b4fd858b050b0080bf302968dc0f59ba34e391872d54936758e6844e/regex-2026.5.9-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:91328f1c23d47595ca3ef0a7557fa129c5a23404b775c770697d2f35b33e0107", size = 865440, upload-time = "2026-05-09T23:12:42.059Z" },
+    { url = "https://files.pythonhosted.org/packages/0a/51/8cd301ecc899aea28124357f729f4272f44de7806fc7ca02490bfbe253e8/regex-2026.5.9-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:93a7860539414dddaefba2b40f8771765ae17949d4c7182b876ce429e11a8309", size = 912329, upload-time = "2026-05-09T23:12:44.373Z" },
+    { url = "https://files.pythonhosted.org/packages/cc/1e/3fbe2fa1e8cebd62f3bb7d3321cff1640aca2e240b51d9bd624aad949260/regex-2026.5.9-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:dd2810d22146b6d838acc5ec15602cb6b47920aa4e33015df3868eedfd20bab8", size = 801239, upload-time = "2026-05-09T23:12:46.268Z" },
+    { url = "https://files.pythonhosted.org/packages/17/2f/6f6008682bf2cf98040a0d3153a8e557b6ab728d7713d045cee4ce544ab8/regex-2026.5.9-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:daff2bdbaf1d23e52fdff7c0b7bc2048b68f978df6a4d107ac981f94caef2e66", size = 777054, upload-time = "2026-05-09T23:12:48.051Z" },
+    { url = "https://files.pythonhosted.org/packages/19/2b/eee0d20a6842ba04df4b8847a920b57ef56853f14ef85405473e586b605a/regex-2026.5.9-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:4eeb011098fcb77af513dcef521a3dbecbf8849b1e38940759d293b7a93f5026", size = 785098, upload-time = "2026-05-09T23:12:49.851Z" },
+    { url = "https://files.pythonhosted.org/packages/4a/98/6fc1e6410feefb92159edaed5041992bfe390e8d26c721865434acbca558/regex-2026.5.9-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:ea9c8ecfa1b73c73b626534d6626e5340d429630943672b8480724f44e84b962", size = 860095, upload-time = "2026-05-09T23:12:51.666Z" },
+    { url = "https://files.pythonhosted.org/packages/18/a3/bd855e0f2cb1a978ecf6fa6bb69632dd9c3f6ea3b81cde62fde14c9daec7/regex-2026.5.9-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:cd2846168eb9ee3c513902bc8225409cb1caab31d04728b145171fa1625d9621", size = 765762, upload-time = "2026-05-09T23:12:53.413Z" },
+    { url = "https://files.pythonhosted.org/packages/dc/66/0ae8c092e60b14c79d24f8e0b7f0aea5bfbffdcab00b5483d13404d3c3a5/regex-2026.5.9-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:39617fb0cde9c0e6306dc70e3bfc096f3da793219879f7ae7aa341a69fbdcf6d", size = 852100, upload-time = "2026-05-09T23:12:55.256Z" },
+    { url = "https://files.pythonhosted.org/packages/21/de/8dfde60fc1b21c946a893ba273403b72617edb261370cb1087099a83f088/regex-2026.5.9-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:fd03c4f0e33280d15cae17159b899245d6b7c53d21def19b263b39655061f5ce", size = 789479, upload-time = "2026-05-09T23:12:57.573Z" },
+    { url = "https://files.pythonhosted.org/packages/c3/1c/bdcc98f9a4af4fdd166c74941174619ccff4726d3ce32faa8e9a2ecd38dd/regex-2026.5.9-cp312-cp312-win32.whl", hash = "sha256:164eba9b755ea6f244b0d881196fbc1fac09714e9782c9e2732b813142033c8e", size = 266699, upload-time = "2026-05-09T23:12:59.14Z" },
+    { url = "https://files.pythonhosted.org/packages/78/87/240d36864f9e48ace85f72e79ced97ceb7f27ce87739a947dcb834b4e6bc/regex-2026.5.9-cp312-cp312-win_amd64.whl", hash = "sha256:86f40a5d6444db30a125c9c9177e6b25dad981cbc37451fd838f145e6edac92e", size = 277783, upload-time = "2026-05-09T23:13:00.789Z" },
+    { url = "https://files.pythonhosted.org/packages/4f/b5/7b30f312b0669dff5beebe5b0989dc2d1a312b1a44fab852199c387a5b96/regex-2026.5.9-cp312-cp312-win_arm64.whl", hash = "sha256:96f5f58b54a063d7ea9dca08e1cf57bfe10499c4d579ee672da284f57f5f0070", size = 270513, upload-time = "2026-05-09T23:13:02.426Z" },
+    { url = "https://files.pythonhosted.org/packages/aa/da/797e91ecec6f84135da778ddce78c20e0af5d2a15c26f87a81bc3eadb6db/regex-2026.5.9-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:d626b84406444b165fc0ba981604edea39f0588ff1f92baa23fe50799ea9afdb", size = 490303, upload-time = "2026-05-09T23:13:04.382Z" },
+    { url = "https://files.pythonhosted.org/packages/44/da/bf30abaaa737b58f4a4b8c4a03659e02fd92092c822e0197ed9e0daab917/regex-2026.5.9-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:d7bdc0ab8f3dd7e1b4f9ab88634e13374669db86bb3c72e8292f07ae313f539f", size = 292019, upload-time = "2026-05-09T23:13:06.022Z" },
+    { url = "https://files.pythonhosted.org/packages/2d/e7/d0eaf5713828417b9e5648cf81fa9bacd4961f6ab98c380c2034f8716e35/regex-2026.5.9-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:a8820737949116ffff55fe18f9fc644530063ba6ebfcb8314239416e78f1347c", size = 289468, upload-time = "2026-05-09T23:13:08.214Z" },
+    { url = "https://files.pythonhosted.org/packages/d3/9b/b3fdd62b003baa1a9b593cd8c8699c9651c2e80cc21a5c715707983c42d7/regex-2026.5.9-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:aa0fbdbac82cb3e4450d0ccde7d7a35607f4cb2dd9fba4b8b69bfaf8c9fa6aed", size = 796749, upload-time = "2026-05-09T23:13:10.573Z" },
+    { url = "https://files.pythonhosted.org/packages/d4/30/66ab84588765f5b4b271a9ca09ef7ce2b87caa95176ec3d2ad65d7bc4902/regex-2026.5.9-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:57e8915c7986aa33d25e4d3629cef711cd2863f2961b10409f0c04cb8b7d9020", size = 865445, upload-time = "2026-05-09T23:13:12.523Z" },
+    { url = "https://files.pythonhosted.org/packages/1a/89/f05169e8588aac365f35ffc7f3bc3184f095ef4cfded7cfaa3c7fd5dbd89/regex-2026.5.9-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:508f56a89ba9cb26e4168cbc37dbd60a28d82430a9e18ad1d25fe0883c314ca2", size = 912322, upload-time = "2026-05-09T23:13:14.281Z" },
+    { url = "https://files.pythonhosted.org/packages/30/e1/c93444052cf41581f3c884ab3fb5823daf0992f11cd4388d4275ca610558/regex-2026.5.9-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:b6d189041f15691cfa2b6c4290448ec221244d225b3f5fe9e7771b34ffcdf6e2", size = 801269, upload-time = "2026-05-09T23:13:16.569Z" },
+    { url = "https://files.pythonhosted.org/packages/50/fe/0cf96b882f540e62e8b9956599798203d599c44cf4c77917ca27400ff69b/regex-2026.5.9-cp313-cp313-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:e82db382b44d0111b22601c509c89f64434816c9e0eef9d1989cda8cc6ff1c04", size = 777085, upload-time = "2026-05-09T23:13:18.675Z" },
+    { url = "https://files.pythonhosted.org/packages/23/5c/d78d4924e7fc875557b9e9b768423925fdfaac5549d06da7810019a9bd26/regex-2026.5.9-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:2acfb48634f64996b57f90f39afa692ff362162722581921fe92239a59960f3c", size = 785153, upload-time = "2026-05-09T23:13:20.525Z" },
+    { url = "https://files.pythonhosted.org/packages/bf/e0/5214774090e7b4524dcea3e3c4aa74141d43043f8beb49c1599db1c8b53a/regex-2026.5.9-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:d29eebfc9525db68cad3c97eedd7f754fa265aa5cd0cf4f863b2421e1b48fc9f", size = 860164, upload-time = "2026-05-09T23:13:22.263Z" },
+    { url = "https://files.pythonhosted.org/packages/6e/e1/4a57a83350319b1271f0d7a249b8672513ed928b237a741631270de6caea/regex-2026.5.9-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:debb893095e944091c16e641a6e33c1b0f4cb61ab945ec5afbf53ce7068834d8", size = 765731, upload-time = "2026-05-09T23:13:24.277Z" },
+    { url = "https://files.pythonhosted.org/packages/12/f4/499e74a20c156fc75836ee04a72a38d1a063978f600937f9760467beb1b0/regex-2026.5.9-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:d659eee77986549c9ea45b861c7567e44d6287c3dc9a4565478853f7b9fe2ff6", size = 852062, upload-time = "2026-05-09T23:13:26.125Z" },
+    { url = "https://files.pythonhosted.org/packages/5b/92/7eebc0d0a01e78629695f342ba17e0deaff8fb45e79cc0d7b98287da6e3e/regex-2026.5.9-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:2efa205e6d98b24d1f3ab395c11aa15cdf10935bca283d0285e0499c284fba21", size = 789577, upload-time = "2026-05-09T23:13:27.814Z" },
+    { url = "https://files.pythonhosted.org/packages/05/a4/018e71f7d2ad48c1ebe6d3ae0026f9b7cb4802fd15c7cc02fdf724355102/regex-2026.5.9-cp313-cp313-win32.whl", hash = "sha256:f3844f134e834076677dd369976e9f5068679fcb8e50102fdf6b7ac96a3ec127", size = 266691, upload-time = "2026-05-09T23:13:29.549Z" },
+    { url = "https://files.pythonhosted.org/packages/e6/1d/861a93719fb9ee7dbfc3761b3797b7a3e112a5d42c6129459d2d741be9b5/regex-2026.5.9-cp313-cp313-win_amd64.whl", hash = "sha256:3527bb4942d2c14552155406cdedd906567456821848aed1cb4933a391bf5eca", size = 277747, upload-time = "2026-05-09T23:13:31.859Z" },
+    { url = "https://files.pythonhosted.org/packages/d9/c6/0a2436ae4da1ba76e51cb98943c6838a9a721faa40ebe2dce07694ae34e3/regex-2026.5.9-cp313-cp313-win_arm64.whl", hash = "sha256:56a33f191f17d8c417f99945ebdc1e691d3af9605d86ec68c7e54a57e3e17af6", size = 270500, upload-time = "2026-05-09T23:13:33.525Z" },
+    { url = "https://files.pythonhosted.org/packages/e8/e9/d21346f7b60ed58789371358ed66b09d00f832e1bd7c06e55d9da5679882/regex-2026.5.9-cp313-cp313t-macosx_10_13_universal2.whl", hash = "sha256:01f28d868834624c934b8d2e0aa1c8341337e37831f4a012f18a5afcba4cbaf3", size = 494172, upload-time = "2026-05-09T23:13:35.935Z" },
+    { url = "https://files.pythonhosted.org/packages/c4/43/fd1177a2032037c681baecdb3422ee4e1424aec4e4f470ef47793d325274/regex-2026.5.9-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:48036f6374aaa79eb3b754ec29c61d1c6b1606749d705a13f8854fa2539671f6", size = 293952, upload-time = "2026-05-09T23:13:38.307Z" },
+    { url = "https://files.pythonhosted.org/packages/f2/7d/9fbf919768368d3f8a4f6c692cf2aa61e482b2b81ec6a298ace4cbf02480/regex-2026.5.9-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:b96350aa424e79d4fd6b567b344dcbe2b2d6bfc48dfe7717587e1fa6d43da6ff", size = 292314, upload-time = "2026-05-09T23:13:40.353Z" },
+    { url = "https://files.pythonhosted.org/packages/e2/6c/e41bfeecb589716843e7c4df09ba46ff2a42961457afece19059d85caeef/regex-2026.5.9-cp313-cp313t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8f3af7a4903c5c04a11a196a5aa75cdd7dd3f8508132f9fb3259d9f5908e3b88", size = 811681, upload-time = "2026-05-09T23:13:42.543Z" },
+    { url = "https://files.pythonhosted.org/packages/87/83/a5c1c525fba0aa656e88ad0face0b1829788ef4c2fb6b26df58aa1151b84/regex-2026.5.9-cp313-cp313t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:7e87577720152d2caae19fe2baaf1f8d5ca12091e9e229f03915c37d1e4b9178", size = 871135, upload-time = "2026-05-09T23:13:44.326Z" },
+    { url = "https://files.pythonhosted.org/packages/18/d4/80882e799e440dd878b0979cbebf8fa4d54624a332c83037c7a701649e3f/regex-2026.5.9-cp313-cp313t-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:c8b9b9d294cfea3cd19c718ade7cc93492b2c4991abd9a68d0b3477ae6d8e100", size = 917265, upload-time = "2026-05-09T23:13:47.295Z" },
+    { url = "https://files.pythonhosted.org/packages/ae/ff/8db60211e2286e396aad7dc7725356c502bff0901ea05bd6cdc2e1a042b9/regex-2026.5.9-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:728d8bfd28a8845c8b6bc5dc7ce010453d206396786c0765c2740cb65f37791e", size = 816311, upload-time = "2026-05-09T23:13:49.885Z" },
+    { url = "https://files.pythonhosted.org/packages/4c/47/742ef579c61730f8d268e5cf1f9ce0e37e2ea041ad0f5644724f2378e463/regex-2026.5.9-cp313-cp313t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:7e30b874d341fac767d7df5a0870540541c2c054b80cfaac116e8d367a8a7ff2", size = 785498, upload-time = "2026-05-09T23:13:52.25Z" },
+    { url = "https://files.pythonhosted.org/packages/7f/ab/cb0999802dcb0fb95b1ab005e8d4163d8afdd67efc2cb6b6630ac13f8cb1/regex-2026.5.9-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:fd190e88a895a8901325fad284a3f74ea52b1da8525b76cc811fa9b1edf0ce2b", size = 801348, upload-time = "2026-05-09T23:13:54.127Z" },
+    { url = "https://files.pythonhosted.org/packages/7d/62/8ca59a24c55bc34d166eefaf3717bd77772f329fdbf984d86581e0a3571c/regex-2026.5.9-cp313-cp313t-musllinux_1_2_ppc64le.whl", hash = "sha256:8e76e8161ad00694cfce6767d5dea860c6391ac5b83e5c3a39661e696f11fc7e", size = 866493, upload-time = "2026-05-09T23:13:56.067Z" },
+    { url = "https://files.pythonhosted.org/packages/8d/3d/30f2ae62cef3278bb5bb821f467277a55fb73f01032cf85997e15e8289a8/regex-2026.5.9-cp313-cp313t-musllinux_1_2_riscv64.whl", hash = "sha256:ddda5340e6c01a293027dd46232fa79eaff1b48058ce7a98f572b6445b088041", size = 772811, upload-time = "2026-05-09T23:13:57.867Z" },
+    { url = "https://files.pythonhosted.org/packages/d8/ae/7d2089bcd78ad0c0161bc684339df50032acb438a7bd3305e7ddb1193cec/regex-2026.5.9-cp313-cp313t-musllinux_1_2_s390x.whl", hash = "sha256:205109e96b3cf5adf8f4cd62bedde9487feb282b9497a3535451e5a24cd706a0", size = 856584, upload-time = "2026-05-09T23:13:59.679Z" },
+    { url = "https://files.pythonhosted.org/packages/a9/29/92ff47f75990131ea4f24ba17819e5a9d141e10819807e09addd73409af6/regex-2026.5.9-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:dfbe4579b9f08036aa7d101d1835437a20783574ac66327e6b29b4018a138081", size = 803453, upload-time = "2026-05-09T23:14:01.978Z" },
+    { url = "https://files.pythonhosted.org/packages/04/99/eff29f1037dcab36702c9ee5d6858cf1ce2336ea8ea2987f64245b99ea5e/regex-2026.5.9-cp313-cp313t-win32.whl", hash = "sha256:ed2c9e8068b614c574d8d30e543d617cf5379b0535d46f97ef00e904745a08b5", size = 269951, upload-time = "2026-05-09T23:14:03.661Z" },
+    { url = "https://files.pythonhosted.org/packages/0e/9d/8870b8981d27b22cda77bb26a5ac7ebfa9c7d9e0dea195a834a82380e748/regex-2026.5.9-cp313-cp313t-win_amd64.whl", hash = "sha256:b46b0f094dc1d3b90356c85a0bd2c9bafc4a6a190b9d6f8ddd5a033b6e088ed4", size = 281240, upload-time = "2026-05-09T23:14:05.56Z" },
+    { url = "https://files.pythonhosted.org/packages/72/b1/3379415e8f135c13ac551353397cc4fe97b4978f3cac73c5fcbcded548b8/regex-2026.5.9-cp313-cp313t-win_arm64.whl", hash = "sha256:872acc074bd29ffc9913ecdfedf6ea77502312ca44a4aa0d3779089c6069d8de", size = 272383, upload-time = "2026-05-09T23:14:07.843Z" },
+    { url = "https://files.pythonhosted.org/packages/13/3e/9c3cd292d8808b3645a2ce517e200179b6d0e903f176300bd8b542e14de5/regex-2026.5.9-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:1bd7587a2948b4085195d5a3374eaf4a425dc3e55784c038175355ecf3bbbf8a", size = 490376, upload-time = "2026-05-09T23:14:09.64Z" },
+    { url = "https://files.pythonhosted.org/packages/60/70/d43ee8a2ca0a8b68d167f21658b85520ac0574617c7f320367c5047f7556/regex-2026.5.9-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:dea2e88e1cce4522496cce630e11e67b98b7076620bc4336c3f674bc21a375f4", size = 291964, upload-time = "2026-05-09T23:14:11.424Z" },
+    { url = "https://files.pythonhosted.org/packages/21/91/9d50b433828d8e74196904e168a43abf1e6e88b2a15d47ed742456720c37/regex-2026.5.9-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:2099f7e7ff7b6aa3192312650a56e91cc091e49d50b04e4f6f8b6e28b3b27f1c", size = 289682, upload-time = "2026-05-09T23:14:13.123Z" },
+    { url = "https://files.pythonhosted.org/packages/3e/d2/b835e3cafbb9d977736912436259ff551d60919f7d7b3d37d46659c63564/regex-2026.5.9-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ecd353045824e4477562a2ac718c25799cdaaa41f7aa925a806a8a3e6848a5b9", size = 796996, upload-time = "2026-05-09T23:14:14.923Z" },
+    { url = "https://files.pythonhosted.org/packages/2c/a6/9f992d00019166b9de01c546dd4549bc679f2a68df11b877740b0760b7c2/regex-2026.5.9-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:65c8c8c37377794bd5b2f3ebe51919042bf17aec802e23c833d89782ed0c78af", size = 866089, upload-time = "2026-05-09T23:14:17.757Z" },
+    { url = "https://files.pythonhosted.org/packages/e0/08/4d32af657e049b19cb62b02e46e38fe1518797bfb2203ee93a510b21b0dc/regex-2026.5.9-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:5b73ab8afcf66c622db143d1c6fda4e58e4d537ee4f125229ad47b1ab80f34c0", size = 911530, upload-time = "2026-05-09T23:14:20.353Z" },
+    { url = "https://files.pythonhosted.org/packages/d9/27/2af43dd1dc201d1fecefda64a45f4ad0995855b92724f795a777b402ee69/regex-2026.5.9-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0de5cf193997384ed2ca6f1cd4f78055b255d93d82d5a8cd6ba0d11c10b167e4", size = 800643, upload-time = "2026-05-09T23:14:22.265Z" },
+    { url = "https://files.pythonhosted.org/packages/a4/dd/23a249047013b5321d4a60c4d2437462086f601b061776a525e5fba2a59f/regex-2026.5.9-cp314-cp314-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:d641a8c9a61618047796d572a39a79b26167b0411d2c3031937b2fe2d081e2cf", size = 777223, upload-time = "2026-05-09T23:14:24.179Z" },
+    { url = "https://files.pythonhosted.org/packages/94/6a/e85ed9538cd19586d0465076a4578a12e093ce776d15f3f8ce92733a8dd6/regex-2026.5.9-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:24b2355ef5cc9aa5b8f07d17704face1c166fdcc2290fa7bd6e6c925655a8346", size = 785760, upload-time = "2026-05-09T23:14:26.065Z" },
+    { url = "https://files.pythonhosted.org/packages/2a/c4/f25473209438638e947c55f9156fd8f236f74169229028cc99116380868e/regex-2026.5.9-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:a24852d3c29ad9e47593593d8a247c44ccc3d0548ef12c822d6ed0810affe676", size = 860891, upload-time = "2026-05-09T23:14:28.17Z" },
+    { url = "https://files.pythonhosted.org/packages/f9/f7/f4f86e3c74419c37370e91f150ae0c2ef7d34b2e0e4cdd5da046a02e4022/regex-2026.5.9-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:916714069da19329ef7de197dcbc77bb3104145c7c2c864dbfbe318f46b88b14", size = 765891, upload-time = "2026-05-09T23:14:30.06Z" },
+    { url = "https://files.pythonhosted.org/packages/26/70/704d8e13765939146b1cd0ef4e2feb71d7929727d2290f026eed10095955/regex-2026.5.9-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:fa411799ca8da32a8d38d020a88faa5b6f91657d284761352940ecf9f7c3bbdd", size = 851380, upload-time = "2026-05-09T23:14:32.123Z" },
+    { url = "https://files.pythonhosted.org/packages/26/29/1a13582a8460038edc38e49f64ceb0dd7c60f5caba77571f4bf6601965d9/regex-2026.5.9-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:1e6da47d679b7010ef27556b6e0f99771b744936db1792a10ceac6547ae1503e", size = 789350, upload-time = "2026-05-09T23:14:34.799Z" },
+    { url = "https://files.pythonhosted.org/packages/73/56/3dcafe34fc72e271d62ad9a291801e88a1457bb251c132f15fcc2e5aad1a/regex-2026.5.9-cp314-cp314-win32.whl", hash = "sha256:98bd73080e8756255137e1bd3f3f00295bbc5aa383c0e0f973920e9134d7c4ad", size = 272130, upload-time = "2026-05-09T23:14:36.729Z" },
+    { url = "https://files.pythonhosted.org/packages/d0/9c/02eebf0be95efe416c664db7fb8b6b05b7a0b06a7544f2884f2558b0526f/regex-2026.5.9-cp314-cp314-win_amd64.whl", hash = "sha256:ff8d372ac2acdc048d1c19916f27ee61bc5722728458ba6ca5052f2c72d51763", size = 280999, upload-time = "2026-05-09T23:14:39.126Z" },
+    { url = "https://files.pythonhosted.org/packages/70/5a/1dd1abee76cb7a846a0bcf42fdc87e5720c3c33c24f3e37814310a513d9f/regex-2026.5.9-cp314-cp314-win_arm64.whl", hash = "sha256:e1d93bf647916292e8edcec150c07ddf3dc50179ccaf770c04a7f9e452155372", size = 273500, upload-time = "2026-05-09T23:14:41.059Z" },
+    { url = "https://files.pythonhosted.org/packages/86/c1/c5f619b0057a7965cb78ec559c1d7a45ce8c99a35bea95483d64959a93d9/regex-2026.5.9-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:83d0ee4a57d1c87cb549e195ec300b8f0ec3a82eba66d835e4e2ed8634fe4499", size = 494269, upload-time = "2026-05-09T23:14:42.869Z" },
+    { url = "https://files.pythonhosted.org/packages/05/2c/5d01f1aee33de4bbe60c8452945bfc8477ca7c5ae4450f6bfe711036cb36/regex-2026.5.9-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:d3d7eb5c9a7f6df82ed3cfac9beb93882a5cbcb5b8b157b56cb2b3b276574ac1", size = 293954, upload-time = "2026-05-09T23:14:44.822Z" },
+    { url = "https://files.pythonhosted.org/packages/7a/fe/e8988b2ae2108c6ef71bd4aa8d87fbe257976dd0810e826cd75f701c68b6/regex-2026.5.9-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:075160bf16658e16d35233300b8453aac25de4cbea808d22348b6979668e924d", size = 292405, upload-time = "2026-05-09T23:14:47.211Z" },
+    { url = "https://files.pythonhosted.org/packages/79/34/d2b0937faa7859263f7f0a3c6b103a1296306be6952dc173d0154e9a2f49/regex-2026.5.9-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:45375819235558a4ff1c4971dc32881f022613abdb180128f5cb4768c1765a1c", size = 811855, upload-time = "2026-05-09T23:14:49.21Z" },
+    { url = "https://files.pythonhosted.org/packages/80/fe/daf53a47457a8486db66c66c01ceb9c2303eecee3f87197f1e77eb1a736d/regex-2026.5.9-cp314-cp314t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:ead4b163ac30a29574510cd4b3e2e985ac5290c05fc7095557d6a5f403fc31b5", size = 871189, upload-time = "2026-05-09T23:14:51.555Z" },
+    { url = "https://files.pythonhosted.org/packages/1c/75/058fc4470cbfbf57d800aff1a0022b929a3f9fa553ee10a0cdf2070eb31f/regex-2026.5.9-cp314-cp314t-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:8c6e4218fbdfbcd4f6c19efca40930d24a621bf4b48cb76bc6640543bd28ef20", size = 917485, upload-time = "2026-05-09T23:14:53.633Z" },
+    { url = "https://files.pythonhosted.org/packages/88/e7/179cfda3a28bc843b5c6cfe7f79f23489c791ed95f151083803660878432/regex-2026.5.9-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:6351571c8a42b505eb555c0dc47d740d0fb66977dc142919eea6f4325b7c56a0", size = 816369, upload-time = "2026-05-09T23:14:56.198Z" },
+    { url = "https://files.pythonhosted.org/packages/41/90/6f0cc422071688266d344fca8462d787cba0a2c144acb25721f9a61ec265/regex-2026.5.9-cp314-cp314t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:002205cafd2a9e78c6290c7d1df277bf3277b3b7a30e0b4bb0dac2e2e3f7cb2d", size = 785869, upload-time = "2026-05-09T23:14:58.602Z" },
+    { url = "https://files.pythonhosted.org/packages/02/67/a31f1760f09c27b251ef39e9beb541f462cf977381d067faa764c2c0e393/regex-2026.5.9-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:8abd33fef90b2a9efac5557d6033ca82d1195ed3a15fea5af15ba7b463c6a63b", size = 801427, upload-time = "2026-05-09T23:15:00.642Z" },
+    { url = "https://files.pythonhosted.org/packages/e3/c4/1a80654597b6bc1e1ea0494824c31200e8a956abe290afae9b19a166a148/regex-2026.5.9-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:31037c82eccb44b7ea2e9e221d7c01429430e989a1f4b91ea5a855f6017b509a", size = 866482, upload-time = "2026-05-09T23:15:03.384Z" },
+    { url = "https://files.pythonhosted.org/packages/d1/11/960724e06482c08466ff5611e242e86f80062949cdf6b4b9cc317b9dd93d/regex-2026.5.9-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:5604dfd046dc37eca90250fc3be938b076c8059fa772ac0ed6f499b0f0fb0415", size = 773022, upload-time = "2026-05-09T23:15:05.625Z" },
+    { url = "https://files.pythonhosted.org/packages/50/a8/a9979c3e7918280e93159ebcab5ef1a65116dd4f3bd6091be0eae4a126e8/regex-2026.5.9-cp314-cp314t-musllinux_1_2_s390x.whl", hash = "sha256:0e1b1b4e496afbb24f4a62aba855ee4f88f25578927697b340702e48c9ee6bc2", size = 856642, upload-time = "2026-05-09T23:15:07.966Z" },
+    { url = "https://files.pythonhosted.org/packages/fe/d4/a9b732f2f0072c0ab12227483abb24fffcb9f73f8a2b203df0a6d0434735/regex-2026.5.9-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:be3372b9df6ddecff6486d37e19095a7b4973137caf5512407a89f4455361f41", size = 803552, upload-time = "2026-05-09T23:15:10.215Z" },
+    { url = "https://files.pythonhosted.org/packages/d5/fe/1b3113817447a1d4155e4ac76d2e072f42c0bcba2f43fa8a0e756ea2cd91/regex-2026.5.9-cp314-cp314t-win32.whl", hash = "sha256:3ddd90103f9e5c471c49c7852ecc1fe27c7e45eb99e977aefe7caa4e779f4f58", size = 275746, upload-time = "2026-05-09T23:15:12.609Z" },
+    { url = "https://files.pythonhosted.org/packages/92/73/93d42045302636c91f2e5ef588b65b84b01428f28ec77de256b1dfdfbe5c/regex-2026.5.9-cp314-cp314t-win_amd64.whl", hash = "sha256:ca518ed29c46eecba6010b15f1b9a479314d2de409536e71b6a13aa04e3b8a77", size = 285685, upload-time = "2026-05-09T23:15:15.086Z" },
+    { url = "https://files.pythonhosted.org/packages/da/80/35b4c33c804a165a7f55289afda3ea9e3eb6d15800341a2d66455c0f1f30/regex-2026.5.9-cp314-cp314t-win_arm64.whl", hash = "sha256:5e41809d2683fcde7d5a8c87a6567ba1fb1ce0de9f31bff578de00a4b2d76daa", size = 275713, upload-time = "2026-05-09T23:15:16.98Z" },
 ]
 
 [[package]]
@@ -6371,6 +6425,10 @@ evals = [
 lit = [
     { name = "lit-nlp" },
 ]
+qwen35 = [
+    { name = "packaging" },
+    { name = "transformers" },
+]
 
 [package.dev-dependencies]
 demo = [
@@ -6432,18 +6490,20 @@ requires-dist = [
     { name = "numpy", marker = "python_full_version >= '3.10' and python_full_version < '3.12'", specifier = ">=1.24" },
     { name = "numpy", marker = "python_full_version == '3.12.*'", specifier = ">=1.26" },
     { name = "pandas", specifier = ">=1.1.5" },
+    { name = "packaging", marker = "extra == 'qwen35'", specifier = ">=23.0" },
     { name = "protobuf", specifier = ">=3.20.0" },
     { name = "rich", specifier = ">=12.6.0" },
     { name = "sentencepiece" },
     { name = "torch", specifier = ">=2.6" },
     { name = "tqdm", specifier = ">=4.64.1" },
     { name = "transformers", specifier = ">=4.56" },
+    { name = "transformers", marker = "extra == 'qwen35'", specifier = ">=5.2.0" },
     { name = "transformers-stream-generator", specifier = ">=0.0.5,<0.1" },
     { name = "typeguard", specifier = ">=4.2,<5" },
     { name = "typing-extensions" },
     { name = "wandb", specifier = ">=0.13.5" },
 ]
-provides-extras = ["evals", "lit"]
+provides-extras = ["evals", "lit", "qwen35"]
 
 [package.metadata.requires-dev]
 demo = [
@@ -6490,10 +6550,9 @@ quantization = [
 
 [[package]]
 name = "transformers"
-version = "5.0.0"
+version = "5.8.1"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
-    { name = "filelock" },
     { name = "huggingface-hub" },
     { name = "numpy" },
     { name = "packaging" },
@@ -6502,11 +6561,11 @@ dependencies = [
     { name = "safetensors" },
     { name = "tokenizers" },
     { name = "tqdm" },
-    { name = "typer-slim" },
+    { name = "typer" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/bc/79/845941711811789c85fb7e2599cea425a14a07eda40f50896b9d3fda7492/transformers-5.0.0.tar.gz", hash = "sha256:5f5634efed6cf76ad068cc5834c7adbc32db78bbd6211fb70df2325a9c37dec8", size = 8424830, upload-time = "2026-01-26T10:46:46.813Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/e7/e6/4134ea2fbea322cddc7ffc94a0d8ee47fe32ce8e876b320cd37d88edfc4d/transformers-5.8.1.tar.gz", hash = "sha256:4dd5b6de4105725104d84fd6abd74b305f4debfc251b38c648ee5dd087cf543b", size = 8532019, upload-time = "2026-05-13T03:21:57.234Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/52/f3/ac976fa8e305c9e49772527e09fbdc27cc6831b8a2f6b6063406626be5dd/transformers-5.0.0-py3-none-any.whl", hash = "sha256:587086f249ce64c817213cf36afdb318d087f790723e9b3d4500b97832afd52d", size = 10142091, upload-time = "2026-01-26T10:46:43.88Z" },
+    { url = "https://files.pythonhosted.org/packages/fc/b1/8be7e7ef0b5200491312201918b6125ef9c9df9dd0f0240ccef9ac824e6b/transformers-5.8.1-py3-none-any.whl", hash = "sha256:5340fb95962162cdfdae5cc91d7f8fedd92ed75216c1154c5e1f590fcf56dd0e", size = 10632882, upload-time = "2026-05-13T03:21:52.876Z" },
 ]
 
 [[package]]
@@ -6566,30 +6625,17 @@ datetime = [
 
 [[package]]
 name = "typer"
-version = "0.16.0"
+version = "0.25.1"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
+    { name = "annotated-doc" },
     { name = "click" },
     { name = "rich" },
     { name = "shellingham" },
-    { name = "typing-extensions" },
-]
-sdist = { url = "https://files.pythonhosted.org/packages/c5/8c/7d682431efca5fd290017663ea4588bf6f2c6aad085c7f108c5dbc316e70/typer-0.16.0.tar.gz", hash = "sha256:af377ffaee1dbe37ae9440cb4e8f11686ea5ce4e9bae01b84ae7c63b87f1dd3b", size = 102625, upload-time = "2025-05-26T14:30:31.824Z" }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/76/42/3efaf858001d2c2913de7f354563e3a3a2f0decae3efe98427125a8f441e/typer-0.16.0-py3-none-any.whl", hash = "sha256:1f79bed11d4d02d4310e3c1b7ba594183bcedb0ac73b27a9e5f28f6fb5b98855", size = 46317, upload-time = "2025-05-26T14:30:30.523Z" },
-]
-
-[[package]]
-name = "typer-slim"
-version = "0.21.1"
-source = { registry = "https://pypi.org/simple" }
-dependencies = [
-    { name = "click" },
-    { name = "typing-extensions" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/17/d4/064570dec6358aa9049d4708e4a10407d74c99258f8b2136bb8702303f1a/typer_slim-0.21.1.tar.gz", hash = "sha256:73495dd08c2d0940d611c5a8c04e91c2a0a98600cbd4ee19192255a233b6dbfd", size = 110478, upload-time = "2026-01-06T11:21:11.176Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/e4/51/9aed62104cea109b820bbd6c14245af756112017d309da813ef107d42e7e/typer-0.25.1.tar.gz", hash = "sha256:9616eb8853a09ffeabab1698952f33c6f29ffdbceb4eaeecf571880e8d7664cc", size = 122276, upload-time = "2026-04-30T19:32:16.964Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/c8/0a/4aca634faf693e33004796b6cee0ae2e1dba375a800c16ab8d3eff4bb800/typer_slim-0.21.1-py3-none-any.whl", hash = "sha256:6e6c31047f171ac93cc5a973c9e617dbc5ab2bddc4d0a3135dc161b4e2020e0d", size = 47444, upload-time = "2026-01-06T11:21:12.441Z" },
+    { url = "https://files.pythonhosted.org/packages/3f/f9/2b3ff4e56e5fa7debfaf9eb135d0da96f3e9a1d5b27222223c7296336e5f/typer-0.25.1-py3-none-any.whl", hash = "sha256:75caa44ed46a03fb2dab8808753ffacdbfea88495e74c85a28c5eefcf5f39c89", size = 58409, upload-time = "2026-04-30T19:32:18.271Z" },
 ]
 
 [[package]]