[Proposal] Add basic support for Gemma 3n (E2B & E4B) models

### Proposal 

Add Gemma 3n support for both the E2B and E4B checkpoints, initially focusing only on "text-only mode". The loader should ignore the vision tower and audio encoder so that only the core causal decoder is used.
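As a rough illustration of the text-only loading step, the sketch below splits a checkpoint's state dict into three groups: decoder weights to keep, multimodal weights to skip, and anything unrecognised. The key prefixes (`model.language_model.`, `model.vision_tower.`, `model.audio_tower.`, `model.embed_vision.`, `model.embed_audio.`) are assumptions about the Hugging Face checkpoint layout and should be checked against the real weight names. `split_text_only` is a hypothetical helper, not an existing TransformerLens function.

```python
# ASSUMPTION: these prefixes are guesses at the HF Gemma 3n weight names, not verified.
TEXT_PREFIX = "model.language_model."
SKIP_PREFIXES = (
    "model.vision_tower.",
    "model.audio_tower.",
    "model.embed_vision.",
    "model.embed_audio.",
)


def split_text_only(state_dict, text_prefix=TEXT_PREFIX, skip_prefixes=SKIP_PREFIXES):
    """Hypothetical helper: keep decoder weights, record skipped multimodal
    weights by name, and collect any keys that match neither group."""
    kept, skipped, unknown = {}, [], []
    for name, tensor in state_dict.items():
        if name.startswith(text_prefix):
            # Strip the prefix so names look like a plain decoder's state dict.
            kept[name[len(text_prefix):]] = tensor
        elif name.startswith(skip_prefixes):
            # Keep only the name, so the multimodal weights are not loaded.
            skipped.append(name)
        else:
            # E.g. an untied LM head; these need a manual decision.
            unknown.append(name)
    return kept, skipped, unknown


# Toy state dict with made-up names and placeholder values instead of tensors.
toy = {
    "model.language_model.embed_tokens.weight": "E",
    "model.language_model.layers.0.self_attn.q_proj.weight": "Q",
    "model.vision_tower.layer0.weight": "V",
    "model.audio_tower.layer0.weight": "A",
    "lm_head.weight": "L",
}
kept, skipped, unknown = split_text_only(toy)
```

Returning the skipped names, rather than silently discarding them, keeps a record of the multimodal parameters in case they need to be re-attached later.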

### Motivation

Gemma 3n offers state-of-the-art performance for its compute budget, making it well suited to running mechanistic interpretability experiments on consumer hardware. Its unique architecture, featuring sparse updates, low-rank residuals, and nested layers, introduces novel mechanisms worth exploring. Supporting Gemma 3n would allow the MI community to study a pair of powerful models and enable reproducible interpretability research without expensive compute.

### Pitch

Add support for Gemma 3n models (E2B and E4B), starting with text-only inputs by bypassing the vision and audio components. This would let users without high-end compute run mechanistic interpretability experiments on an efficient, high-performance model with novel architectural features such as sparse alternating updates (AltUp), low-rank residual augmentation (LAuReL), and nested sub-models (MatFormer, in E4B).
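To make the low-rank residual idea concrete, here is a minimal NumPy sketch of a low-rank augmented residual, `y = x + (x @ a) @ b`, where `a` and `b` share a small inner rank. It is a simplified illustration in the spirit of LAuReL's low-rank variant, not Gemma 3n's exact block: it omits normalization and how the result is combined with the attention output, and all sizes are hypothetical rather than taken from the Gemma 3n config.

```python
import numpy as np


def low_rank_residual(x, a, b):
    """Identity residual plus a rank-r learned update: x + (x @ a) @ b."""
    return x + (x @ a) @ b


rng = np.random.default_rng(0)
d_model, rank, seq_len = 2048, 64, 5  # hypothetical sizes, not Gemma 3n's config
x = rng.standard_normal((seq_len, d_model))
a = rng.standard_normal((d_model, rank)) * 0.01
# Zero-initialising b only makes this toy easy to check: the block reduces to the identity.
b = np.zeros((rank, d_model))
y = low_rank_residual(x, a, b)

extra_params = a.size + b.size  # 2 * d_model * rank
full_params = d_model * d_model  # cost of a full-rank d_model x d_model update
```

With `d_model = 2048` and `rank = 64`, the extra branch costs 262,144 parameters versus 4,194,304 for a full-rank matrix, which is why it is cheap to add to every block. For interpretability, it is also an additional residual branch that a port would likely need to expose through its own hook point.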

### Additional context

HF checkpoints: [google/gemma-3n-E2B-it](https://huggingface.co/google/gemma-3n-E2B-it), [google/gemma-3n-E4B-it](https://huggingface.co/google/gemma-3n-E4B-it)

Key obstacles to be aware of (no solutions proposed here):

- **Nested-layer design (Matryoshka):** E4B contains the E2B sub-model.
- **Extra per-block modules:** each transformer block adds AltUp sparsity gates, LAuReL low-rank residuals, and Per-Layer Embeddings (PLE).
- **Memory optimizations:** PLE uses caching to offload much of its embedding parameters to the CPU, so many weights may not reside on the GPU at runtime.
- **Mixed local / global attention.**
- **Very large vocabulary and reserved multimodal token IDs.**
- **Multimodality:** loading in text-only mode means safely skipping the vision and audio parameters while keeping references to them (is this feasible?).
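On the mixed local / global attention point, a port would need each layer to know whether it applies a plain causal mask (global) or a sliding-window causal mask (local). The sketch below builds both masks with NumPy and assigns layer types with a simple periodic pattern. The window size and the "every N-th layer is global" pattern are placeholders, not Gemma 3n's actual values, which should be read from the checkpoint's `config.json`.

```python
import numpy as np


def causal_mask(n):
    """Global causal mask: query i may attend to every key j <= i."""
    return np.tril(np.ones((n, n), dtype=bool))


def sliding_window_mask(n, window):
    """Local causal mask: query i attends only to keys i - window < j <= i."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (j > i - window)


def layer_types(n_layers, global_every):
    """Hypothetical pattern: every `global_every`-th layer is global, the rest local."""
    return ["global" if (layer + 1) % global_every == 0 else "local" for layer in range(n_layers)]


causal = causal_mask(6)
window = sliding_window_mask(6, 3)  # placeholder window size
types = layer_types(10, 5)          # placeholder layer pattern
```

Here the last query row of `window` only sees the three most recent positions, while `causal` lets it see all six. Keeping the mask choice as per-layer data, rather than hard-coding it, makes it easy to swap in the real pattern once it is confirmed.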

### Checklist

- [x] I have checked that there is no similar [issue](https://github.com/TransformerLensOrg/Transformerlens/issues) in the repo (**required**)


NOTE: This is my first issue on an open-source project, so any feedback or recommendations are welcome!

[Proposal] Add basic support for Gemma 3n (E2B & E4B) models #953

Proposal

Motivation

Pitch

Additional context

Checklist

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Proposal] Add basic support for Gemma 3n (E2B & E4B) models #953

Description

Proposal

Motivation

Pitch

Additional context

Checklist

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions