[Proposal] Add basic support for Gemma 3n (E2B & E4B) models #953

@lmmontoya-ai

Proposal

Add Gemma 3n support for both the E2B and E4B checkpoints, initially focusing only on text-only mode. The loader should ignore the vision tower and audio encoder so that only the core causal decoder is used.
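To illustrate what "ignore the vision tower and audio encoder" could mean at load time, here is a minimal sketch that splits a checkpoint's parameter names into those kept for the text decoder and those skipped. The prefixes `vision_tower.` and `audio_tower.`, the helper name `split_text_only`, and the toy key names are all assumptions for illustration; the real Gemma 3n checkpoint keys may be named differently.

```python
from typing import Any, Dict, List, Tuple

# Assumed prefixes for the multimodal components; check the real checkpoint keys.
SKIPPED_PREFIXES: Tuple[str, ...] = ("vision_tower.", "audio_tower.")


def split_text_only(
    state_dict: Dict[str, Any],
    skipped_prefixes: Tuple[str, ...] = SKIPPED_PREFIXES,
) -> Tuple[Dict[str, Any], List[str]]:
    """Return (weights kept for the text decoder, names of skipped weights)."""
    kept: Dict[str, Any] = {}
    skipped: List[str] = []
    for name, value in state_dict.items():
        # str.startswith accepts a tuple and matches any of its entries.
        if name.startswith(skipped_prefixes):
            skipped.append(name)
        else:
            kept[name] = value
    return kept, skipped


# Toy stand-in for a checkpoint: hypothetical key names, strings instead of tensors.
toy_checkpoint = {
    "language_model.embed_tokens.weight": "W_E",
    "language_model.layers.0.self_attn.q_proj.weight": "W_Q",
    "vision_tower.patch_embed.weight": "V",
    "audio_tower.conv.weight": "A",
}

text_weights, ignored = split_text_only(toy_checkpoint)
print(sorted(text_weights))
print(ignored)
```

Returning the skipped names rather than silently dropping them makes it easy to log or assert exactly which parameters were left out.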

Motivation

Gemma 3n offers state-of-the-art performance for the compute it requires, making it well suited to running mechanistic interpretability experiments on consumer hardware. Its unusual architecture, featuring sparse updates, low-rank residuals, and nested layers, introduces novel mechanisms that are worth exploring. Supporting Gemma 3n would let the mechanistic interpretability community study a pair of powerful models and would enable reproducible interpretability research without the need for expensive compute.

Pitch

Add support for Gemma 3n models (E2B and E4B), starting with text-only inputs, by bypassing the vision and audio components. This would let users without high-end compute run mechanistic interpretability experiments on a high-efficiency, high-performance model with novel architectural features such as sparse alternating updates (AltUp), low-rank residual augmentation (LAuReL), and nested sub-models (MatFormer, in E4B).

Additional context

HF checkpoints: google/gemma-3n-E2B-it, google/gemma-3n-E4B-it

Key obstacles to be aware of (no solutions proposed here):

Nested‐layer design (Matryoshka): E4B contains the E2B sub-model.

Extra per-block modules: Each transformer block adds AltUp sparsity gates, LAuReL low-rank residuals, and Per-Layer Embeddings (PLE).

Memory optimizations: PLE uses caching to offload many of its embedding parameters to the CPU, so many weights might not reside on the GPU at runtime.

Mixed local / global attention.

Huge vocab and reserved multimodal IDs.

Multimodality: Loading text-only mode means safely bypassing vision and audio parameters while keeping their references (could this be done?).
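As a rough illustration of the "mixed local / global attention" obstacle above, the sketch below builds a causal mask that is either global (every earlier token is visible) or local (only a sliding window of recent tokens is visible). The layer pattern `["local", "local", "global"]` and the window size of 2 are made-up values for the example, not Gemma 3n's actual configuration.

```python
from typing import List, Optional


def causal_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    window=None gives a global causal mask; an integer restricts each query
    to itself plus the previous (window - 1) positions.
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q  # causal: never look at future tokens
            if window is not None:
                visible = visible and (q - k) < window
            row.append(visible)
        mask.append(row)
    return mask


# Hypothetical per-layer pattern and window size, for illustration only.
layer_types = ["local", "local", "global"]
masks = [causal_mask(5, window=2 if kind == "local" else None) for kind in layer_types]

print(masks[0][4])  # last query position under a local layer
print(masks[2][4])  # last query position under a global layer
```

Any hook or cache code that assumes one attention pattern for all layers would need to pick the mask per layer, which is part of why this is listed as an obstacle.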

Checklist

  • I have checked that there is no similar issue in the repo (required)

NOTE: This is my first issue ever on an open-source project; any observations or recommendations are welcome!

Labels: complexity-high (very complicated changes for people to address who are quite familiar with the code)
