
Implementation of tiled attention with bf16 and circular buffers which reduces memory requirements by 4x on longer context on gemma models. #839

Merged

copybara-service[bot] merged 1 commit into dev from test_864904207 on Feb 24, 2026

Conversation

@copybara-service copybara-service bot commented Feb 23, 2026

Implementation of tiled attention with bf16 and circular buffers which reduces memory requirements by 4x on longer context on gemma models.
It also enables better parallelism for small batch sizes / small models.
It is also able to use VDPBF16PS for a roughly 2x improvement on AVX-512.
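The description above combines three ideas: K/V stored at bf16 precision (half the bytes of fp32), a fixed-size circular buffer so long contexts do not grow the cache, and tile-by-tile processing with a streaming softmax so no full-length logits buffer is materialized. The sketch below illustrates those ideas with numpy; all names (`CircularKVCache`, `tiled_attention`, the `tile` size) are hypothetical and stand in for the actual C++/SIMD kernel, which this PR does not show.

```python
import numpy as np

def to_bf16(x):
    """Truncate fp32 values to bf16 precision (keep the top 16 bits).
    Storing K/V this way halves memory versus fp32."""
    u = np.asarray(x, np.float32).view(np.uint32)
    return (u & 0xFFFF0000).view(np.float32)

class CircularKVCache:
    """Fixed-size ring buffer of K/V rows: old entries are overwritten,
    bounding cache memory on long contexts (a sliding window)."""
    def __init__(self, window, dim):
        self.window, self.len = window, 0
        self.k = np.zeros((window, dim), np.float32)
        self.v = np.zeros((window, dim), np.float32)

    def append(self, k, v):
        i = self.len % self.window      # wrap around: circular buffer
        self.k[i], self.v[i] = to_bf16(k), to_bf16(v)
        self.len += 1

def tiled_attention(q, cache, tile=4):
    """Attend q over the valid cache rows one tile at a time, using the
    streaming-softmax rescaling trick so only O(tile) logits exist at once.
    Buffer order after wrap-around is irrelevant here because softmax
    attention (without a positional mask) is permutation-invariant."""
    n = min(cache.len, cache.window)
    scale = 1.0 / np.sqrt(q.shape[0])
    m, s = -np.inf, 0.0
    acc = np.zeros_like(q, dtype=np.float32)
    for start in range(0, n, tile):
        end = min(start + tile, n)
        k_t, v_t = cache.k[start:end], cache.v[start:end]
        logits = (k_t @ q) * scale
        m_new = max(m, float(logits.max()))
        corr = np.exp(m - m_new)        # rescale previous partial sums
        p = np.exp(logits - m_new)
        s = s * corr + p.sum()
        acc = acc * corr + p @ v_t
        m = m_new
    return acc / s
```

The memory saving relative to an unbounded fp32 cache is the product of the two halvings: bf16 storage (2x) times the bounded window versus full context (2x or more on long sequences), which is consistent with the 4x figure claimed for longer contexts.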

@copybara-service copybara-service bot force-pushed the test_864904207 branch 6 times, most recently from acf7894 to 239d792 on February 24, 2026 at 10:51
@copybara-service copybara-service bot changed the title from "Implementation of tiled attention with bf16 and circular buffers which reduces memory requirements by 4x on longer context on gemma models" to "Implementation of tiled attention with bf16 and circular buffers which reduces memory requirements by 4x on longer context on gemma models." on Feb 24, 2026
…h reduces memory requirements by 4x on longer context on gemma models.

It also enables better parallelism for small batch sizes / small models.
It is also able to use VDPBF16PS for a roughly 2x improvement on AVX-512.

PiperOrigin-RevId: 874517319
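VDPBF16PS is the AVX-512 BF16 dot-product instruction: for each fp32 lane of the destination, it multiplies two pairs of adjacent bf16 elements and accumulates both products into the fp32 accumulator (dst[i] += a[2i]*b[2i] + a[2i+1]*b[2i+1]). Packing two bf16 multiply-adds into each fp32 lane is where the roughly 2x throughput over plain fp32 FMA comes from. A numpy model of the per-lane semantics follows; the hardware converts fp32 to bf16 with round-to-nearest-even (VCVTNE2PS2BF16), whereas this sketch uses simple truncation:

```python
import numpy as np

def f32_to_bf16_bits(x):
    """Truncate fp32 to a bf16 bit pattern (keep the top 16 bits)."""
    return (np.asarray(x, np.float32).view(np.uint32) >> 16).astype(np.uint16)

def bf16_bits_to_f32(b):
    """Widen bf16 bit patterns back to fp32 (zero-fill the low 16 bits)."""
    return (b.astype(np.uint32) << 16).view(np.float32)

def dpbf16ps(acc, a_bf16, b_bf16):
    """Model of VDPBF16PS: widen bf16 lanes to fp32, multiply pairwise,
    and accumulate each adjacent pair of products into one fp32 lane."""
    a = bf16_bits_to_f32(a_bf16)
    b = bf16_bits_to_f32(b_bf16)
    prod = a * b
    return acc + prod[0::2] + prod[1::2]
```

For example, with a = [1, 2, 3, 4] and b = [10, 20, 30, 40] (all exactly representable in bf16), two fp32 lanes accumulate 1*10 + 2*20 = 50 and 3*30 + 4*40 = 250.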
@copybara-service copybara-service bot merged commit df162ea into dev Feb 24, 2026
@copybara-service copybara-service bot deleted the test_864904207 branch February 24, 2026 11:26
