df162ea
Merged

Implementation of tiled attention with bf16 and circular buffers, which reduces memory requirements by 4x for longer contexts on Gemma models. #839
