Commit b7c2e2a
perf: skip Metal sync after QKV matmul during generation
The synchronize() call after the QKV projection ran on every forward
pass, including single-token generation steps (seq_len=1). Generation
now uses the fused SDPA kernel, which issues only a few commands, so
the sync is unnecessary there and adds ~4 ms per full-attention layer
× 6 layers = ~24 ms of overhead per token.
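The diff body is not reproduced here, so the following is only a minimal sketch of the kind of guard the message describes, assuming the sync is kept when more than one token is processed and skipped when seq_len is 1. `StubDevice` and `after_qkv_matmul` are hypothetical names for illustration, not identifiers from the repository.

```python
class StubDevice:
    """Hypothetical stand-in for a Metal device wrapper; it only counts syncs."""

    def __init__(self):
        self.sync_calls = 0

    def synchronize(self):
        # A real backend would block here until queued GPU work completes.
        self.sync_calls += 1


def after_qkv_matmul(device, seq_len):
    """Sync after the QKV projection only when more than one token is processed."""
    # Assumption: seq_len == 1 identifies a generation step.
    if seq_len > 1:
        device.synchronize()


device = StubDevice()
after_qkv_matmul(device, seq_len=128)  # multi-token pass: sync kept
after_qkv_matmul(device, seq_len=1)    # generation step: sync skipped
print(device.sync_calls)
```

With the stub, only the multi-token call triggers a sync, so the counter ends at 1.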
Benchmark (M3 Pro, Qwen3.5-0.8B, 50 tokens):
- Before: 15.2 tok/s
- After: 16.1 tok/s (+5.9%)
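As a sanity check on the benchmark, the throughput figures can be converted into per-token latency. This is plain arithmetic on the two numbers above and assumes nothing beyond them.

```python
before_tps = 15.2  # tokens per second before the change
after_tps = 16.1   # tokens per second after the change

# Per-token latency in milliseconds is the reciprocal of throughput.
before_ms = 1000.0 / before_tps
after_ms = 1000.0 / after_tps
saved_ms = before_ms - after_ms

# Relative throughput gain, as reported in the benchmark.
speedup_pct = (after_tps / before_tps - 1.0) * 100.0

print(f"{before_ms:.1f} ms -> {after_ms:.1f} ms per token")
print(f"saved {saved_ms:.1f} ms per token, {speedup_pct:.1f}% faster")
```

The +5.9% figure checks out. The measured saving works out to roughly 3.7 ms per token, which is well below the ~24 ms per-token estimate in the message; the source does not explain the difference.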
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 64b66e5 commit b7c2e2a
1 file changed
Lines changed: 5 additions & 3 deletions
Diff hunk (line content not captured): original lines 141-143 removed, new lines 141-145 added; context lines 138-140 and 144-146 (new 146-148) unchanged.