Optimize Zen 4 GEMM macro block sizes (P, Q, R) by Vasudeva-bit · Pull Request #5868 · OpenMathLib/OpenBLAS

Vasudeva-bit · 2026-06-25T13:48:40Z

Zen 4 Cache Boundary Optimisations (uses CooperLake Micro Kernels) #5837

Description:
This PR introduces targeted cache geometry overrides (the P, Q, R macro block sizes) for the Zen 4 architecture. The parameters were derived with Bayesian Optimization (Optuna) so that matrix panel footprints align with the physical 1MB L2 and 32MB L3 limits of the Zen 4 CCX.

1. Successful Optimisations (Implemented)

These standard GEMM variants achieved higher absolute GFLOPs than the baseline OpenBLAS generic fallbacks. These are the configurations included in this patch.

| Variant | OpenBLAS Baseline (GFLOPs) | BO Tuned Parameters `(P, Q, R)` | BO Tuned Performance (GFLOPs) | Net Improvement |
| --- | --- | --- | --- | --- |
| SGEMM | 2880.00 | `384, 512, 5936` | 3080.71 | +6.9% |
| DGEMM | 1293.73 | `512, 512, 2288` | 1511.87 | +16.8% |
| CGEMM | 2981.33 | `160, 480, 528` | 3143.95 | +5.4% |
| ZGEMM | 1288.60 | `176, 256, 1520` | 1459.45 | +13.2% |
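
To make the cache-footprint reasoning concrete, here is a minimal sketch that converts the tuned `(P, Q, R)` values from the table above into panel sizes in bytes. It assumes the usual Goto-style blocking convention, in which P blocks the M dimension, Q the K dimension and R the N dimension, so the packed A block is P × Q elements and the packed B panel is Q × R elements. That mapping, the element sizes, and the helper name `panel_footprints` are assumptions for illustration; the PR itself does not spell out its footprint model, and this simplified calculation ignores packing overhead, the C tile and cores sharing the L3.

```python
# Bytes per matrix element for each precision (float, double, complex float, complex double).
ELEM_BYTES = {"SGEMM": 4, "DGEMM": 8, "CGEMM": 8, "ZGEMM": 16}

# Tuned (P, Q, R) values copied from the table above.
TUNED = {
    "SGEMM": (384, 512, 5936),
    "DGEMM": (512, 512, 2288),
    "CGEMM": (160, 480, 528),
    "ZGEMM": (176, 256, 1520),
}

KIB = 1024
MIB = 1024 * KIB


def panel_footprints(variant):
    """Return (A-block bytes, B-panel bytes).

    Assumed model (hypothetical, not taken from the PR):
    A block = P x Q elements, B panel = Q x R elements.
    """
    p, q, r = TUNED[variant]
    size = ELEM_BYTES[variant]
    return p * q * size, q * r * size


if __name__ == "__main__":
    for name in TUNED:
        a_bytes, b_bytes = panel_footprints(name)
        print(f"{name}: A block {a_bytes / KIB:.0f} KiB, "
              f"B panel {b_bytes / MIB:.2f} MiB")
```

The printed sizes can be compared against the 1MB L2 and 32MB L3 figures quoted in the description; how closely a given variant should approach those limits depends on details this sketch does not model.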

2. Discarded

| Variant | OpenBLAS Baseline (GFLOPs) | BO Tuned Parameters `(P, Q, R)` | BO Tuned Performance (GFLOPs) |
| --- | --- | --- | --- |
| SBGEMM | 5318.71 | `368, 712, 4544` | 5267.25 |
| CGEMM_3M | 2606.98 | `168, 376, 3920` | 2655.71 |
| ZGEMM_3M | 960.02 | `192, 256, 1904` | 889.82 |
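
The discarded table has no net-change column, so the following sketch computes the relative change from the baseline and tuned GFLOPs listed above. It assumes the plain formula (tuned − baseline) / baseline; the PR does not state how its "Net Improvement" figures were calculated or rounded, and the helper name `relative_change_pct` is hypothetical.

```python
# (baseline GFLOPs, BO-tuned GFLOPs) copied from the discarded-variants table.
DISCARDED = {
    "SBGEMM": (5318.71, 5267.25),
    "CGEMM_3M": (2606.98, 2655.71),
    "ZGEMM_3M": (960.02, 889.82),
}


def relative_change_pct(baseline, tuned):
    """Relative change of tuned over baseline, in percent (assumed formula)."""
    return (tuned - baseline) / baseline * 100.0


if __name__ == "__main__":
    for name, (baseline, tuned) in DISCARDED.items():
        print(f"{name}: {relative_change_pct(baseline, tuned):+.2f}%")
```

A negative value means the tuned parameters ran slower than the baseline in the reported measurement.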

Optimize Zen 4 GEMM macro block sizes (P, Q, R)

753604c

Vasudeva-bit marked this pull request as draft June 25, 2026 13:50

Vasudeva-bit changed the title ~~Optimize Zen 4 GEMM macro block sizes (P, Q, R) - #5837~~ Optimize Zen 4 GEMM macro block sizes (P, Q, R) Jun 25, 2026

Vasudeva-bit marked this pull request as ready for review June 25, 2026 13:51

Vasudeva-bit wants to merge 1 commit into OpenMathLib:develop from Vasudeva-bit:macTuneZEN4
