[SYSTEMDS-3920] Vector API Implementation for dense codegen primitives (Divisions, Aggregations, Comparisons, MultiplyAdd) + benchmarks #2428

JulianJuelg · 2026-01-30T22:57:44Z

This PR adds a Java Vector API implementation for dense codegen primitives in the following groups:

Aggregation
Division
Comparison
Multiply-add (remaining)

The new vectorized implementations were benchmarked against the previous scalar-loop versions (see results below) with JMH microbenchmarks and a standalone Java benchmark suite included in this PR. In most cases, both harnesses show the same trend. In caseswhere they differ slightly, JMH is used as the primary signal due to lower volatility.

For each primitive, I compared the Vector API version to the existing scalar loop:

If performance was equal, or better, I replaced the scalar loop with the vectorized implementation.
If the Vector API version was slower, I kept the scalar implementation as the default and left the vectorized version in the codebase for reference

Benchmark setup
JDK version : 21
JMH version: 1.37
OS: macOS
Machine: (Apple M2/M, 16 GB RAM, 128-bit vector width/ SIMD)
Input size (double arrays): 1,000,000 elements
Warmup time: 1s per primitive
Measurement: 1 Iteration
JMH params: 2 Forks

Note: These benchmarks were run with a 128-bit SIMD vector width, which is only 2 lanes for doubles. On production deployments with wider SIMD (e.g., 256-bit or 512-bit where available), the vectorized implementations are expected to provide equal or better speedups due to increased lane-level parallelism.

Primitive Function	ns/op (JMH)	JMH Test: Speedup with Vector API	Java Test: Speedup with Vector API	Replaced
vectDivAdd	231671	1.066	1.887	Yes
vectDivAdd2	218818	1.066	1.686	Yes
vectDivWrite	359339	0.687	1.489	No
vectDivWrite2	343183	0.7215	0.717	No
vectDivWrite3	535898	0.7821	0.603	No
rowMaxsVectMult	298328	1.006	1.346	Yes
rowMaxsVectMult_aix	738767	0.115	0.077	No
vectSum	142065	0.322	0.565	No
vectMax	596046	2.002	1.933	Yes
vectCountnnz	297805	1.594	1.538	Yes
vectEqualAdd	427437	1.959	2.077	Yes
vectEqualWrite2	414717	1.183	0.801	Yes
vectEqualWrite	415329	1.189	1.402	Yes
vectGreaterAdd	427981	1.936	2.114	Yes
vectGreaterWrite2	552023	0.588	0.919	No
vectGreaterWrite	458332	1.309	0.927	Yes
vectLessAdd	531844	2.433	2.052	Yes
vectLessWrite2	545457	1.011	0.951	Yes
vectLessWrite	414025	1.203	1.039	Yes
vectLessequalAdd	426307	1.960	2.052	Yes
vectLessequalWrite2	540476	1.014	0.962	Yes
vectLessequalWrite	414514	1.181	0.953	Yes
vectMin	589668	2.000	1.996	Yes
vectMult2Add	228636	1.052	1.284	Yes
vectMult2Write	377074	2.136	1.375	Yes
vectNotequalAdd	424749	1.945	1.643	Yes
vectNotequalWrite2	566433	0.714	0.821	No
vectNotequalWrite	417206	1.203	0.941	Yes

…div, c) aggregations, d) comparisons

…add all primitives implementations to benchmarking suite

JulianJuelg added 4 commits January 27, 2026 21:07

first vector api implementaions + benchmarking suit

9557e0e

all vector api implementation of dense primitives a) multiplyAdd, b) …

2c6f30d

…div, c) aggregations, d) comparisons

Merge remote-tracking branch 'upstream/main' into feature/vector-api

8852d53

Replace codegen primitives with vector api implementation if faster; …

a881e55

…add all primitives implementations to benchmarking suite

github-project-automation bot added this to SystemDS PR Queue Jan 30, 2026

github-project-automation bot moved this to In Progress in SystemDS PR Queue Jan 30, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SYSTEMDS-3920] Vector API Implementation for dense codegen primitives (Divisions, Aggregations, Comparisons, MultiplyAdd) + benchmarks #2428

[SYSTEMDS-3920] Vector API Implementation for dense codegen primitives (Divisions, Aggregations, Comparisons, MultiplyAdd) + benchmarks #2428

JulianJuelg commented Jan 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

[SYSTEMDS-3920] Vector API Implementation for dense codegen primitives (Divisions, Aggregations, Comparisons, MultiplyAdd) + benchmarks #2428

Are you sure you want to change the base?

[SYSTEMDS-3920] Vector API Implementation for dense codegen primitives (Divisions, Aggregations, Comparisons, MultiplyAdd) + benchmarks #2428

Conversation

JulianJuelg commented Jan 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant