Add CMakePresets for target micro arch #1348

Open
AntoinePrv wants to merge 9 commits into xtensor-stack:master from AntoinePrv:cmake-presets

@AntoinePrv
Contributor

@AntoinePrv AntoinePrv commented May 13, 2026

I've taken the direction of explicit flags such as -mavx -mno-avx2.
This is IMHO less error-prone and more accurate than using an architecture name such as haswell.
The main difference is that this does not add other feature flags or change the -mtune model.
For a test setting, accuracy is more important IMHO.
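To illustrate the difference between explicit feature flags and an architecture name, here is a minimal sketch (not part of the PR) that reports which x86 extensions the compiler was told to target, using the macros GCC and Clang predefine. The names `enabled_features` and `check_flags` are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <string>

// GCC and Clang predefine one macro per enabled x86 extension.
// MSVC defines __AVX__ / __AVX2__ through /arch, but not __FMA__ or __BMI2__,
// so the last two stay false there.
#if defined(__AVX__)
constexpr bool kAvx = true;
#else
constexpr bool kAvx = false;
#endif

#if defined(__AVX2__)
constexpr bool kAvx2 = true;
#else
constexpr bool kAvx2 = false;
#endif

#if defined(__FMA__)
constexpr bool kFma = true;
#else
constexpr bool kFma = false;
#endif

#if defined(__BMI2__)
constexpr bool kBmi2 = true;
#else
constexpr bool kBmi2 = false;
#endif

// Space-separated list of the extensions this translation unit targets,
// or "baseline" when none of the four is enabled.
std::string enabled_features() {
    std::string out;
    if (kAvx)  out += "avx ";
    if (kAvx2) out += "avx2 ";
    if (kFma)  out += "fma ";
    if (kBmi2) out += "bmi2 ";
    return out.empty() ? std::string("baseline") : out;
}

// AVX2 builds on AVX, so a build that reports avx2 without avx is misconfigured.
void check_flags() {
    assert(!kAvx2 || kAvx);
}
```

With GCC or Clang, building this with `-mavx -mno-avx2` should report only `avx`, whereas `-march=haswell` should additionally report `avx2`, `fma` and `bmi2`, which is the kind of extra feature the explicit flags avoid pulling in.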

Comment thread .github/workflows/linux.yml Outdated
@serge-sans-paille
Contributor

I really like your approach and will eagerly merge it once it validates \o/

@AntoinePrv
Contributor Author

I've only kept the micro architecture target in CMakePresets.json because combining it with (debug/release) / (xtl on/off)... results in a combinatorial explosion of presets, for which there is currently no support.
Another shortcoming is that we cannot select MSVC flags here based on the compiler. We can do it based on the OS, but that is not quite the same.

I have ongoing work to do the same as these presets at the CMake level, with a function that can be made available to users to help with the tooling for dynamic dispatch (our current solution in Arrow is very verbose).
In that case, we'd also need to define a safe -march baseline. The reason is that the code in these translation units might also include non-SIMD code (this is sometimes the case in Arrow). With very advanced instruction sets, we would then be leaving perf on the table by keeping an x86-64 baseline. But what would be a reasonable baseline when dynamically dispatching to, for example, avx2?

  • haswell (first avx2) also has fma3 and bmi2
  • -march=haswell -mno-fma3 -mno-bmi2 if that is a thing?
  • Or go further back? sandybridge (first avx)? nehalem (first sse4.2)?
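For readers unfamiliar with the dynamic dispatch being discussed, a minimal sketch follows. It is an assumption-laden illustration, not the xsimd or Arrow mechanism: it uses the GCC/Clang per-function `target` attribute and `__builtin_cpu_supports` instead of per-translation-unit flags, and the names `sum`, `sum_baseline`, `sum_avx2` and `SKETCH_X86_GNU` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// The AVX2 path is only built with GCC-compatible compilers on x86.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_X86_GNU 1
#else
#define SKETCH_X86_GNU 0
#endif

// Baseline version: compiled with whatever -march the translation unit uses.
// This is the code whose speed depends on the choice of baseline.
static int sum_baseline(const int* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

#if SKETCH_X86_GNU
// Same loop, but the compiler is allowed to emit AVX2 instructions here,
// independently of the flags used for the rest of the translation unit.
__attribute__((target("avx2")))
static int sum_avx2(const int* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}
#endif

// Entry point: picks the AVX2 version only when the running CPU supports it.
int sum(const int* data, std::size_t n) {
#if SKETCH_X86_GNU
    if (__builtin_cpu_supports("avx2")) return sum_avx2(data, n);
#endif
    return sum_baseline(data, n);
}
```

Calling `sum` on `{1, 2, 3, 4, 5}` gives 15 on either path. The open question in the comment concerns everything outside the dispatched kernel, such as `sum_baseline` here, which only gets the instructions allowed by the baseline. On the flag spelling in the second bullet: GCC and Clang name the FMA3 switch `-mfma` / `-mno-fma`.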

@AntoinePrv AntoinePrv force-pushed the cmake-presets branch 2 times, most recently from a7e66a6 to d87c148 Compare May 19, 2026 14:13
@AntoinePrv
Contributor Author

@serge-sans-paille this is in a ready state, but I am not fully happy with it.

Getting into AVX512 and AVX512-256, the combinatorial explosion of possibilities starts to show again.
Inheritance of flags from other settings is also not possible.

This reinforces my belief that I should keep going with the work to do it in CMake (which could also be installed for our users to improve our dynamic dispatch tooling) and to homogenize it with the test TARGET_ARCH var.

This PR is not completely worthless though. For example we now have the possibility to really test with avx512f, which was not the case before because no Intel arch is limited to the f feature only.

What do you think? Should we give this some mileage before I get the time to work on a CMake solution?
