Description
I have a nested loop that computes data in two dimensions, like this:
for (int i = satellites_start; i < satellites_end; i++)
{
for (int j = dates_start; j < dates_end; j++)
{
// calculations go here
}
}
I compile with the -O3 -msimd128 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize options. The inner loop is automatically vectorized, but not the outer loop (as expected by default).
I have found that LLVM can vectorize nested (outer) loops by using the "VPlan native path" instead of the default "standard path" (if I understand correctly), and I want to try it. Here's what I did:
- I added -mllvm -enable-vplan-native-path to the options. Emscripten doesn't report an error, so I assume the option is passed to LLVM.
- I added #pragma clang loop vectorize(enable) vectorize_width(2) above the outer loop (otherwise it still only vectorizes the inner loop).
This is the output I get:
src-cpp/common.cpp:653:3: remark: loop not vectorized: loop control flow is not understood by vectorizer [-Rpass-analysis]
653 | for (int i = satellites_start; i < satellites_end; i++)
| ^
src-cpp/common.cpp:653:3: remark: loop not vectorized: Unsupported outer loop [-Rpass-analysis]
src-cpp/common.cpp:653:3: remark: loop not vectorized (Force=true, Vector Width=2) [-Rpass-missed=loop-vectorize]
src-cpp/common.cpp:653:3: warning: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled
or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
However, I tried the code from someone else's example, and its outer loop was vectorized successfully:
void example(int n, int a[1024][1024], int b[1024][1024])
{
#pragma clang loop vectorize(enable) vectorize_width(4)
for (int i = 1; i < n; i++) {
for (int j = 0; j < n; j++) {
a[j][i] = a[j][i-1] + b[i][j];
}
}
}
So I decided to make my code dumber and dumber until it vectorizes, but even this function (which at this point doesn't do anything useful) does not get its outer loop vectorized:
void calculate_doppler_factor_test(
int satellites_start, int satellites_end,
int dates_start, int dates_end, int dates_count,
double *__restrict doppler_factors)
{
#pragma clang loop vectorize(enable) vectorize_width(2)
for (int i = satellites_start; i < satellites_end; i++)
{
for (int j = dates_start; j < dates_end; j++)
{
int doppler_factor_index = (i * dates_count + j);
doppler_factors[doppler_factor_index] = 1.0;
}
}
}
Just like with my original code, as soon as I remove the #pragma, the inner loop is vectorized.
I would imagine that if the inner loop vectorizes, the outer loop should too when it contains only the inner loop and nothing else. Obviously that's not the case. So I wonder: compared to the standard LLVM vectorization, what additional constraints does outer loop vectorization place on loops?
I tested this on Emscripten 4.0.16.
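For anyone who wants to reproduce this, the reduced function can be put in a standalone file together with a small correctness check, sketched below. The function body is the one from this report. The helper count_mismatches, the buffer layout (satellites_count rows of dates_count doubles), and the sizes passed to it are assumptions added for illustration. It should build with the same options listed above (-O3 -msimd128 -mllvm -enable-vplan-native-path plus the -Rpass flags).

```cpp
#include <cstddef>
#include <vector>

// The reduced function from this report, unchanged.
void calculate_doppler_factor_test(
    int satellites_start, int satellites_end,
    int dates_start, int dates_end, int dates_count,
    double *__restrict doppler_factors)
{
#pragma clang loop vectorize(enable) vectorize_width(2)
  for (int i = satellites_start; i < satellites_end; i++)
  {
    for (int j = dates_start; j < dates_end; j++)
    {
      int doppler_factor_index = (i * dates_count + j);
      doppler_factors[doppler_factor_index] = 1.0;
    }
  }
}

// Hypothetical helper (not part of the original code): runs the function on a
// zero-initialized satellites_count x dates_count buffer and returns how many
// elements differ from what the two loops should have written.
int count_mismatches(int satellites_count, int dates_count,
                     int satellites_start, int satellites_end,
                     int dates_start, int dates_end)
{
  std::vector<double> buffer(
      static_cast<std::size_t>(satellites_count) * dates_count, 0.0);
  calculate_doppler_factor_test(satellites_start, satellites_end,
                                dates_start, dates_end, dates_count,
                                buffer.data());
  int mismatches = 0;
  for (int i = 0; i < satellites_count; i++)
  {
    for (int j = 0; j < dates_count; j++)
    {
      // Elements inside both ranges must be 1.0, everything else stays 0.0.
      bool inside = i >= satellites_start && i < satellites_end &&
                    j >= dates_start && j < dates_end;
      double expected = inside ? 1.0 : 0.0;
      if (buffer[static_cast<std::size_t>(i) * dates_count + j] != expected)
        mismatches++;
    }
  }
  return mismatches;
}
```

Calling count_mismatches(8, 16, 2, 6, 3, 11) from main and checking for 0 confirms the loops behave the same with and without the #pragma, so any difference is limited to the vectorization remarks.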