Nested loop vectorization #26322

@1valdis

I have a nested loop that computes data in two dimensions, like this:

```cpp
for (int i = satellites_start; i < satellites_end; i++)
{
  for (int j = dates_start; j < dates_end; j++)
  {
    // calculations go here
  }
}
```

I compile with `-O3 -msimd128 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize`. The inner loop is vectorized automatically, but the outer loop is not (which is the expected default behavior).

I found that LLVM can vectorize outer loops (nested loop vectorization) through the "VPlan native path" instead of the default path, if I understand correctly, and I want to try it. Here's what I did:

  1. I added `-mllvm -enable-vplan-native-path` to the options. Emscripten doesn't report an error, so I assume the option is passed through to LLVM.
  2. I added `#pragma clang loop vectorize(enable) vectorize_width(2)` above the outer loop (without it, only the inner loop is vectorized).

This is the output I get:

```
src-cpp/common.cpp:653:3: remark: loop not vectorized: loop control flow is not understood by vectorizer [-Rpass-analysis]
  653 |   for (int i = satellites_start; i < satellites_end; i++)
      |   ^
src-cpp/common.cpp:653:3: remark: loop not vectorized: Unsupported outer loop [-Rpass-analysis]
src-cpp/common.cpp:653:3: remark: loop not vectorized (Force=true, Vector Width=2) [-Rpass-missed=loop-vectorize]
src-cpp/common.cpp:653:3: warning: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
```

However, when I tried the following code from someone else's example, its outer loop was vectorized successfully:

```cpp
void example(int n, int a[1024][1024], int b[1024][1024])
{
  #pragma clang loop vectorize(enable) vectorize_width(4)
  for (int i = 1; i < n; i++) {
    for (int j = 0; j < n; j++) {
      a[j][i] = a[j][i-1] + b[i][j];
    }
  }
}
```

So I kept simplifying my code until it would vectorize, but even this function (which at this point does nothing useful) does not get its outer loop vectorized:

```cpp
void calculate_doppler_factor_test(
    int satellites_start, int satellites_end,
    int dates_start, int dates_end, int dates_count,
    double *__restrict doppler_factors)
{
  #pragma clang loop vectorize(enable) vectorize_width(2)
  for (int i = satellites_start; i < satellites_end; i++)
  {
    for (int j = dates_start; j < dates_end; j++)
    {
      int doppler_factor_index = (i * dates_count + j);
      doppler_factors[doppler_factor_index] = 1.0;
    }
  }
}
```

Just like with my original code, as soon as I remove the `#pragma`, the inner loop does get vectorized.
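A possible lead, not verified against the LLVM version bundled with Emscripten 4.0.16: in LLVM's `LoopVectorizationLegality::canVectorizeOuterLoop`, the "loop control flow is not understood by vectorizer" remark is also emitted when an inner loop is not uniform with respect to the outer loop. Roughly, that means the inner loop needs a canonical induction variable (starting at 0, step 1) and an exit condition that does not change across outer iterations. The working example's inner loop (`for (int j = 0; j < n; j++)`) has that shape; the reduced test's inner loop starts at `dates_start` instead. The sketch below rewrites the reduced test so the inner loop counts from 0 to a precomputed length. The function name and `dates_len` are hypothetical, other `-O3` passes may reshape the loops before the vectorizer sees them, and whether this gets past the check is untested:

```cpp
// Hypothetical variant of calculate_doppler_factor_test (untested with the
// VPlan native path): the inner loop starts at 0 with step 1, and its trip
// count is computed once before the outer loop, so it does not depend on i.
void calculate_doppler_factor_test_canonical(
    int satellites_start, int satellites_end,
    int dates_start, int dates_end, int dates_count,
    double *__restrict doppler_factors)
{
  // Zero inner iterations for an empty or inverted date range, as in the original.
  int dates_len = dates_end > dates_start ? dates_end - dates_start : 0;

  #pragma clang loop vectorize(enable) vectorize_width(2)
  for (int i = satellites_start; i < satellites_end; i++)
  {
    // Row i, shifted to the first date in range: row[j] is the same element
    // as doppler_factors[i * dates_count + dates_start + j].
    double *row = doppler_factors + i * dates_count + dates_start;
    for (int j = 0; j < dates_len; j++)
    {
      row[j] = 1.0;
    }
  }
}
```

It writes exactly the same elements as the original function, so it can be swapped in to compare the `-Rpass-analysis` remarks.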

I would expect that if the inner loop vectorizes, the outer loop should too when its body contains nothing but the inner loop. Obviously that's not the case. So, compared to standard LLVM vectorization, what additional constraints does a loop nest have to meet for outer-loop vectorization?
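For reference, `vectorize_width(2)` on the outer loop asks the vectorizer to run two consecutive `i` iterations in lockstep, one per vector lane. If the vectorizer keeps refusing, the same iteration order can be written by hand as an unroll-and-jam by 2 of the reduced test: unroll the `i` loop twice and fuse the two copies of the `j` loop. This is a manual workaround, not what the VPlan native path generates. It is plain C++ without SIMD intrinsics, the function name is hypothetical, and the fused inner loop over `j` is left to the standard inner-loop vectorizer:

```cpp
// Hypothetical manual unroll-and-jam (factor 2) of calculate_doppler_factor_test:
// each pass of the inner loop handles satellites i and i + 1 together, which is
// the lane assignment an outer-loop vectorization with width 2 would use.
void calculate_doppler_factor_unroll_jam2(
    int satellites_start, int satellites_end,
    int dates_start, int dates_end, int dates_count,
    double *__restrict doppler_factors)
{
  int i = satellites_start;

  // Main part: two satellites (rows) per outer iteration.
  for (; i + 1 < satellites_end; i += 2)
  {
    double *row0 = doppler_factors + i * dates_count; // "lane 0": satellite i
    double *row1 = row0 + dates_count;                // "lane 1": satellite i + 1
    for (int j = dates_start; j < dates_end; j++)
    {
      row0[j] = 1.0;
      row1[j] = 1.0;
    }
  }

  // Remainder: at most one leftover row when the satellite count is odd.
  for (; i < satellites_end; i++)
  {
    double *row = doppler_factors + i * dates_count;
    for (int j = dates_start; j < dates_end; j++)
    {
      row[j] = 1.0;
    }
  }
}
```

The remainder loop is why the main loop stops at `i + 1 < satellites_end`: with an odd number of satellites, the last row has no partner lane.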

I tested this on Emscripten 4.0.16.
