You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
test_moe.py has a segmented fusion with t0[b0, i1, i2] and t1[i1, b2], it uses transpose scheduler due to t1's inner most dim is [i1] which is different from t0's inner most dim [i2].
Ideally, tranpose scheduler should reject it and pointwise scheudler should be used.
This PR revised hasAtLeastTwoValidGroups by filtering out bcast domains and check if the shorter one is a sub-sequence of the longer one. e.g. filtered t0 is [i1,i2], filtered t1 is [i1]. t1 is a sub-sequence of t0 and they are considered as mapped and tranpose scheduler will reject it. This check ensures there is no actual permutation of concretized domains and pointwise should be used instead of transpose.
The new subsequence-based mapping algorithm is more permissive than the previous exact equality check. While this allows broadcast dimensions to be skipped as intended, verify that this doesn't incorrectly accept invalid mappings. The algorithm assumes that if all elements of the shorter sequence can be found in the longer sequence in order, then the domains are compatible. This seems correct for the broadcast use case but should be validated.
// Filter out reduction and broadcast dimensions before comparing.// - the shorter must be a subsequence of the longer (same// order), e.g. longer [i0,i1,i2] allows shorter [i0,i1], [i1,i2], [i0,i2];// disallows [i1,i0], [i2,i1], [i2,i0].auto ref1_filtered =
ref1_loop | TensorDomain::kNoReductions | TensorDomain::kNoBroadcasts;
auto ref2_filtered =
ref2_loop | TensorDomain::kNoReductions | TensorDomain::kNoBroadcasts;
constauto n1 = std::ranges::distance(ref1_filtered);
constauto n2 = std::ranges::distance(ref2_filtered);
auto shorter = (n1 <= n2) ? ref1_filtered : ref2_filtered;
auto longer = (n1 <= n2) ? ref2_filtered : ref1_filtered;
auto it = std::ranges::begin(longer);
auto end = std::ranges::end(longer);
bool all_mapped = true;
for (IterDomain* id_s : shorter) {
it = std::ranges::find_if(it, end, [&](IterDomain* id_l) {
return ca_map.areMapped(id_s, id_l, IdMappingMode::PERMISSIVE);
});
if (it == end) {
all_mapped = false;
break;
}
++it;
}
The new algorithm uses nested search (std::ranges::find_if in a loop) which could be O(n*m) complexity vs the previous O(n) exact match. Given that tensor domains are typically small, this may not be significant, but the performance impact should be evaluated, especially for large tensor domains.
for (IterDomain* id_s : shorter) {
it = std::ranges::find_if(it, end, [&](IterDomain* id_l) {
return ca_map.areMapped(id_s, id_l, IdMappingMode::PERMISSIVE);
});
if (it == end) {
all_mapped = false;
break;
}
++it;
The validation check was corrected to examine the original loop domains for broadcasts rather than the filtered domains. This is the correct fix since we want to ensure broadcasts exist in the original domains. Verify this change doesn't break any existing validation assumptions.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
test_moe.pyhas a segmented fusion witht0[b0, i1, i2]andt1[i1, b2], it uses transpose scheduler due to t1's inner most dim is [i1] which is different from t0's inner most dim [i2].Ideally, tranpose scheduler should reject it and pointwise scheudler should be used.
This PR revised
hasAtLeastTwoValidGroupsby filtering out bcast domains and check if the shorter one is a sub-sequence of the longer one. e.g. filtered t0 is [i1,i2], filtered t1 is [i1]. t1 is a sub-sequence of t0 and they are considered as mapped and tranpose scheduler will reject it. This check ensures there is no actual permutation of concretized domains and pointwise should be used instead of transpose.