Optimize non fixed size segmented reduce for small segments using max_segment_size #7718
Merged
srinivasyadav18 merged 18 commits into NVIDIA:main on Mar 10, 2026
NaderAlAwar
reviewed
Feb 19, 2026
Contributor
bernhardmgruber
left a comment
I think this PR is massively complicated by the fact that the segmented reduction dispatch was already refactored to the new tuning API, while the fixed-size segmented dispatch was not. I strongly suggest refactoring the fixed-size dispatch first (#7641) and then rebasing this PR.
NaderAlAwar
reviewed
Feb 25, 2026
1 task
0fe2745 to eea5479
Contributor
Nit: I believe int(...) is unnecessary here, since the block_threads member of the segmented_reduce_policy struct is already of type int.
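The point of the nit can be illustrated with a minimal sketch. The struct below is a hypothetical stand-in that only borrows the names segmented_reduce_policy and block_threads from the comment; the real struct in the PR may have other members, and the two helper functions are invented for illustration.

```cpp
#include <type_traits>

// Hypothetical stand-in for the policy struct named in the review comment.
// Only block_threads comes from the comment; items_per_thread is assumed.
struct segmented_reduce_policy
{
  int block_threads;
  int items_per_thread;
};

// The member is declared as int, so converting it to int changes nothing.
static_assert(std::is_same<decltype(segmented_reduce_policy::block_threads), int>::value,
              "block_threads is already an int");

// Illustrative helpers (not part of the PR): with and without the cast.
constexpr int threads_with_cast(segmented_reduce_policy p)
{
  return int(p.block_threads); // redundant int-to-int conversion
}

constexpr int threads_without_cast(segmented_reduce_policy p)
{
  return p.block_threads; // same value, same type
}

// Both forms yield the same compile-time value.
static_assert(threads_with_cast({128, 4}) == threads_without_cast({128, 4}),
              "the cast does not change the result");
```

A cast of this kind is only needed when the member has a different type, for example an enum or an unsigned integer, which is not the case described in the comment.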
uses cuda iterators in benchmark
uses proper init for thrust vectors
clean up docs in kernel
disables sass check for c parallel segmented reduce test
Contributor
Author
Performance Report: small, medium, large segments reduction code path using default max_segment_size 0
NVIDIA RTX A6000 (SM 86)
argmax T{ct}=F64 OffsetT=I32 - no regressions - mostly noise
sum T{ct}=F64 OffsetT=I32 - no regressions - mostly noise
sum T{ct}=I32 OffsetT=I32 - 3% regressions on small, medium segments which are already at < 5% SOL
argmax T{ct}=I32 OffsetT=I32 - 3% regressions on small, medium segments which are already at < 5% SOL
Contributor
I had a quick call with @srinivasyadav18 and here are some notes:
miscco
reviewed
Mar 6, 2026
Contributor
bernhardmgruber
left a comment
Implementation looks ok to me.
bernhardmgruber
approved these changes
Mar 9, 2026
Contributor
🥳 CI Workflow Results
🟩 Finished in 2h 18m: Pass: 100%/255 | Total: 8d 19h | Max: 2h 18m | Hits: 70%/157047
Contributor
Author
pre-commit.ci autofix
Description
closes #6898
Checklist