run to run scan warpspeed impl sm100+ #9263
No actionable comments were generated in the recent review. 🎉
🚧 Files skipped from review as they are similar to previous changes (4)
Overview
This PR implements run-to-run (deterministic) support for the warpspeed scan optimization on SM100+ targets by adding a stable reduction-order path to the warpspeed lookahead logic and plumbing that choice through the scan dispatch and tuning logic. The change threads a compile-time StableReductionOrder boolean through the warpspeed closure, kernel body, and dispatch so the warpspeed path can use a deterministic lookahead variant when required.
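To illustrate the idea of threading a compile-time StableReductionOrder flag from dispatch down to the lookahead, here is a minimal host-side C++ sketch. It is an analogy, not the PR's code: every function name below is hypothetical, the "arrival order" vector is an assumed stand-in for the timing-dependent order in which tile partials might become visible, and the real implementation is device code.

```cpp
#include <cstddef>
#include <vector>

// All names here are hypothetical stand-ins, not identifiers from the PR.

// Deterministic variant: always combines partials in index order.
inline float lookahead_stable(const std::vector<float>& partials)
{
  float acc = 0.0f;
  for (float p : partials)
  {
    acc += p;
  }
  return acc;
}

// Non-deterministic model: combines partials in whatever order they "arrived".
inline float lookahead_arrival_order(const std::vector<float>& partials,
                                     const std::vector<std::size_t>& arrival)
{
  float acc = 0.0f;
  for (std::size_t idx : arrival)
  {
    acc += partials[idx];
  }
  return acc;
}

// The compile-time flag selects the lookahead variant.
template <bool StableReductionOrder>
float kernel_body_sketch(const std::vector<float>& partials,
                         const std::vector<std::size_t>& arrival)
{
  if constexpr (StableReductionOrder)
  {
    return lookahead_stable(partials); // arrival order is ignored
  }
  else
  {
    return lookahead_arrival_order(partials, arrival);
  }
}

// Dispatch turns the runtime requirement into a template argument.
inline float dispatch_scan_sketch(bool require_stable,
                                  const std::vector<float>& partials,
                                  const std::vector<std::size_t>& arrival)
{
  return require_stable ? kernel_body_sketch<true>(partials, arrival)
                        : kernel_body_sketch<false>(partials, arrival);
}
```

With partials {1e30f, 1.0f, -1e30f}, the stable path returns the same value for any arrival order, while the arrival-order path gives different results for {0, 1, 2} and {0, 2, 1} because floating-point addition is not associative.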
Tests / Docs / Checklist (remaining work)
The PR does not include tests, benchmarks, or documentation updates required by issue
Related issue
Closes important:
Walkthrough
Adds a deterministic warpIncrementalLookaheadStable, threads a compile-time StableReductionOrder flag through the warpspeed closure and dispatch, and relaxes policy gating to permit warpspeed on sm_100+ when stable reduction order is requested.
Changes
Stable Warpspeed Scan Implementation
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cub/cub/device/dispatch/tuning/tuning_scan.cuh (1)
1038-1047: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
suggestion: Update the inline rationale for the require_stable_reduction_order → cc >= {10, 0} gate: warpIncrementalLookaheadStable is available for __cccl_ptx_isa >= 860 (sm_90+), but the scan policy selector only produces a scan_warpspeed_policy when cc >= {10, 0} (otherwise get_warpspeed_policy returns {}), so stable warpspeed on sm_90+ is blocked by warpspeed policy/tuning availability, not by stable lookahead codegen availability.
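The two separate conditions in this comment can be modelled with a small host-side C++ sketch. It only restates the logic as described above: the types, function names, and the placeholder policy contents are assumptions for illustration and do not reproduce the code in tuning_scan.cuh.

```cpp
#include <optional>

// Hypothetical model of the gate; types and bodies are assumptions.
struct compute_capability
{
  int major_version;
  int minor_version;
};

constexpr bool operator>=(compute_capability a, compute_capability b)
{
  return a.major_version > b.major_version
      || (a.major_version == b.major_version && a.minor_version >= b.minor_version);
}

struct warpspeed_policy_sketch
{
  int placeholder_tuning; // stands in for real tuning parameters
};

// Per the comment: a warpspeed policy only exists for cc >= {10, 0}.
inline std::optional<warpspeed_policy_sketch> get_warpspeed_policy_sketch(compute_capability cc)
{
  if (cc >= compute_capability{10, 0})
  {
    return warpspeed_policy_sketch{0};
  }
  return std::nullopt;
}

// Per the comment: the stable lookahead is available for PTX ISA >= 860.
constexpr bool stable_lookahead_available(int ptx_isa)
{
  return ptx_isa >= 860;
}

enum class scan_path
{
  warpspeed,
  warpspeed_stable,
  fallback
};

inline scan_path select_scan_path(compute_capability cc, int ptx_isa, bool require_stable)
{
  // Policy availability is checked first; this is what blocks sm_90.
  if (!get_warpspeed_policy_sketch(cc))
  {
    return scan_path::fallback;
  }
  if (require_stable)
  {
    return stable_lookahead_available(ptx_isa) ? scan_path::warpspeed_stable : scan_path::fallback;
  }
  return scan_path::warpspeed;
}
```

In this model, a request for stable order on cc {9, 0} with PTX ISA 860 falls back even though the stable lookahead itself would be available, which is the distinction the suggested rationale should make.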
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: dfcdb20c-106f-4ae5-a688-9e19e5475411
📒 Files selected for processing (4)
cub/cub/detail/warpspeed/look_ahead.cuh
cub/cub/device/dispatch/kernels/kernel_scan.cuh
cub/cub/device/dispatch/kernels/kernel_scan_warpspeed.cuh
cub/cub/device/dispatch/tuning/tuning_scan.cuh
/ok to test cbd13bb
pre-commit.ci autofix
/ok to test 5b9637b
🥳 CI Workflow Results
🟩 Finished in 2h 21m: Pass: 100%/284 | Total: 11d 16h | Max: 2h 20m | Hits: 19%/969497
5b9637b to 14e5c19
/ok to test 14e5c19
Description
closes #7556
Checklist