cudax/stf: migrate stream/interfaces/ from cuda_safe_call to cuda_try #9268
andralex wants to merge 7 commits into
Covers the stream-backend data interfaces (hashtable, slice, slice reduction ops).

Uses the templated cuda_try<F> form for single-function calls (cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemsetAsync, cudaStreamSynchronize, cudaFreeHost, cudaFreeAsync, cudaPointerGetAttributes). cudaHostAlloc and cudaMallocAsync stay in the runtime-status form: both are overload sets (cuda_runtime.h templated wrappers), so cuda_try<F> cannot name them.

hashtable stream_data_allocate: after the device cudaMallocAsync succeeds, the buffer is freed via SCOPE(fail) if the subsequent cudaMemsetAsync throws, so the new throw path does not leak the allocation. Adds scope_guard.cuh.

Verified locally by building cudax.test.stf.hashtable.test, cudax.test.stf.reductions.slice2d_reduction (2D cudaMemcpy2DAsync path), and cudax.test.stf.reductions.reduce_sum.
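The leak guard described above can be sketched in plain C++ without CUDA. This is only an illustration under stated assumptions: scope_fail is a hypothetical stand-in for whatever SCOPE(fail) expands to, and mock_alloc, mock_free and mock_memset are invented placeholders for cudaMallocAsync, cudaFreeAsync and a throwing cudaMemsetAsync wrapper. It is not the code in hashtable_linearprobing.cuh.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for a SCOPE(fail)-style guard: the callback runs only
// when the scope is left because an exception is propagating.
template <typename F>
class scope_fail
{
public:
  explicit scope_fail(F fn)
      : fn_(std::move(fn))
      , pending_(std::uncaught_exceptions())
  {}
  scope_fail(const scope_fail&)            = delete;
  scope_fail& operator=(const scope_fail&) = delete;
  ~scope_fail()
  {
    // More in-flight exceptions than at construction means we are unwinding.
    if (std::uncaught_exceptions() > pending_)
    {
      fn_();
    }
  }

private:
  F fn_;
  int pending_;
};

// Mock allocator (assumption: stands in for cudaMallocAsync / cudaFreeAsync).
static int live_allocations = 0;

int* mock_alloc()
{
  ++live_allocations;
  return new int[4];
}

void mock_free(int* p)
{
  delete[] p;
  --live_allocations;
}

// Mock of a memset wrapper that reports failure by throwing.
void mock_memset(int* p, bool fail)
{
  if (fail)
  {
    throw std::runtime_error("memset failed");
  }
  for (int i = 0; i < 4; ++i)
  {
    p[i] = 0;
  }
}

// Allocate, then initialize; the guard frees the buffer only if init throws.
int* allocate_zeroed(bool fail_memset)
{
  int* buf = mock_alloc();
  scope_fail guard{[&buf] { mock_free(buf); }};
  mock_memset(buf, fail_memset);
  return buf; // success path: guard does nothing, caller owns buf
}
```

On the success path the caller receives the buffer and is responsible for freeing it; on the failure path the exception propagates and the allocation count returns to zero.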
/ok to test b6dd048 |
|
/ok to test 33ffac5 |
@andralex, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/ |
|
No actionable comments were generated in the recent review.
Summary

This PR migrates CUDA runtime error handling in the data interfaces under cudax/include/cuda/experimental/__stf/stream/interfaces/ from abort-on-error cuda_safe_call(...) to exception-based cuda_try(...). Three interface files are changed (hashtable_linearprobing.cuh, slice.cuh, slice_reduction_ops.cuh). Large stream files (event_types.cuh, stream_ctx.cuh, stream_task.cuh) are intentionally excluded for separate treatment.

Files modified: 3 | Sites changed: 15 | Lines changed: +33/-21
Walkthrough
important: Three STF stream-interface headers replace
Changes
STF Stream Interface Error Handling Refactor
Suggested labels: stf
…set) cudaMemcpyAsync is an overload set on CTK 13.2+ (cuda_runtime.h adds an alternate-spelling wrapper), so cuda_try<cudaMemcpyAsync> is ill-formed. Clang rejects it; GCC accepts it leniently, which is why local GCC builds missed it. Revert the cudaMemcpyAsync calls in the stream slice and hashtable interfaces to the runtime-status cuda_try(cudaMemcpyAsync(...)) form. The other (non-overloaded) calls keep the templated form.
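The distinction between the two call forms can be illustrated without CUDA. In the sketch below every name is hypothetical: try_status mimics a runtime-status checker, try_call mimics a templated form that takes the function as a template argument, copy_one is a single function, and copy_any is an overload set (a function plus a template wrapper). It only shows why an overload set cannot be named as a template argument and is not the cuda_try implementation.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

using status_t = int; // assumption: 0 means success, anything else is an error

// Runtime-status form: the call has already happened, only the result is checked.
void try_status(status_t s)
{
  if (s != 0)
  {
    throw std::runtime_error("call failed with status " + std::to_string(s));
  }
}

// Templated form: the function itself is a template argument, so the name
// must designate exactly one function.
template <auto Fn, typename... Args>
void try_call(Args... args)
{
  try_status(Fn(args...));
}

// A single function: usable with try_call<copy_one>.
status_t copy_one(int* dst, const int* src)
{
  *dst = *src;
  return 0;
}

// An overload set: a plain function plus a template wrapper of the same name.
status_t copy_any(int* dst, const int* src)
{
  *dst = *src;
  return 0;
}

template <typename T>
status_t copy_any(T* dst, const T* src)
{
  *dst = *src;
  return 0;
}

void demo()
{
  int src = 7, a = 0, b = 0;

  try_call<copy_one>(&a, &src); // fine: copy_one names one function
  assert(a == 7);

  // try_call<copy_any>(&b, &src); // not portable: copy_any is an overload set

  try_status(copy_any(&b, &src)); // overload resolution happens at the call
  assert(b == 7);
}
```

With the runtime-status form, ordinary overload resolution picks the callee from the actual arguments, which is why that form keeps working when a header adds another overload.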
|
/ok to test 3ae7ba1 |
|
/ok to test 3ae7ba1 |
|
/ok to test 4e868a7 |
|
/ok to test a6cfd45 |
|
/ok to test 0231898 |
Disambiguate overload-set alloc/copy calls with explicit template arguments where applicable, defer cudaStreamSynchronize errors until after cudaFreeHost in hashtable host deallocate, and simplify slice::get_memory_type.
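One way to read "defer cudaStreamSynchronize errors until after cudaFreeHost" is: record the synchronization status, release the host buffer regardless, and only then raise the recorded error. The sketch below shows that ordering with invented mocks (mock_stream_sync, mock_host_alloc, mock_host_free, try_status); it is an assumption about the shape of the change, not the actual hashtable host deallocate code.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

using status_t = int; // assumption: 0 means success

static int live_host_buffers = 0;

// Throws if the status is non-zero (stand-in for a runtime-status checker).
void try_status(status_t s)
{
  if (s != 0)
  {
    throw std::runtime_error("call failed with status " + std::to_string(s));
  }
}

// Mocks standing in for cudaHostAlloc, cudaFreeHost and cudaStreamSynchronize.
int* mock_host_alloc()
{
  ++live_host_buffers;
  return new int[4];
}

status_t mock_host_free(int* p)
{
  delete[] p;
  --live_host_buffers;
  return 0;
}

status_t mock_stream_sync(bool fail)
{
  return fail ? 1 : 0;
}

// Deallocate with a deferred synchronization error.
void host_deallocate(int* p, bool sync_fails)
{
  // Record the status instead of throwing immediately.
  const status_t sync_result = mock_stream_sync(sync_fails);

  // Release the buffer whether or not the synchronization succeeded.
  try_status(mock_host_free(p));

  // Surface the deferred error only after the buffer is gone.
  try_status(sync_result);
}
```

Throwing on the synchronization failure first would skip the free and leak the host buffer; deferring the throw keeps the error visible while still releasing the memory.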
|
/ok to test 8dce860 |
😬 CI Workflow Results
🟥 Finished in 1h 01m: Pass: 41%/55 | Total: 11h 10m | Max: 58m 43s | Hits: 20%/39040 |
|
/ok to test 1c49aa4 |
Summary

Migrates the cudax/include/cuda/experimental/__stf/stream/interfaces/ data interfaces (hashtable, slice, slice reduction ops) from cuda_safe_call to cuda_try. Part of the ongoing STF cuda_safe_call -> cuda_try rollout; the large stream files (event_types.cuh, stream_ctx.cuh, stream_task.cuh) are handled in separate PRs.

Changes (3 files, 15 sites)

- cuda_try<F> for single-function calls: cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemsetAsync, cudaStreamSynchronize, cudaFreeHost, cudaFreeAsync, cudaPointerGetAttributes (out-param -> returned cudaPointerAttributes).
- cuda_try(...) for overload sets (cuda_runtime.h templated wrappers): cudaHostAlloc, cudaMallocAsync.
- hashtable_linearprobing.cuh leak guard: in stream_data_allocate, after the device cudaMallocAsync succeeds, the buffer is freed via SCOPE(fail) if the following cudaMemsetAsync throws, closing the leak the new throw path would otherwise introduce. Adds scope_guard.cuh.

Validation

Built locally (cpp20): cudax.test.stf.hashtable.test, cudax.test.stf.reductions.slice2d_reduction (exercises the 2D cudaMemcpy2DAsync path), cudax.test.stf.reductions.reduce_sum; all compile and link.

Test plan