Skip to content

cudax/stf: cuda_try migration — stream_ctx (PR8)#9306

Open
andralex wants to merge 2 commits into
NVIDIA:mainfrom
andralex:andralex/stf-cuda-try-stream-ctx
Open

cudax/stf: cuda_try migration — stream_ctx (PR8)#9306
andralex wants to merge 2 commits into
NVIDIA:mainfrom
andralex:andralex/stf-cuda-try-stream-ctx

Conversation

@andralex

@andralex andralex commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Migrate stream/stream_ctx.cuh from cuda_safe_call to cuda_try (16 sites).
  • Submit reorder timing uses cudaEventCreateWithFlags and SCOPE(fail) to destroy timing events if task.run() or end-timing throws.
  • UNITTEST cudaHostRegister paths use cuda_try<>; scalar fixture uses SCOPE(fail) to free malloc on register failure.
  • Part of the staged STF cuda_safe_callcuda_try migration (PR8 of 11).

Test plan

  • Local build: ninja -j 12 -C build/cudax-cpp20 cudax.test.stf.unittest_headers.__stf.stream.stream_ctx
  • CI

Convert stream synchronization, submit timing, wait(), and UNITTEST host
register paths to cuda_try; roll back reorder timing events on throw via
SCOPE(fail) and use cudaEventCreateWithFlags for overload-set safety.
@andralex andralex requested a review from a team as a code owner June 9, 2026 01:26
@andralex andralex requested a review from srinivasyadav18 June 9, 2026 01:26
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jun 9, 2026
@copy-pr-bot

copy-pr-bot Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 9, 2026
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note: CodeRabbit is enabled on this repository as a convenience for maintainers
and contributors. Use your best judgment when considering its review comments and
suggestions — a suggested change may be inadequate, unnecessary, or safe to ignore.
Contributors are not expected to address every comment. Human reviews are what
ultimately matter for merging.

Summary

This PR continues the staged migration from cuda_safe_call to cuda_try error handling, updating the stream context implementation. The changes affect 16 call sites across stream_ctx.cuh and associated unit tests, converting blocking CUDA operations and event management to use the cuda_try<...> pattern.

Changes

Core stream context operations:

  • Replaced cudaStreamSynchronize blocking calls with cuda_try<cudaStreamSynchronize> in uncached_stream_allocator::deallocate and stream_ctx::finalize
  • Updated stream_ctx::submit reordering/timing event path to use cudaEventCreateWithFlags (via cuda_try) instead of manual cudaEventCreate
  • Added SCOPE(fail) cleanup blocks guarded by timing_events_owned boolean to safely destroy partially-created timing events on failure
  • Migrated subsequent stop/elapsed-time operations to cuda_try equivalents
  • Updated host-side synchronization task to use cuda_try<cudaStreamSynchronize>

Unit test updates:

  • Replaced cudaHostRegister and cudaHostUnregister calls with cuda_try variants
  • Added failure-safe cleanup scope around registered host allocation in untyped logical data moveability test

Metrics

  • Lines changed: +40/-18
  • Estimated code review effort: Medium
  • Part 8 of 11-PR staged migration
  • Local build verified: ✓ (target cudax.test.stf.unittest_headers.__stf.stream.stream_ctx)

Walkthrough

Single file updating stream context error handling from cuda_safe_call to cuda_try across stream synchronization, timing-event creation with fail-safe cleanup, and unit tests. No public API changes. Improves error recovery semantics for partially-initialized events and host memory registration.

Changes

Stream Context Error Handling Refactoring

Layer / File(s) Summary
Stream synchronization with cuda_try
cudax/include/cuda/experimental/__stf/stream/stream_ctx.cuh
uncached_stream_allocator::deallocate, stream_ctx::finalize, and stream_ctx::wait replace cuda_safe_call(cudaStreamSynchronize) with cuda_try<cudaStreamSynchronize>.
Timing event lifecycle with fail-safe cleanup
cudax/include/cuda/experimental/__stf/stream/stream_ctx.cuh
stream_ctx::submit reordering path adds timing_events_owned boolean and SCOPE(fail) cleanup block to conditionally destroy start/stop events on failure; event creation/record/completion steps use cuda_try with cudaEventCreateWithFlags.
Unit test updates for host memory operations
cudax/include/cuda/experimental/__stf/stream/stream_ctx.cuh
Logical data and slice tests replace cuda_safe_call(cudaHostRegister/Unregister) with cuda_try; logical_data_untyped test adds SCOPE(fail) guard for host allocation cleanup.

Possibly related PRs

  • NVIDIA/cccl#9249: Also migrates STF timing/event path to cuda_try<...> + cudaEventCreateWithFlags pattern in stream_ctx host-launch flows.
  • NVIDIA/cccl#9265: Migrates timing/synchronization in cuda_kernel_scope's event record/sync/elapsed-time used by stream_ctx from cuda_safe_call to cuda_try.

Suggested labels

stf

Suggested reviewers

  • srinivasyadav18
  • caugonnet

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5c509e00-6061-424e-9f00-203ceaf361b2

📥 Commits

Reviewing files that changed from the base of the PR and between 5ea5d42 and 837b649.

📒 Files selected for processing (1)
  • cudax/include/cuda/experimental/__stf/stream/stream_ctx.cuh

Comment on lines +565 to +567
startEvent = cuda_try<cudaEventCreateWithFlags>(cudaEventDefault);
stopEvent = cuda_try<cudaEventCreateWithFlags>(cudaEventDefault);
timing_events_owned = true;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

critical: Event leak on partial creation failure.

If cudaEventCreateWithFlags succeeds at line 565 but fails at line 566, startEvent is non-null but timing_events_owned is still false. The SCOPE(fail) guard at line 530 checks if (timing_events_owned) and skips cleanup, leaking startEvent.

Move line 567 immediately after line 565:

 startEvent          = cuda_try<cudaEventCreateWithFlags>(cudaEventDefault);
+timing_events_owned = true;
 stopEvent           = cuda_try<cudaEventCreateWithFlags>(cudaEventDefault);
-timing_events_owned = true;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
startEvent = cuda_try<cudaEventCreateWithFlags>(cudaEventDefault);
stopEvent = cuda_try<cudaEventCreateWithFlags>(cudaEventDefault);
timing_events_owned = true;
startEvent = cuda_try<cudaEventCreateWithFlags>(cudaEventDefault);
timing_events_owned = true;
stopEvent = cuda_try<cudaEventCreateWithFlags>(cudaEventDefault);

Comment on lines +579 to +583
cuda_try<cudaSetDevice>(0);
cuda_try<cudaEventRecord>(stopEvent, fence());
cuda_try<cudaEventSynchronize>(stopEvent);
state.submission_time = cuda_try<cudaEventElapsedTime>(startEvent, stopEvent);
timing_events_owned = false;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

critical: Event leak in success path.

After computing elapsed time at line 582, neither startEvent nor stopEvent is destroyed. Line 583 disables the SCOPE(fail) cleanup by setting timing_events_owned = false, so the events are never freed. cudaEvent_t requires explicit cudaEventDestroy.

Add destruction before disabling ownership:

 state.submission_time = cuda_try<cudaEventElapsedTime>(startEvent, stopEvent);
+cuda_safe_call(cudaEventDestroy(startEvent));
+cuda_safe_call(cudaEventDestroy(stopEvent));
 timing_events_owned   = false;

Use cuda_safe_call here (not cuda_try) because the SCOPE(fail) is still active and we cannot let destruction throw into the failure handler.

@andralex

andralex commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test 837b649

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

😬 CI Workflow Results

🟥 Finished in 34m 53s: Pass: 21%/55 | Total: 5h 44m | Max: 34m 50s | Hits: 99%/2329

See results here.

@andralex

andralex commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test 85303ba

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

1 participant