Skip to content

Use clean at::cuda::CUDAStream but not wrapped with std::optional#11

Open
youge325 wants to merge 5 commits intoPFCCLab:paddlefrom
youge325:cRemove_Optional
Open

Use clean at::cuda::CUDAStream but not wrapped with std::optional#11
youge325 wants to merge 5 commits intoPFCCLab:paddlefrom
youge325:cRemove_Optional

Conversation

@youge325
Copy link

@youge325 youge325 commented Mar 24, 2026

之前因为 at::cuda::CUDAStream 没有 operator= 重载导致无法直接赋值,因此使用了 std::optional 容器,实际上应该像原来一样使用初始化列表,通过 lambda 表达式返回一个用于构造函数重载的 Stream ,以此消除引入 std::optional 带来的 diff

Copilot AI review requested due to automatic review settings March 24, 2026 14:18
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to remove the std::optional wrapper around at::cuda::CUDAStream by constructing comm_stream directly (via initializer-list + lambda), and updating stream usages accordingly.

Changes:

  • Replace std::optional<at::cuda::CUDAStream> comm_stream with a plain at::cuda::CUDAStream.
  • Initialize device_id and comm_stream in Buffer’s constructor initializer list.
  • Update call sites from comm_stream.value() to comm_stream.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 15 comments.

File Description
csrc/deep_ep.hpp Changes Buffer to store comm_stream directly and adjusts accessors.
csrc/deep_ep.cpp Moves stream/context setup into initializer list and updates many stream call sites.
Comments suppressed due to low confidence (2)

csrc/deep_ep.cpp:1447

  • This internode::cached_notify call passes comm_stream (at::cuda::CUDAStream) where the API expects cudaStream_t (see csrc/kernels/api.cuh). Please pass comm_stream.stream() / get_comm_stream_raw() instead.
                             rank,
                             comm_stream,
                             config.get_rdma_buffer_size_hint(hidden_int4 * sizeof(int4), num_ranks),
                             num_nvl_bytes,
                             false,
                             low_latency_mode);

csrc/deep_ep.cpp:1127

  • internode::cached_notify’s signature takes cudaStream_t stream (see csrc/kernels/api.cuh), but this call passes comm_stream (at::cuda::CUDAStream). This should be comm_stream.stream() / get_comm_stream_raw().
                                 barrier_signal_ptrs_gpu,
                                 rank,
                                 comm_stream,
                                 config.get_rdma_buffer_size_hint(hidden_int4 * sizeof(int4), num_ranks),
                                 num_nvl_bytes,
                                 true,
                                 low_latency_mode);

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@youge325 youge325 changed the title use clean at::cuda::CUDAStream but not wrapped with std::optional Use clean at::cuda::CUDAStream but not wrapped with std::optional Mar 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants