Fix SendRequestAsync hang on Streamable HTTP SSE retry#1396
Fix SendRequestAsync hang on Streamable HTTP SSE retry#1396
Conversation
…e before foreground transport send completes Replace `await SendToRelatedTransportAsync(request, cancellationToken)` with `Task.WhenAny(sendTask, tcs.Task)` so that if the response TCS completes first (via a concurrent background channel such as the background GET SSE stream in Streamable HTTP), execution proceeds immediately without waiting for the foreground send. If the send completes first (normal case), it is awaited directly, preserving the original behavior. The still-running send task's exceptions are observed via a fire-and-forget ContinueWith(OnlyOnFaulted) to prevent unobserved task exceptions. Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
…nt channel Use a linked CancellationTokenSource so the foreground transport send (e.g., retry GET loop) is cancelled when the response TCS completes first via a concurrent background stream. Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Address review feedback: - Invert the WhenAny check so the common case (send completes first) is in the if branch. - Replace fire-and-forget ContinueWith with ObserveSendFaults that logs the exception at Debug level using the logging source generator. Uses ConfigureAwaitOptions.SuppressThrowing on .NET and ContinueWith on downlevel. Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
|
@copilot please rewrite the PR description and title to reflect the core problem, the proposed solution, and why it fixes things |
Updated the PR description to explain the core problem (foreground GET retry loop hangs when background SSE stream consumes the response first), the solution (race send against TCS with |
…en response already arrived Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
… at call site Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
|
@halter73, I think (hope) this will fix the long-standing sse-retry test timeout failures that have plagued us. |
|
@copilot there are test failures |
The test failures are pre-existing and unrelated to this PR. The failing tests ( Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
…vent cancellation race The McpSessionHandler.SendRequestAsync races the transport send against the response TCS using Task.WhenAny. When the TCS completes during SSE stream processing, the send task is cancelled. But SendHttpRequestAsync previously set SessionId, _negotiatedProtocolVersion, and started _getReceiveTask AFTER SSE processing (lines 136-148), so these critical side effects were lost. Fix: extract SessionId from HTTP response headers and start _getReceiveTask immediately after receiving a successful HTTP response for initialize requests, before SSE content processing begins. The protocol version (from the response body) is still set after SSE processing when possible. Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com> Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
…initialize requests The previous approach cancelled the transport send task via linked CTS when the response TCS completed first. This aborted SSE processing before SessionId and _getReceiveTask were set, causing subsequent requests to fail with 400 (Mcp-Session-Id header is required). Fix: - Remove the linked CTS that cancelled the send task - For initialize requests, always await the send to completion so the transport's side effects (SessionId, protocol version, background tasks) complete before subsequent messages - For non-initialize requests, use WhenAny to avoid the SSE retry hang where the foreground send blocks forever because the response arrived via a background channel - Revert StreamableHttpClientSessionTransport.cs to main's version since the send is no longer cancelled for initialize requests Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com> Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
| // For most requests, wait for either the transport send to complete or for the | ||
| // response to arrive via a concurrent channel (e.g. the background GET SSE stream | ||
| // in Streamable HTTP). Without this, the foreground transport send could block | ||
| // indefinitely waiting for a response that was already delivered via a different stream. |
There was a problem hiding this comment.
Responses should not be delivered to different streams in practice ignoring Tasks and SSE event stream resumption logic. The protocol spec opens the door for it but definitely discourages it, and our server should never send the response to another stream while leaving the HTTP response for the originating request open. If the conformance server is doing that, that seems like a major bug in the conformance MCP server.
There was a problem hiding this comment.
If the conformance server is doing that, that seems like a major bug in the conformance MCP server.
@halter73, isn't that what the sse-retry test is written to do and test?
The client sends a tools/call via POST.
The server closes the POST SSE stream without sending the response.
The client reconnects via GET (a completely different HTTP request/stream).
The server delivers the tools/call response on the GET stream, a different stream than the one that originated the request.
There was a problem hiding this comment.
The server closes the POST SSE stream without sending the response.
But it still should complete the HTTP response unblocking StreamableHttpClientSessionTransport.SendMessageAsync. It's best to await it so client users know that as long as all their McpClient requests are complete so are all their HTTP requests, and if not, they can still cancel the underlying HTTP request. It's also nice to let unrecoverable HTTP errors bubble up as an observable exception instead of being forced to look at logs.
| { | ||
| // The response arrived via a concurrent channel before the transport send completed. | ||
| // Let the send finish naturally but observe any faults at debug level. | ||
| _ = sendTask.ContinueWith( |
There was a problem hiding this comment.
I'm not a fan of this at all. This seems way too cute. We shouldn't complicate the client logic because some servers might be leaving response streams open. I doubt that's what's happening in the conformance test anyway as I point out in #1368 (comment)
🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.