@algal commented Jan 19, 2026

This PR fixes a crash in NetworkTransport.connect() where a delayed reconnection task could resume a stale continuation after a new connect attempt had started.

Motivation and Context

The bug occurs when a delayed reconnection task from a previous connect attempt runs after a new connect attempt has started.

While using iMCP.app, I saw the app crash when a client (imcp-server) started and then died quickly. Crash logs pointed at CheckedContinuation.resume(throwing:) called from NetworkTransport.handleReconnection. Here is what was happening: NetworkTransport.connect() suspends on a checked continuation and, on failure or cancellation, schedules reconnection work with backoff. If a reconnect attempt is scheduled and a new connect() call arrives before that delayed task fires, the old task can still resume the previous continuation. Resuming a CheckedContinuation twice traps (EXC_BREAKPOINT / SIGTRAP).
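For reference, the trap itself is easy to demonstrate outside the SDK; this minimal snippet (not SDK code) dies the same way:

// Not SDK code: a CheckedContinuation traps with a fatal error
// if it is resumed more than once.
func doubleResume() async -> Int {
    await withCheckedContinuation { continuation in
        continuation.resume(returning: 1)
        continuation.resume(returning: 2)  // EXC_BREAKPOINT / SIGTRAP here
    }
}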

How Has This Been Tested?

Running iMCP.app is the easiest way to reproduce the crash, and it is what I used to verify that the fix removes it. You just run iMCP, then start its imcp-server and CTRL-C it immediately, a few times in a row. Eventually this brings down the iMCP process as well.

I have not produced a test which reproduces the bug in isolation, outside of iMCP, but I can do so if that would help. The sequence would work as follows (a sketch of such a test appears after the steps):

Repro (before fix):

  1. Create a NetworkTransport (default reconnection config).
  2. Call connect(), then cancel the underlying connection quickly enough to enter handleReconnection.
  3. Call connect() again before the reconnection backoff delay elapses.
  4. When the delayed task fires, it resumes the old continuation and traps (EXC_BREAKPOINT / SIGTRAP).
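A rough sketch of that test, assuming NetworkTransport is constructed from an NWConnection and that pointing it at a closed local port fails fast enough to enter handleReconnection (exact API and timing may differ):

import Network
import XCTest

final class ReconnectionRaceTests: XCTestCase {
    func testStaleReconnectTaskDoesNotDoubleResume() async throws {
        // No listener on this port, so the connect attempt fails quickly.
        let connection = NWConnection(host: "127.0.0.1", port: 59999, using: .tcp)
        let transport = NetworkTransport(connection: connection)

        // First attempt: fail/cancel fast, so handleReconnection schedules
        // a delayed retry with backoff.
        let first = Task { try await transport.connect() }
        try await Task.sleep(for: .milliseconds(200))
        first.cancel()

        // Second attempt before the backoff delay elapses.
        let second = Task { try await transport.connect() }

        // Before the fix, the stale delayed task fires here and resumes the
        // old continuation a second time, trapping the process; after the
        // fix it is ignored and the test finishes cleanly.
        try await Task.sleep(for: .seconds(2))
        second.cancel()
    }
}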

I found that swift test fails on Swift 6.2.3 due to existing strict-concurrency errors in Tests/MCPTests/ClientTests.swift.

How this fix works

Previously, the continuation-resume guard was a single boolean that was reset on each connect() call. Delayed reconnection tasks from a previous connect attempt could still run after a new connect had reset the guard, allowing a second resume on the old continuation.

This fix works as follows (a simplified sketch follows the list):

  • Introduce a per‑connect UUID (connectContinuationID).
  • Thread that ID through handleConnectionReady/Failed/Cancelled and handleReconnection.
  • Resume only if the ID matches the current connect attempt, and invalidate it after resuming.
  • Ignore delayed reconnection tasks from stale attempts.
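In simplified form, the guard looks like this. This is a sketch with stand-in types, not the actual patch; only connectContinuationID and the handler names come from the change itself:

import Foundation

// Simplified stand-in for the guarded connect path.
actor ConnectGuard {
    private var connectContinuationID: UUID?
    private var connectContinuation: CheckedContinuation<Void, Error>?

    func connect(start: (UUID) -> Void) async throws {
        try await withCheckedThrowingContinuation { continuation in
            let id = UUID()
            connectContinuationID = id   // a new attempt invalidates older IDs
            connectContinuation = continuation
            start(id)                    // state handlers carry this ID back
        }
    }

    // Called by handleConnectionReady/Failed/Cancelled and handleReconnection.
    func resumeConnect(_ result: Result<Void, Error>, attemptID: UUID) {
        // Resume only for the current attempt; stale delayed tasks bail out here.
        guard attemptID == connectContinuationID,
              let continuation = connectContinuation else { return }
        connectContinuationID = nil      // invalidate after resuming
        connectContinuation = nil
        continuation.resume(with: result)
    }
}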

Breaking Changes

None. This is an internal-only change; apart from the fix, behavior is unchanged. The only observable difference is that stale reconnection tasks no longer resolve a previous connect() attempt.

Types of changes

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context on use of AI

I (algal) have reviewed this PR, performed the before/after tests above, and I believe the analysis stated above is correct. I also have a background in Swift and native development. However, I am not deeply familiar with this code base, and I relied on AI in developing this fix and drafting this PR text.

I know there are a lot of slop PRs flying around these days, so I wanted to be transparent about that. I am happy to be corrected or steered.

@DePasqualeOrg

This issue, along with many others, has already been resolved in my fork, which @movetz, @stallent, and I are discussing merging into this repository.

Instead of using a checked continuation with callback-based state handling (which requires the UUID-based guard proposed in this PR), my fork uses AsyncStream to bridge NWConnection state updates:

private func waitForConnectionReady() async throws {
    let stateStream = AsyncStream<NWConnection.State> { continuation in
        connection.stateUpdateHandler = { state in
            continuation.yield(state)
            switch state {
            case .ready, .failed, .cancelled:
                continuation.finish()
            default:
                break
            }
        }
    }

    for await state in stateStream {
        switch state {
        case .ready: return
        case .failed(let error): throw error
        case .cancelled: throw MCPError.internalError("Connection cancelled")
        default: continue  // non-terminal states: keep waiting
        }
    }
}

This approach inherently avoids the double-resume race condition because:

  • A fresh stream is created for each connect() call.
  • The stream finishes on terminal states, leaving no lingering continuation.
  • My fork removes in-connect reconnection scheduling entirely — on failure, it throws immediately rather than scheduling a delayed retry. Reconnection is handled in receiveLoop() for post-connect failures.

@algal (Author) commented Jan 20, 2026

@DePasqualeOrg Cool! It sounds like your solution is better. I hope it is merged. :)

@DePasqualeOrg commented Jan 20, 2026

It won't be, unless it's replicated by someone else, because the new maintainers of this package want to take a patchwork approach to improvements. My full-featured fork will continue to be developed separately here: DePasqualeOrg/swift-mcp
