fix: retry MCP callTool on any connection error, not just ErrSessionMissing by dgageot · Pull Request #2215 · docker/docker-agent

dgageot · 2026-03-22T18:11:21Z

When an MCP server restarts, callTool could receive transport errors (EOF, connection reset) instead of ErrSessionMissing depending on timing. Broaden the retry condition to trigger forceReconnectAndWait on any non-context-cancellation error, fixing a flaky TestRemoteReconnectAfterServerRestart.

Assisted-By: docker-agent

docker-agent

Assessment: 🔴 CRITICAL

Found 2 issues in the retry logic changes:

1 HIGH severity issue
1 MEDIUM severity issue

docker-agent · 2026-03-22T18:15:38Z

pkg/tools/mcp/mcp.go

+	// If the call failed with a connection or session error (e.g. the
+	// server restarted), trigger or wait for a reconnection and retry
+	// the call once. Context-cancellation errors are not retried.
+	if err != nil && !errors.Is(err, context.Canceled) && ctx.Err() == nil {


🔴 HIGH: Retry logic now retries on ALL errors, not just connection/session errors

The retry condition was broadened from checking specifically for errors.Is(err, mcp.ErrSessionMissing) to checking for any error that is not context.Canceled. This means the code will now trigger reconnection for ALL errors from CallTool, not just session-related errors.

Impact: Application-level errors like authentication failures, invalid tool names, or validation errors will trigger expensive reconnect attempts that won't fix the underlying issue. This wastes resources and adds latency to error responses.

Recommendation: Consider checking for specific connection/transport error types (e.g., net.Error, io.EOF, syscall.ECONNRESET) instead of retrying on all errors. For example:

if err != nil && isConnectionError(err) && ctx.Err() == nil { // retry logic } func isConnectionError(err error) bool { if errors.Is(err, mcp.ErrSessionMissing) { return true } if errors.Is(err, io.EOF) { return true } var netErr net.Error if errors.As(err, &netErr) { return true } // Check for syscall errors like ECONNRESET return false }

docker-agent · 2026-03-22T18:15:38Z

pkg/tools/mcp/mcp.go

-	// reconnection and retry the call once.
-	if errors.Is(err, mcp.ErrSessionMissing) {
-		slog.Warn("MCP session missing, forcing reconnect and retrying", "tool", toolCall.Function.Name, "server", ts.logID)
+	// If the call failed with a connection or session error (e.g. the


🟡 MEDIUM: Comment claims to retry only connection/session errors but code retries all non-canceled errors

The comment states "If the call failed with a connection or session error (e.g. the server restarted), trigger or wait for a reconnection" but the actual code checks if err != nil && !errors.Is(err, context.Canceled) && ctx.Err() == nil, which retries on ANY error.

Impact: This mismatch between comment and implementation can mislead future maintainers about the actual behavior of the code.

Recommendation: Update the comment to accurately reflect the implementation, or update the implementation to match the comment's intent. For example:

// If the call failed with any error (except context cancellation), // trigger or wait for a reconnection and retry the call once.

Or implement selective error checking as suggested in the HIGH severity finding above.

…sionMissing When an MCP server restarts, callTool may receive transport errors (EOF, connection reset, broken pipe) instead of ErrSessionMissing depending on timing. Add an isConnectionError helper that selectively retries on known connection/session error types rather than retrying on all errors indiscriminately. The helper checks for: - mcp.ErrSessionMissing (server lost our session) - io.EOF (connection dropped) - net.Error (timeout, connection reset, etc.) - String-based fallback for transport errors the MCP SDK wraps with %v (losing the original error chain) This fixes flaky TestRemoteReconnectAfterServerRestart without broadening retry to application-level errors. Assisted-By: docker-agent

dgageot requested a review from a team as a code owner March 22, 2026 18:11

docker-agent bot reviewed Mar 22, 2026

View reviewed changes

dgageot marked this pull request as draft March 22, 2026 18:52

dgageot force-pushed the fix-flaky-test branch from f467661 to 847c591 Compare March 23, 2026 08:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: retry MCP callTool on any connection error, not just ErrSessionMissing#2215

fix: retry MCP callTool on any connection error, not just ErrSessionMissing#2215
dgageot wants to merge 1 commit intodocker:mainfrom
dgageot:fix-flaky-test

dgageot commented Mar 22, 2026

Uh oh!

docker-agent bot left a comment

Uh oh!

docker-agent bot Mar 22, 2026

Uh oh!

docker-agent bot Mar 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dgageot commented Mar 22, 2026

Uh oh!

docker-agent bot left a comment

Choose a reason for hiding this comment

Assessment: 🔴 CRITICAL

Uh oh!

docker-agent bot Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

docker-agent bot Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant