fix: collapse BaseExceptionGroup to surface real errors from task groups #2179

Open
giulio-leone wants to merge 10 commits into modelcontextprotocol:main from
giulio-leone:fix/collapse-exception-group-2114
Conversation

@giulio-leone

Summary

Fixes #2114

When a task in an anyio task group fails, sibling tasks are cancelled. The resulting BaseExceptionGroup contains the real error alongside Cancelled exceptions from those siblings. This makes error classification extremely difficult for callers — they cannot reliably determine the root cause of a failure.

Root Cause

There are 16 create_task_group() usages across the SDK with no except* syntax or ExceptionGroup unwrapping anywhere. A single connection failure produces a BaseExceptionGroup containing the real error plus multiple Cancelled exceptions.

Solution

  1. New utility module (src/mcp/shared/_exception_utils.py):

    • collapse_exception_group(eg, cancelled_type) — extracts the single real error from a group if there's exactly one non-cancelled exception; preserves the full group for multiple concurrent failures
    • open_task_group() — drop-in replacement for anyio.create_task_group() that automatically collapses on exit
  2. Applied to all 16 task group sites:

    • Client transports: SSE, stdio, WebSocket, StreamableHTTP
    • Server transports: SSE, stdio, WebSocket, StreamableHTTP (2 sites)
    • BaseSession.__aexit__
    • Server lowlevel run loop
    • StreamableHTTP session manager
    • SessionGroup, InMemoryTransport
    • Experimental task support (TaskResultHandler, TaskSupport)

Behavior

| Scenario | Before | After |
| --- | --- | --- |
| 1 real error + N Cancelled | BaseExceptionGroup([ConnectionError, Cancelled, Cancelled]) | ConnectionError (with group as __cause__) |
| All Cancelled | BaseExceptionGroup([Cancelled, Cancelled]) | Single Cancelled |
| Multiple real errors | BaseExceptionGroup([ValueError, RuntimeError, Cancelled]) | BaseExceptionGroup([ValueError, RuntimeError]) (Cancelled stripped) |
| No errors | No exception | No exception |

The original BaseExceptionGroup is always preserved as __cause__ for debugging.

Testing

  • 9 new tests (5 unit tests for collapse_exception_group, 4 integration tests for open_task_group)
  • All 666 existing tests pass (2 consecutive clean runs, 0 errors, 0 warnings)

g97iulio1609 and others added 10 commits February 28, 2026 15:41
When a task in an anyio task group fails, sibling tasks are cancelled.
The resulting BaseExceptionGroup contains the real error alongside
Cancelled exceptions from those siblings. This makes error classification
extremely difficult for callers.

Add open_task_group() context manager and collapse_exception_group()
utility that detect this pattern and re-raise just the original error,
keeping the full group as __cause__ for debugging.

Applied to all 16 create_task_group() sites across:
- Client transports (sse, stdio, websocket, streamable_http)
- Server transports (sse, stdio, websocket, streamable_http)
- Session __aexit__
- Server lowlevel run loop
- StreamableHTTP session manager
- SessionGroup, InMemoryTransport
- Experimental task support

Fixes modelcontextprotocol#2114

On Python < 3.11, BaseExceptionGroup is not a builtin and must be
imported from the exceptiongroup backport package (transitive dep
via anyio).
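A minimal sketch of the version-gated import this commit describes, assuming the usual pattern for the `exceptiongroup` backport; the exact placement and any coverage pragmas in the SDK may differ.

```python
import sys

if sys.version_info < (3, 11):
    # Backport package; on 3.11+ the builtin is used instead.
    from exceptiongroup import BaseExceptionGroup

# Either way, the same name is available from here on.
eg = BaseExceptionGroup("demo", [ValueError("x")])
print(type(eg.exceptions[0]).__name__)  # ValueError
```

On Python 3.11 and later the branch is skipped, which is why each interpreter version only ever covers one side of the check.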
- Remove unused `import anyio` from 4 modules where anyio.create_task_group
  was replaced by open_task_group
- Add `# pragma: no cover` to sys.version_info < (3, 11) checks since
  coverage is per-Python-version and each version only covers one branch
- Add `# pragma: lax no cover` to defensive raise paths in open_task_group
  and BaseSession.__aexit__ (triggered when exception group has no
  cancellation noise, which is extremely rare with anyio task groups)

strict-no-cover flags 'pragma: no cover' as incorrect when the lines
ARE covered on the running Python version. Use 'pragma: lax no cover'
instead, which is excluded from both coverage counting and strict
checking.

Pre-commit end-of-file fixer requires a blank line between the except
block and the next method definition.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add _get_exceptions() helper to provide typed access to
BaseExceptionGroup.exceptions, avoiding reportUnknownMemberType errors.
Use pyright: ignore[reportUnknownArgumentType] for the narrowed
BaseExceptionGroup[Unknown] type after isinstance checks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Resolves pyright reportUnusedVariable error.
The split() return value for the cancelled subgroup is intentionally
discarded.  Use bare _ instead of _cancelled so pyright strict mode
recognises it as an unused-by-design binding.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@giulio-leone
Author

All 25 CI checks pass (26th is claude-review skip). Previously flaky Windows test_stdio is now stable. Ready for review.

Development

Successfully merging this pull request may close these issues.

ExceptionGroup wrapping obscures real errors from task groups
