Summary
When a remote MCP toolset enters a persistent failure state (e.g. an SSE endpoint returning "toolset not started"), the TUI adds a new persistent warning notification on the right side of the screen on every conversation turn. After a long session the cards cover most of the viewport, and each must be dismissed individually with [x].
This is distinct from #2861 / #2866: #2861 is the heap-leak root cause of the jetsam kill, whereas this one is a UX bug that surfaces every time MCP fails repeatedly. Both can bite the same session (mine did).
Reproduction
- Start a session with at least one remote MCP whose endpoint will keep failing on Tools(), e.g. an sse server that has accepted Initialize but later returns lifecycle.ErrNotStarted on every list, or a transport that flaps.
- Have a few normal conversation turns.
- Each turn appends a new persistent "Some toolsets failed to initialize for agent '<name>'.\n\nDetails:\n\n- mcp(remote host=… transport=sse) list failed: toolset not started [x]" card.
Observed across three of my real sessions (screenshots in the linked DM): 5–7 identical stacked cards, several with two repeated lines inside the same card, after a few hours of activity.
Root cause
Two cooperating gaps:
1. No once-per-streak guard on the "list failed" path — pkg/agent/agent.go:321:
    ta, err := toolSet.Tools(ctx)
    if err != nil {
        desc := tools.DescribeToolSet(toolSet)
        slog.WarnContext(ctx, "Toolset listing failed; skipping", ...)
        a.AddToolWarning(fmt.Sprintf("%s list failed: %v", desc, err))
        continue
    }
The "start failed" path at agent.go:361 and runtime.go:1152 correctly gates emission through StartableToolSet.ShouldReportFailure() (which returns true exactly once per failure streak). The "list failed" path skips this guard, so every iteration of collectTools re-emits a fresh warning for the same underlying problem.
2. No dedup in the notification manager — pkg/tui/components/notification/notification.go:131:
    case ShowMsg:
        id := nextID.Add(1)
        ...
        item := notificationItem{ID: id, Text: msg.Text, Type: notifType}
        n.items = append([]notificationItem{item}, n.items...)
ShowMsg is appended unconditionally; persistent items (TypeWarning/TypeError) never auto-expire (persistent() returns true, no tea.Tick scheduled), so identical cards stack until the user clicks each [x].
The combination produces N copies of the same warning per failing toolset per session.
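To make the missing guard concrete, here is a minimal sketch of once-per-streak reporting applied to repeated list failures. It is not the project's code: streakGuard, ShouldReport, Recovered, collectWarnings, and errNotStarted are hypothetical names chosen for illustration, and the real StartableToolSet.ShouldReportFailure() may be implemented differently.

```go
package main

import (
	"errors"
	"sync"
)

// errNotStarted stands in for the "toolset not started" error (hypothetical).
var errNotStarted = errors.New("toolset not started")

// streakGuard is a hypothetical helper that lets a failure be reported
// only once per unbroken run of failures.
type streakGuard struct {
	mu       sync.Mutex
	reported bool
}

// ShouldReport returns true for the first failure of a streak and false
// for every later failure until Recovered is called.
func (g *streakGuard) ShouldReport() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.reported {
		return false
	}
	g.reported = true
	return true
}

// Recovered ends the current streak so the next failure is reported again.
func (g *streakGuard) Recovered() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.reported = false
}

// collectWarnings simulates successive tool-listing passes for one toolset.
// Each element of results is the outcome of one pass (nil means success).
func collectWarnings(g *streakGuard, results []error) []string {
	var warnings []string
	for _, err := range results {
		if err != nil {
			if g.ShouldReport() {
				warnings = append(warnings, "list failed: "+err.Error())
			}
			continue
		}
		g.Recovered()
	}
	return warnings
}
```

With this shape, the sequence fail, fail, success, fail yields two warnings (one per streak) instead of three, which is the behaviour the "list failed" path currently lacks.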
Proposed fix
Either side fixes the symptom; both together fix it cleanly:
- Agent side: route collectTools errors through the same once-per-streak guard the start path uses. StartableToolSet already tracks pendingWarning for Start() failures; extend it (or add a sibling pendingListWarning) so a repeated Tools() failure for an already-started-but-now-broken toolset only surfaces once until it recovers.
- Notification side: in Manager.Update's ShowMsg case, if a persistent notification with identical Text is already present, drop the new one (or bump a counter "× N" appended to the existing card). Cheap, defensive, and useful for any future caller that emits duplicate warnings.
I'd recommend doing both: the agent fix is the right primary, the notification fix is a safety net.
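The notification-side safety net could look roughly like the sketch below. It is a standalone illustration under assumptions: the item type, the Count field, the kind constants, and the show function are hypothetical and only mirror the shape of the ShowMsg handling quoted above, so the real Manager.Update will need adapting.

```go
package main

// kind mirrors the notification types mentioned in the issue (hypothetical names).
type kind int

const (
	kindInfo kind = iota
	kindWarning
	kindError
)

// item is a simplified stand-in for notificationItem; Count is an assumed
// extra field that would back a "× N" suffix on the card.
type item struct {
	ID    int
	Text  string
	Type  kind
	Count int
}

// persistent reports whether the item stays on screen until dismissed.
func (it item) persistent() bool {
	return it.Type == kindWarning || it.Type == kindError
}

// show adds a notification to the front of items. If a persistent item of
// the same type with identical text already exists, it bumps that item's
// counter instead of stacking a duplicate card.
func show(items []item, id int, text string, t kind) []item {
	candidate := item{ID: id, Text: text, Type: t, Count: 1}
	if candidate.persistent() {
		for i := range items {
			if items[i].persistent() && items[i].Type == t && items[i].Text == text {
				items[i].Count++
				return items
			}
		}
	}
	// Newest first, matching the prepend in the existing ShowMsg case.
	return append([]item{candidate}, items...)
}
```

Transient (non-persistent) notifications are deliberately left alone here, since they expire on their own; only warnings and errors are collapsed.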
Repro environment
- macOS, Apple silicon, 64 GB
- docker-agent HEAD as of 2026-05-22
- Multi-agent config with several remote MCPs (Notion SSE, an internal streamable MCP)
- Sessions of several hours; warnings observed across multiple agents (mark_iv, root)
Related