Skip to content

fix: reduce retained tool output memory#2854

Open
dgageot wants to merge 4 commits into
docker:mainfrom
dgageot:board/e34279a60727904e
Open

fix: reduce retained tool output memory#2854
dgageot wants to merge 4 commits into
docker:mainfrom
dgageot:board/e34279a60727904e

Conversation

@dgageot
Copy link
Copy Markdown
Member

@dgageot dgageot commented May 21, 2026

  • wait for lifecycle supervisor watcher goroutines during shutdown
  • bound retained tool output in session history with a regression test
  • spool large MCP media payloads to disk instead of retaining inline base64
  • slim retained TUI tool results and avoid duplicating file contents in metadata

@dgageot dgageot requested a review from a team as a code owner May 21, 2026 12:13
c.addMessage(&msg)
}

// buildMultiContent attaches inline images to a tool response as a

This comment was marked as outdated.

Comment thread pkg/runtime/toolexec/dispatcher.go Outdated
// buildMultiContent attaches inline images to a tool response as a
// MultiContent payload, following the data-URL convention expected by
// chat clients.
func retainedToolOutput(output string, maxOldToolCallTokens int) string {

This comment was marked as outdated.

@dgageot dgageot marked this pull request as draft May 21, 2026 12:23
@docker-agent

This comment was marked as low quality.

@dgageot dgageot force-pushed the board/e34279a60727904e branch from 2104c15 to bb2f838 Compare May 21, 2026 13:44
@dgageot dgageot marked this pull request as ready for review May 21, 2026 13:56
@docker-agent
Copy link
Copy Markdown

PR Review Failed — The review agent encountered an error and could not complete the review. View logs.

@aheritier aheritier added area/agent For work that has to do with the general agent loop/agentic features of the app kind/fix PR fixes a bug (maps to fix: commit prefix) area/tools For features/issues/fixes related to the usage of built-in and MCP tools area/tui For features/issues/fixes related to the TUI labels May 22, 2026
Copy link
Copy Markdown
Contributor

@aheritier aheritier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR attacks four distinct memory sources and the implementation is mostly clean. The WithoutPayload slimming, the watchDone shutdown guarantee, and the ReadFileMeta.Content removal are all sound. One blocking issue must be resolved before merge.

Blocking

Disk leak: spooled MCP media files are never cleaned up.
writeMediaFile creates temp directories under os.TempDir() with prefix docker-agent-mcp-media-* but nothing removes them when the session ends or the ToolCallResult is GC'd. The PR itself calls defer os.RemoveAll(filepath.Dir(img.FilePath)) in the test, which highlights the production gap. Over a long session with many large media responses this trades the memory leak for a disk leak.

Fix options:

  • Register a cleanup func on the GatewayToolset lifecycle (cf. cleanUp pattern already used there), called on Stop.
  • Use a single session-scoped temp directory created once at Toolset construction and removed atomically on teardown — simpler than one dir per payload and easier to reason about.

Non-blocking

  1. Double WithoutPayload() in reasoningblock.gomessages.go already calls msg.Result.WithoutPayload() before passing the result to UpdateToolResult; reasoningblock.go then calls result.WithoutPayload() again internally. Harmless (idempotent on a stripped result) but confusing. Choose one call site and remove the other.

  2. Missing test: disk-write fallback in encodeMedia — the slog.Warn + inline fallback branch is untested. Worth covering with an injected error (e.g. a package-level writeMediaFile var that tests can replace).

  3. Missing test: threshold boundaryTestProcessMCPContentSpoolsLargeMedia only checks maxInlineMediaBytes+1. Add a case for exactly maxInlineMediaBytes bytes → inline, and maxInlineMediaBytes+1 → spooled.

  4. TestSupervisor_StopWaitsForWatcher is weak coverage for the concurrency fix — the test exercises the sequential path (Start → Stop), not the concurrent-Stop path that the s.stopping guard is specifically designed to handle. A test calling Stop twice concurrently from two goroutines while the watcher is live would be a stronger regression guard. The < time.Second timing assertion is also brittle under heavy CI load; goleak would be more idiomatic.

  5. readImageFile in filesystem.go still inlines large local images — up to chat.MaxImageBytes (~4.5 MB) as base64. This is inconsistent with the new MCP spooling strategy. Out of scope for this PR, but worth a follow-up issue.

Comment thread pkg/tools/mcp/mcp.go

func writeMediaFile(data []byte, mimeType string) (string, error) {
dir, err := os.MkdirTemp("", "docker-agent-mcp-media-*")
if err != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BLOCKING] Disk leak — these temp directories are never removed.

os.MkdirTemp creates a directory in os.TempDir() on every call to writeMediaFile, but nothing cleans it up when the ToolCallResult is discarded or the session ends. Over a long session with many large media tool responses this will exhaust disk space.

Suggested fix: introduce a session-scoped temp directory (created once, removed on Stop) instead of a per-payload directory. Alternatively, register a cleanup closure on the Toolset lifecycle the same way GatewayToolset does today.

Comment thread pkg/tools/mcp/mcp_test.go
assert.Equal(t, "image/png", img.MimeType)
require.NotEmpty(t, img.FilePath)
defer os.RemoveAll(filepath.Dir(img.FilePath))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test correctly cleans up with defer os.RemoveAll(...), but this highlights that the production code does not. This defer will be unnecessary once a lifecycle cleanup hook is added on the production side.

entry.msg.ToolStatus = status
entry.msg.ToolResult = result
entry.msg.ToolResult = result.WithoutPayload()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Non-blocking] Double WithoutPayload() call.

The caller in messages.go already passes msg.Result.WithoutPayload() here, so result is already a stripped copy. Calling result.WithoutPayload() a second time is idempotent but redundant and may confuse future readers. Either remove WithoutPayload() from this line (since it's already stripped at the call site), or remove it from the call site in messages.go and let UpdateToolResult own the stripping — but not both.

start := time.Now()
assert.NilError(t, s.Stop(t.Context()))
assert.Check(t, time.Since(start) < time.Second)
assert.Check(t, is.Equal(s.State().State, lifecycle.StateStopped))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Non-blocking] Brittle timing assertion and weak concurrency coverage.

This < time.Second bound will be flaky under heavy CI load. More importantly, the test only covers the sequential Start→Stop path, not the concurrent-Stop path that the s.stopping guard was added to fix. A test with two goroutines calling Stop concurrently while the watcher is alive (e.g. blocked in sess.Wait()) would be a stronger regression guard. Consider using goleak to assert no goroutines are left behind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/agent For work that has to do with the general agent loop/agentic features of the app area/tools For features/issues/fixes related to the usage of built-in and MCP tools area/tui For features/issues/fixes related to the TUI kind/fix PR fixes a bug (maps to fix: commit prefix)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants