fix: reduce retained tool output memory#2854
Conversation
| c.addMessage(&msg) | ||
| } | ||
|
|
||
| // buildMultiContent attaches inline images to a tool response as a |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
| // buildMultiContent attaches inline images to a tool response as a | ||
| // MultiContent payload, following the data-URL convention expected by | ||
| // chat clients. | ||
| func retainedToolOutput(output string, maxOldToolCallTokens int) string { |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
This comment was marked as low quality.
This comment was marked as low quality.
2104c15 to
bb2f838
Compare
|
❌ PR Review Failed — The review agent encountered an error and could not complete the review. View logs. |
aheritier
left a comment
There was a problem hiding this comment.
The PR attacks four distinct memory sources and the implementation is mostly clean. The WithoutPayload slimming, the watchDone shutdown guarantee, and the ReadFileMeta.Content removal are all sound. One blocking issue must be resolved before merge.
Blocking
Disk leak: spooled MCP media files are never cleaned up.
writeMediaFile creates temp directories under os.TempDir() with prefix docker-agent-mcp-media-* but nothing removes them when the session ends or the ToolCallResult is GC'd. The PR itself calls defer os.RemoveAll(filepath.Dir(img.FilePath)) in the test, which highlights the production gap. Over a long session with many large media responses this trades the memory leak for a disk leak.
Fix options:
- Register a cleanup func on the
GatewayToolsetlifecycle (cf.cleanUppattern already used there), called onStop. - Use a single session-scoped temp directory created once at
Toolsetconstruction and removed atomically on teardown — simpler than one dir per payload and easier to reason about.
Non-blocking
-
Double
WithoutPayload()inreasoningblock.go—messages.goalready callsmsg.Result.WithoutPayload()before passing the result toUpdateToolResult;reasoningblock.gothen callsresult.WithoutPayload()again internally. Harmless (idempotent on a stripped result) but confusing. Choose one call site and remove the other. -
Missing test: disk-write fallback in
encodeMedia— theslog.Warn+ inline fallback branch is untested. Worth covering with an injected error (e.g. a package-levelwriteMediaFilevar that tests can replace). -
Missing test: threshold boundary —
TestProcessMCPContentSpoolsLargeMediaonly checksmaxInlineMediaBytes+1. Add a case for exactlymaxInlineMediaBytesbytes → inline, andmaxInlineMediaBytes+1→ spooled. -
TestSupervisor_StopWaitsForWatcheris weak coverage for the concurrency fix — the test exercises the sequential path (Start → Stop), not the concurrent-Stop path that thes.stoppingguard is specifically designed to handle. A test callingStoptwice concurrently from two goroutines while the watcher is live would be a stronger regression guard. The< time.Secondtiming assertion is also brittle under heavy CI load;goleakwould be more idiomatic. -
readImageFileinfilesystem.gostill inlines large local images — up tochat.MaxImageBytes(~4.5 MB) as base64. This is inconsistent with the new MCP spooling strategy. Out of scope for this PR, but worth a follow-up issue.
|
|
||
| func writeMediaFile(data []byte, mimeType string) (string, error) { | ||
| dir, err := os.MkdirTemp("", "docker-agent-mcp-media-*") | ||
| if err != nil { |
There was a problem hiding this comment.
[BLOCKING] Disk leak — these temp directories are never removed.
os.MkdirTemp creates a directory in os.TempDir() on every call to writeMediaFile, but nothing cleans it up when the ToolCallResult is discarded or the session ends. Over a long session with many large media tool responses this will exhaust disk space.
Suggested fix: introduce a session-scoped temp directory (created once, removed on Stop) instead of a per-payload directory. Alternatively, register a cleanup closure on the Toolset lifecycle the same way GatewayToolset does today.
| assert.Equal(t, "image/png", img.MimeType) | ||
| require.NotEmpty(t, img.FilePath) | ||
| defer os.RemoveAll(filepath.Dir(img.FilePath)) | ||
|
|
There was a problem hiding this comment.
The test correctly cleans up with defer os.RemoveAll(...), but this highlights that the production code does not. This defer will be unnecessary once a lifecycle cleanup hook is added on the production side.
| entry.msg.ToolStatus = status | ||
| entry.msg.ToolResult = result | ||
| entry.msg.ToolResult = result.WithoutPayload() | ||
|
|
There was a problem hiding this comment.
[Non-blocking] Double WithoutPayload() call.
The caller in messages.go already passes msg.Result.WithoutPayload() here, so result is already a stripped copy. Calling result.WithoutPayload() a second time is idempotent but redundant and may confuse future readers. Either remove WithoutPayload() from this line (since it's already stripped at the call site), or remove it from the call site in messages.go and let UpdateToolResult own the stripping — but not both.
| start := time.Now() | ||
| assert.NilError(t, s.Stop(t.Context())) | ||
| assert.Check(t, time.Since(start) < time.Second) | ||
| assert.Check(t, is.Equal(s.State().State, lifecycle.StateStopped)) |
There was a problem hiding this comment.
[Non-blocking] Brittle timing assertion and weak concurrency coverage.
This < time.Second bound will be flaky under heavy CI load. More importantly, the test only covers the sequential Start→Stop path, not the concurrent-Stop path that the s.stopping guard was added to fix. A test with two goroutines calling Stop concurrently while the watcher is alive (e.g. blocked in sess.Wait()) would be a stronger regression guard. Consider using goleak to assert no goroutines are left behind.
Uh oh!
There was an error while loading. Please reload this page.