fix(agent-core): recover interrupted tool exchanges #273
🦋 Changeset detected
Latest commit: 9d7eae9
The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9d7eae9796
// before the loop could finish pairing recorded tool calls.
const missingToolResultIds = this.openSteps.size > 0 ? [...this.pendingToolResultIds] : [];
for (const toolCallId of missingToolResultIds) {
  this.appendLoopEvent({
Include recovered messages in replay output
When the first resume repairs an interrupted tool exchange, this call runs after AgentRecords.restore has cleared records.restoring. ReplayBuilder.push only records messages while restoring, so the synthetic tool result created here, along with any deferred user/background messages flushed immediately afterward, is absent from ResumedAgentState.replay. The TUI renders that replay on resume, so the first resume after a crash can still show an unresolved tool call and omit the queued continuation, even though context and persistence were repaired. The transcript only looks correct after a second resume from the newly appended record.
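The ordering problem described in this comment can be illustrated with a small sketch. It is not the agent-core code: `ReplaySketch` and `restoreSketch` are hypothetical stand-ins, and the only behavior taken from the comment is that pushes are recorded solely while a `restoring` flag is set.

```typescript
// Hypothetical stand-in for a replay builder that only records while restoring.
class ReplaySketch {
  restoring = false;
  readonly replay: string[] = [];

  push(message: string): void {
    // Messages pushed outside of a restore are silently dropped from the replay.
    if (this.restoring) this.replay.push(message);
  }
}

// Replays recorded messages, then appends a synthetic tool result either
// before or after the restoring flag is cleared (assumed ordering choice).
function restoreSketch(recorded: string[], recoverBeforeFlagClears: boolean): string[] {
  const builder = new ReplaySketch();
  builder.restoring = true;
  for (const message of recorded) builder.push(message);

  const recover = (): void => builder.push("tool.result (synthetic)");

  if (recoverBeforeFlagClears) recover();
  builder.restoring = false;
  if (!recoverBeforeFlagClears) recover();

  return builder.replay;
}
```

In this sketch the synthetic result reaches the replay only when recovery runs before the flag is cleared. Whether the real fix should reorder the calls or record recovered messages some other way is left to the PR author.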
// A sealed step can intentionally keep waiting for async tool output across
// context operations. An open step at replay EOF means the process stopped
// before the loop could finish pairing recorded tool calls.
const missingToolResultIds = this.openSteps.size > 0 ? [...this.pendingToolResultIds] : [];
Recover pending calls after compaction clears the step
If manual or auto compaction completes after a tool.call is recorded but before its tool.result, context.applyCompaction preserves the assistant/tool-call message but clears openSteps. After a crash at that point, replay reaches EOF with pendingToolResultIds still populated and openSteps.size === 0, so this guard synthesizes no result. The next model request still contains the compacted history with an assistant tool call and no matching tool message, reproducing the provider 400 this recovery is meant to prevent.
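One way to avoid depending on step bookkeeping is to derive the missing results from the message history itself. The sketch below assumes a simplified, hypothetical `Message` shape and helper names (`findUnpairedToolCallIds`, `repairHistory`) that are not taken from agent-core. It also ignores the sealed-step case mentioned in the code comment, where a pending result may be intentional, so it is an illustration of the pairing check rather than a proposed patch.

```typescript
// Hypothetical message shapes, loosely modeled on chat-style tool calling.
type ToolCall = { id: string; name: string };
type Message =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string; toolCalls?: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };

// Returns the ids of tool calls that have no matching tool message in the history.
function findUnpairedToolCallIds(history: Message[]): string[] {
  const answered = new Set<string>();
  for (const message of history) {
    if (message.role === "tool") answered.add(message.toolCallId);
  }
  const unpaired: string[] = [];
  for (const message of history) {
    if (message.role !== "assistant") continue;
    for (const call of message.toolCalls ?? []) {
      if (!answered.has(call.id)) unpaired.push(call.id);
    }
  }
  return unpaired;
}

// Returns a copy of the history with a synthetic tool message inserted
// directly after each assistant message that has unanswered tool calls.
function repairHistory(history: Message[]): Message[] {
  const unpaired = new Set(findUnpairedToolCallIds(history));
  const repaired: Message[] = [];
  for (const message of history) {
    repaired.push(message);
    if (message.role !== "assistant") continue;
    for (const call of message.toolCalls ?? []) {
      if (unpaired.has(call.id)) {
        repaired.push({
          role: "tool",
          toolCallId: call.id,
          content: "Tool call was interrupted before a result was recorded.",
        });
      }
    }
  }
  return repaired;
}
```

Because the check looks only at the messages that would be sent to the provider, it would still find the unpaired call in the compaction scenario above, where openSteps is already empty.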
Related Issue
Resolve #269
Problem
If Kimi Code is force-closed after a tool.call is recorded but before the matching tool.result and step.end are written, session replay rebuilds an assistant message with unresolved tool_calls. The next LLM request then fails with a provider-side 400 because the replayed message history has no matching tool response.
What changed
- … tool.result, then deferred user/background messages are released back into history.
- … tool.result.
- … @moonshot-ai/agent-core and @moonshot-ai/kimi-code.
Verification
- pnpm --filter @moonshot-ai/agent-core exec vitest run test/agent/records/index.test.ts test/agent/compaction/full.test.ts
- pnpm --filter @moonshot-ai/agent-core run typecheck
- pnpm exec oxlint packages/agent-core/src/agent/context/index.ts packages/agent-core/src/agent/records/index.ts packages/agent-core/test/agent/records/index.test.ts
- pnpm --filter @moonshot-ai/agent-core run test
- pnpm --filter @moonshot-ai/kimi-code run typecheck
- pnpm --filter @moonshot-ai/kimi-code run test
- pnpm changeset status --since=origin/main
Checklist
- … gen-changesets skill, or this PR needs no changeset.
- … gen-docs skill, or this PR needs no doc update.