refactor: background task manager persistence#285
Conversation
🦋 Changeset detectedLatest commit: 0473cca The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
commit: |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 98f49657f9
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (this.activeTaskCount() < maxRunningTasks) return; | ||
| throw new Error('Too many background tasks are already running.'); |
There was a problem hiding this comment.
Reserve capacity before launching background work
When background.maxRunningTasks is set, this check now only happens inside registerTask(), but both background call sites perform async work first (BashTool spawns the process and AgentTool spawns/resumes the subagent before registering). If the limit is already reached, the command/subagent still starts and can perform side effects before registerTask() throws and the catch path tries to abort/kill it, so the configured cap no longer prevents over-limit background work from launching. Please restore a pre-launch reservation/check or otherwise enforce the limit before starting the process/subagent.
Useful? React with 👍 / 👎.
| function isReadablePersistedTask(obj: unknown): obj is DiskPersistedTask { | ||
| return ( | ||
| isRecord(obj) && | ||
| (typeof obj['taskId'] === 'string' || typeof obj['task_id'] === 'string') | ||
| ); |
There was a problem hiding this comment.
Reject malformed persisted task records
On resume, any syntactically valid JSON file under tasks/ that contains only a taskId (or a legacy task_id) now passes this guard, even if fields like kind, description, status, and timestamps are missing or optional legacy fields have incompatible types. listTasks() then returns that malformed object as a BackgroundTaskInfo, and reconcile() can mark it lost, persist it, and emit notifications with undefined metadata instead of silently dropping the corrupt/incompatible record as the persistence comment describes. Please validate the full current/legacy shapes before normalizing.
Useful? React with 👍 / 👎.
| if (timeoutMs !== undefined && timeoutMs > 0) { | ||
| raceInputs.push(sleep(timeoutMs).then(() => deadlineTimeout)); | ||
| } |
There was a problem hiding this comment.
Clear the agent deadline timer after completion
When a background agent has timeoutMs and its completion wins this race, the sleep(timeoutMs) timer remains pending because nothing cancels it in the finally path. That can keep the Node process alive until the full background timeout even though the task already completed or was stopped; the previous implementation kept the timeout handle and cleared it in finally. Please use a cancellable/clearable timer for the deadline branch.
Useful? React with 👍 / 👎.
| this.agent.telemetry.track('background_task_completed', { | ||
| kind: info.kind, | ||
| duration: info.endedAt !== null ? info.endedAt - info.startedAt : null, | ||
| status: info.status, |
There was a problem hiding this comment.
Preserve completion telemetry schema
For completed bash background tasks, this now reports kind: 'process' and a millisecond duration, while task creation still reports kind: 'bash' and the previous completion event used kind: 'bash' with duration_s. Any telemetry consumer joining created/completed events or reading the existing background_task_completed schema will stop matching bash completions after this refactor. Please keep the old bash/agent kind and seconds field (or emit compatible aliases).
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0473cca202
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| "@moonshot-ai/agent-core": patch | ||
| "@moonshot-ai/kimi-code": patch |
There was a problem hiding this comment.
This changeset only records patch bumps for agent-core/kimi-code, but the same commit deletes the public Session.getBackgroundTaskOutputPath() method and its RPC plumbing from packages/node-sdk/src/session.ts. SDK consumers that call that method will break without any @moonshot-ai/kimi-code-sdk changeset or major-version/release-note signal, so either keep a compatibility wrapper or version the SDK change as a breaking API change.
Useful? React with 👍 / 👎.
| backgroundAgentMetadata.set(info.taskId, { | ||
| agentId: info.taskId, | ||
| agentId: info.agentId ?? info.taskId, | ||
| parentToolCallId: info.taskId, |
There was a problem hiding this comment.
Key resumed background agents by subagent id
When reconnecting to an already-active session with a background subagent still running, this map is later read with backgroundAgentMetadata.get(event.subagentId) in the subagent.completed/subagent.failed handlers, but it is populated under the background task id. The new agentId: info.agentId ?? info.taskId value makes the task id vs. subagent id split explicit, so completions after reconnect miss the background path and leave the background-agent badge/metadata stale; store the entry under info.agentId ?? info.taskId instead.
Useful? React with 👍 / 👎.
| const isTerminal = | ||
| info.status === 'completed' || | ||
| info.status === 'failed' || | ||
| info.status === 'timed_out' || |
There was a problem hiding this comment.
Preserve timed-out status after aborting subagent
For a background Agent task whose new timeoutMs deadline fires, AgentBackgroundTask aborts the subagent and immediately emits background.task.terminated with status: timed_out; treating that as terminal here marks the task as transcripted, but the aborted child then emits subagent.failed, whose handler still overwrites the Agent card with status: 'failed' and returns because the task was already transcripted. In this timeout path the user loses the timed-out state and no final background-agent transcript is appended, so the later abort failure should be ignored or mapped to timed_out when the task is already timed out.
Useful? React with 👍 / 👎.
Related Issue
No linked issue.
Problem
Background task persistence was split across process-oriented tool code and newer agent task behavior. The PR needs a single background task model that can represent process and agent tasks, while still reading task records written by older builds so users do not lose persisted background task state after upgrading.
What changed
bash-*process task records and legacy agent task records.packages/agent-core/test/agent/background, added shared test helpers, and updated tool/session/resume tests to use the new manager API.Validation
pnpm exec oxlint packages/agent-core/test/agent/background packages/agent-core/test/tools/background/task-tools.test.ts packages/agent-core/test/tools/bash.test.ts packages/agent-core/test/tools/agent.test.ts packages/agent-core/test/tools/builtin-current.test.ts packages/agent-core/test/session/lifecycle-hooks.test.ts packages/agent-core/test/agent/resume.test.ts packages/agent-core/test/agent/bg-idle-notification-repro.test.ts packages/agent-core/test/agent/harness/agent.tspnpm --filter @moonshot-ai/agent-core exec vitest run test/agent/background test/tools/background/task-tools.test.ts test/tools/bash.test.ts test/tools/agent.test.ts test/tools/builtin-current.test.ts test/session/lifecycle-hooks.test.ts test/agent/resume.test.ts test/agent/bg-idle-notification-repro.test.tspnpm exec oxlint packages/agent-core/test/agent/background packages/agent-core/test/tools/background/task-tools.test.ts packages/agent-core/test/tools/bash.test.ts packages/agent-core/test/tools/agent.test.ts packages/agent-core/test/tools/builtin-current.test.ts packages/agent-core/test/session/lifecycle-hooks.test.ts packages/agent-core/test/agent/resume.test.ts packages/agent-core/test/agent/bg-idle-notification-repro.test.ts packages/agent-core/test/agent/harness/agent.ts && pnpm --filter @moonshot-ai/agent-core build && git diff --checkpnpm --filter @moonshot-ai/agent-core testpnpm --filter @moonshot-ai/agent-core typecheckpnpm --filter @moonshot-ai/kimi-code-sdk run typecheckpnpm --filter @moonshot-ai/kimi-code-sdk exec vitest run test/session-background-tasks.test.ts test/session-event-types.test.tspnpm exec oxlint packages/node-sdk/examples/kimi-harness-prompt-demo.ts packages/node-sdk/examples/runtime-smoke-helpers.ts packages/node-sdk/test/session-background-tasks.test.ts packages/node-sdk/test/session-event-types.test.tspnpm run typecheckChecklist
gen-changesetsskill, or this PR needs no changeset.gen-docsskill, or this PR needs no doc update.