fix: keep background jobs alive across SessionEnd#355
Conversation
`--background` is supposed to mean "outlive the session that started me", but `session-lifecycle-hook.mjs:cleanupSessionJobs` was unconditionally terminating every running job belonging to the ending session — including the detached worker that `enqueueBackgroundTask` had just spawned. The result: any background task dispatched from a subagent (whose SessionEnd fires as soon as its turn finishes) was SIGTERM'd a few seconds in. The JSONL transcript froze at the kill timestamp, the parent session's later status probe found no live job, and the caller never got a result. Same hook then also tore down the broker the worker depended on. Fix: * Tag jobs created via `enqueueBackgroundTask` with `background: true`. * `cleanupSessionJobs` now skips termination for background jobs and preserves their entry in `state.json` so any session in the workspace can still poll for status / fetch results. * `handleSessionEnd` defers broker shutdown when active background jobs are still present in the workspace — the broker outlives this session until the last background worker is done with it. Adds a runtime test that mixes a background and a foreground job under the same sessionId, drives SessionEnd, and asserts the foreground worker is killed + its state pruned while the background worker stays alive and remains in state.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e494a8e9aa
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (hasActiveBackgroundJobs(cwd)) { | ||
| return; |
There was a problem hiding this comment.
Tear down the broker after background workers finish
When a SessionEnd occurs while a background job is still queued/running, this early return skips sendBrokerShutdown, teardownBrokerSession, and clearBrokerSession; I checked the worker path (handleTaskWorker) and the only other teardown call is broker replacement in ensureBrokerSession, so the comment’s “workers themselves” cleanup does not exist. In the common subagent case this means that after the background task completes, if no later SessionEnd runs for this workspace, the broker process/socket/temp dir and broker.json are left behind indefinitely.
Useful? React with 👍 / 👎.
The bug
--backgroundontask/reviewis supposed to mean "outlive the session that dispatched me", butsession-lifecycle-hook.mjs:cleanupSessionJobsunconditionally kills every running job belonging to the ending session — including the detached worker thatenqueueBackgroundTaskjust spawned.Reproduces reliably when a Claude Code subagent dispatches `codex-rescue` (or any `task --background` / `review --background`): the subagent's turn ends → Claude Code fires SessionEnd → the cleanup hook walks `state.jobs` filtered by `sessionId`, calls `terminateProcessTree(pid)` on the detached worker, then `sendBrokerShutdown` + `teardownBrokerSession` for good measure. The worker dies a few seconds in, transcript freezes at the SIGTERM, and the parent session's later `/codex:status` returns nothing.
Foreground mode is unaffected because stdout stays open in the dispatching turn.
The fix
Three small, narrowly-scoped changes:
`codex-companion.mjs:enqueueBackgroundTask` — tag the queued record with `background: true`.
`session-lifecycle-hook.mjs:cleanupSessionJobs` — skip background jobs when terminating processes, and preserve them in state so any session in the workspace can still poll for status/results. Foreground job cleanup is unchanged.
`session-lifecycle-hook.mjs:handleSessionEnd` — when active background jobs are still in workspace state, defer `sendBrokerShutdown` + `teardownBrokerSession`. The broker outlives this session until the last background worker is done with it. (Without this, killing the broker would indirectly kill the workers anyway.)
`upsertJob` shallow-merges patches, so the `background: true` flag survives `status: "queued" → "running" → "completed"` transitions in `tracked-jobs.mjs` without touching that file.
Test
Adds a new runtime test `session end preserves background jobs and their broker so workers survive their dispatching session` that:
The existing `session end fully cleans up jobs for the ending session` test still passes — foreground cleanup behavior is unchanged.
Test results
`npm test` shows the same 4 pre-existing failures on this branch as on `main` (status/result/state tests unrelated to this change). All 83 previously-passing tests still pass, plus the new test.
Note
There is still no completion-notification back-channel from a background worker to the parent session — that's a separate, larger gap. For now this fix lets the documented "poll with `/codex:status`" flow actually work; before this fix, the worker was dead by the time the parent polled.