Skip to content

fix: keep background jobs alive across SessionEnd#355

Open
jnkliberty wants to merge 1 commit into
openai:mainfrom
jnkliberty:fix/background-jobs-survive-session-end
Open

fix: keep background jobs alive across SessionEnd#355
jnkliberty wants to merge 1 commit into
openai:mainfrom
jnkliberty:fix/background-jobs-survive-session-end

Conversation

@jnkliberty
Copy link
Copy Markdown

The bug

--background on task / review is supposed to mean "outlive the session that dispatched me", but session-lifecycle-hook.mjs:cleanupSessionJobs unconditionally kills every running job belonging to the ending session — including the detached worker that enqueueBackgroundTask just spawned.

Reproduces reliably when a Claude Code subagent dispatches `codex-rescue` (or any `task --background` / `review --background`): the subagent's turn ends → Claude Code fires SessionEnd → the cleanup hook walks `state.jobs` filtered by `sessionId`, calls `terminateProcessTree(pid)` on the detached worker, then `sendBrokerShutdown` + `teardownBrokerSession` for good measure. The worker dies a few seconds in, transcript freezes at the SIGTERM, and the parent session's later `/codex:status` returns nothing.

Foreground mode is unaffected because stdout stays open in the dispatching turn.

The fix

Three small, narrowly-scoped changes:

  1. `codex-companion.mjs:enqueueBackgroundTask` — tag the queued record with `background: true`.

  2. `session-lifecycle-hook.mjs:cleanupSessionJobs` — skip background jobs when terminating processes, and preserve them in state so any session in the workspace can still poll for status/results. Foreground job cleanup is unchanged.

  3. `session-lifecycle-hook.mjs:handleSessionEnd` — when active background jobs are still in workspace state, defer `sendBrokerShutdown` + `teardownBrokerSession`. The broker outlives this session until the last background worker is done with it. (Without this, killing the broker would indirectly kill the workers anyway.)

`upsertJob` shallow-merges patches, so the `background: true` flag survives `status: "queued" → "running" → "completed"` transitions in `tracked-jobs.mjs` without touching that file.

Test

Adds a new runtime test `session end preserves background jobs and their broker so workers survive their dispatching session` that:

  • Creates one foreground and one background job under the same sessionId, each backed by a real `setInterval` sleeper process so termination is observable.
  • Drives the SessionEnd hook.
  • Asserts the foreground sleeper is killed and its state entry pruned, while the background sleeper is still alive, its state entry is preserved, and its log/job files are not unlinked.

The existing `session end fully cleans up jobs for the ending session` test still passes — foreground cleanup behavior is unchanged.

Test results

`npm test` shows the same 4 pre-existing failures on this branch as on `main` (status/result/state tests unrelated to this change). All 83 previously-passing tests still pass, plus the new test.

Note

There is still no completion-notification back-channel from a background worker to the parent session — that's a separate, larger gap. For now this fix lets the documented "poll with `/codex:status`" flow actually work; before this fix, the worker was dead by the time the parent polled.

`--background` is supposed to mean "outlive the session that started me",
but `session-lifecycle-hook.mjs:cleanupSessionJobs` was unconditionally
terminating every running job belonging to the ending session — including
the detached worker that `enqueueBackgroundTask` had just spawned.

The result: any background task dispatched from a subagent (whose
SessionEnd fires as soon as its turn finishes) was SIGTERM'd a few seconds
in. The JSONL transcript froze at the kill timestamp, the parent session's
later status probe found no live job, and the caller never got a result.
Same hook then also tore down the broker the worker depended on.

Fix:
* Tag jobs created via `enqueueBackgroundTask` with `background: true`.
* `cleanupSessionJobs` now skips termination for background jobs and
  preserves their entry in `state.json` so any session in the workspace
  can still poll for status / fetch results.
* `handleSessionEnd` defers broker shutdown when active background jobs
  are still present in the workspace — the broker outlives this session
  until the last background worker is done with it.

Adds a runtime test that mixes a background and a foreground job under
the same sessionId, drives SessionEnd, and asserts the foreground worker
is killed + its state pruned while the background worker stays alive and
remains in state.
@jnkliberty jnkliberty requested a review from a team May 28, 2026 18:53
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e494a8e9aa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +125 to +126
if (hasActiveBackgroundJobs(cwd)) {
return;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Tear down the broker after background workers finish

When a SessionEnd occurs while a background job is still queued/running, this early return skips sendBrokerShutdown, teardownBrokerSession, and clearBrokerSession; I checked the worker path (handleTaskWorker) and the only other teardown call is broker replacement in ensureBrokerSession, so the comment’s “workers themselves” cleanup does not exist. In the common subagent case this means that after the background task completes, if no later SessionEnd runs for this workspace, the broker process/socket/temp dir and broker.json are left behind indefinitely.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant