feat: per-job event stream + watchdog + compact recovery (1.1.0) by suminerProxy · Pull Request #2 · suminerProxy/codex-plugin-cc

suminerProxy · 2026-05-19T10:01:27Z

Why

When Claude Code's main loop delegates a coding task to Codex via `/codex:rescue`, the subagent makes a single synchronous `Bash` call into the companion `task` command and waits for it to return. If Codex stalls (prompt too long, model hang, tool deadlock), the main loop stalls too — there is no out-of-band signal it can poll to detect the freeze.

This PR adds the missing channel: a per-job NDJSON event stream the main Claude loop can poll with `/codex:events`, plus a stall watchdog that emits `phase:"stuck"` when codex goes quiet, and a protocol-native `/codex:compact` recovery path for context overflow. `/codex:rescue` now defaults to `--background` so the main loop never blocks on a synchronous Bash call.

What ships in 1.1.0

`/codex:events ` — incremental polling of normalized codex notifications (`--since` / `--after-seq` / `--limit` / `--json`).
`/codex:compact ` — wraps `thread/compact/start` for the cancel → compact → resume-with-amended-prompt flow.
`/codex:rescue` default flipped to `--background`.
60s per-job stall watchdog (override via `CODEX_COMPANION_STALL_SECONDS`) emits `{type:"watchdog", phase:"stuck"}`. Does not cancel.
New `{type:"job/exited"}` terminal event with `phase: completed|failed`. Fixes a bug where codex turn failures were misreported as `phase:"completed"`.
Top-level `usage` on `runAppServerTurn`; real-time updates via `thread/tokenUsage/updated` events (`phase:"metering"`).
codex CLI 0.131 notification coverage extended (`thread/status/changed`, `warning`, `thread/tokenUsage/updated`, `userMessage`, `assistantMessage`/`agentMessage`, `reasoning`). E2E shows `unknown` phase count goes from 5/9 (56%) to 0/13.

Includes #1 (test isolation) as a prerequisite

This branch contains 8 commits, the first of which (`5e8100a`) is the same fix as #1. Merging this PR alone brings both fixes in; merging #1 first and then this PR is also fine (git will skip the already-applied commit).

Verification

✅ `npm test`: 128 / 128 pass (baseline 86 + 42 new tests).
✅ Success path E2E against real codex 0.131.0-alpha.9: 13 events, `job/exited phase:"completed" exitCode:0`, 0 unknown-phase events.
✅ Failed path E2E: `--effort minimal` → `invalid_request_error` → `job/exited phase:"failed" exitCode:1 errorMessage:"Task did not complete successfully (...)"`. Stored `state.json` matches.
✅ Stall watchdog: `CODEX_COMPANION_STALL_SECONDS=5` + xhigh reasoning → `{type:"watchdog", phase:"stuck", stallMs:8540}` after a Reasoning item went silent 8.54s. Did not cancel, did not re-fire spuriously when subsequent ~2s quiet periods occurred.
✅ Compact rejection path: `/codex:compact ` → `attempted:true, compacted:false, detail:"invalid thread id: ..."`.

Open schema unknowns (do not block merge)

`thread/compact/start` success payload shape is undocumented for codex CLI 0.131. Wrapper preserves it verbatim under `.result`.
`thread/tokenUsage/updated` payload field names not confirmed. Normalize tries several aliases; falls back to a stable label. The event still fires correctly (`phase:"metering"`).

Build status

Pre-existing upstream incompatibility with codex CLI 0.131 (`requestAttestation` / `experimentalRawEvents` schema drift) makes `npm run build` fail on `main` too. Not introduced by this PR; intentionally not addressed to keep the diff focused.

(Previously opened as openai#335 — closed in favor of this fork-internal PR so the repo owner can self-merge.)

Summary by CodeRabbit

Release Notes v1.1.0

New Features
- Added /codex:events command for streaming per-job notifications and progress monitoring
- Added /codex:compact command for context overflow recovery
- Token usage now surfaced as a top-level field with real-time streaming updates
Bug Fixes
- /codex:rescue now defaults to background execution to prevent deadlocks
- Added stall watchdog monitoring for job health
Documentation
- Updated changelog with new command and feature details
- Added guidance for background job monitoring via event polling

Host plugin runtimes (e.g. Claude Code) inject CLAUDE_PLUGIN_DATA and CODEX_COMPANION_SESSION_ID into the plugin process. When these leak into `npm test`, two tests fail with non-bug assertions: - tests/state.test.mjs: "resolveStateDir uses a temp-backed per-workspace directory" — CLAUDE_PLUGIN_DATA redirects state to ~/.claude/plugins/data instead of os.tmpdir(). - tests/runtime.test.mjs: "result without a job id ..." — fixture writer and the spawned `result` subprocess look at different state dirs. CI passes (no host-runtime env), so this only bites contributors running `npm test` from inside a plugin host. Fix: delete these env vars at the top of tests/helpers.mjs (the shared import) so every test gets a clean slate. Tests that exercise the env var explicitly ("resolveStateDir uses CLAUDE_PLUGIN_DATA when it is provided") already use try/finally to set and restore it, so they continue to pass. Verified: 86/86 tests pass with the polluting env vars still set in the parent shell.

Add resolveJobEventsFile / appendJobEvent / readJobEvents for the upcoming Claude-main-loop observability work (see DESIGN doc at user planning area). Contract: - Per-job events file at {stateDir}/jobs/{jobId}.events.ndjson. - appendJobEvent caps each line at 4KB. POSIX `write(fd, buf, n)` with O_APPEND is atomic when n < PIPE_BUF (~4096 on Linux/macOS), so single- write append is safe across concurrent readers. Oversized lines first truncate the `raw` field; if still too big, the whole event is replaced with a minimal {type: "oversize-event-elided", seq, ts, method, phase}. - readJobEvents returns [] for missing files. Partial last line (writer mid-write before the trailing \n) is tolerated — JSON.parse failure on the trailing line is skipped, next read picks it up. afterSeq takes precedence over since when both are passed. This commit is consumer-less; the producer side (captureTurn onNotification hook) lands in the next commit so this slice can ship green independently. Tests: 10 new unit tests covering path resolution, append+read, missing file, afterSeq filter, since filter, afterSeq precedence over since, limit, partial-line tolerance, raw truncation, and oversize elision. 96/96 pass.

Three changes to codex.mjs, building on the per-job NDJSON event API: 1. New exported `normalizeNotification(state, message)` function — pure transform from app-server notifications to the flat event shape that gets appended to {jobId}.events.ndjson. Covers thread/started, thread/name/updated, turn/started, item/started, item/completed, error, turn/completed, and an "unknown" fallback for forward-compat methods (e.g. thread/compact/started). Phase inference reuses the existing describeStartedItem / describeCompletedItem maps so the vocabulary stays consistent with on-screen progress text. 2. `captureTurn` accepts an `options.onNotification` callback. A small `dispatch(message)` wrapper applies state mutation first, then emits the normalized event. Order is load-bearing: normalize reads state.threadTurnIds populated by applyTurnNotification. onNotification errors are swallowed so a broken consumer can never crash the worker. 3. `runAppServerTurn` forwards `options.onNotification` to captureTurn and surfaces `usage` (from turnState.finalTurn?.usage) as a top-level field on the result, so /codex:status and the events stream can read token usage without traversing the nested `turn` payload. Tests: 13 new unit tests covering each method branch, phase inference edge cases (commandExecution -> running vs verifying), unknown-method fallback, and malformed-input resilience. 109/109 pass. No behavioral change for callers that don't pass onNotification — the hook is fully opt-in. The companion-side wiring lands separately.

End-to-end wiring of the observability path landed in the prior two commits (state.mjs events API + codex.mjs notification hook). In codex-companion.mjs: - executeTaskRun: forward `onNotification` through to runAppServerTurn. Sync foreground callers still don't pass one; only the background task-worker path opts in. - handleTaskWorker: build a same-process closure that (a) appends every normalized notification to {jobId}.events.ndjson with a monotonic per-job seq number; (b) reflects phase transitions into state.json via upsertJob (only on real phase change, to keep the unlocked single-flight writer rare); (c) runs a 5s stall watchdog — when the gap since the last event exceeds CODEX_COMPANION_STALL_SECONDS (default 60s), it emits one {type:"watchdog", phase:"stuck"} record and flips job phase to stuck. It does NOT cancel. Main-loop Claude decides what to do next (continue / compact / cancel). (d) on terminal exit (success OR failure), emits a single {type:"job/exited", phase:"completed"|"failed", exitCode, errorMessage} record so a polling reader can distinguish "still slow" from "already finished". The job.status state.json field is no longer the sole source of truth — main-loop Claude must look for the job/exited event. Observability errors are swallowed; they must never crash the worker. Watchdog interval is unref'd so it can't keep the event loop alive. - New `events <job-id> [--since|--after-seq|--limit] [--json]` subcommand delegating to readJobEvents. Default output is human-readable lines; --json returns {jobId, eventsFile, count, events}. After-seq + since semantics match readJobEvents (after-seq takes precedence). printUsage updated. Tests: 7 integration tests spawning the real CLI to verify the events command's empty-result, append-and-read, after-seq filtering, limit capping, human-readable rendering, and missing-jobId usage paths. 116/116 pass. The stall threshold is configurable via env var for now (CODEX_COMPANION_STALL_SECONDS). A `task --max-stall-seconds <N>` CLI flag is deferred to a later commit so handleTask's option parser stays untouched in this slice.

E2E against real codex CLI 0.131.0-alpha.9 uncovered three issues mock tests could not surface. Fixed all three; rerun confirms zero `unknown` phase events in a successful turn. 1. handleTaskWorker job/exited bug (correctness) runTrackedJob does NOT throw when codex returns a failed turn — it resolves with execution.exitStatus != 0 and writes state.status="failed" out-of-band. Previous code keyed `completed` off the absence of an exception, so a failed turn was misreported as `phase:"completed", exitCode:0` in the terminal job/exited event. This is the single record main-loop Claude reads to decide "did this work?" — getting it wrong led to silent false positives. Fix: inspect execution.exitStatus directly. Bug repro: dispatch with --effort minimal (which conflicts with web_search) — codex returns an invalid_request_error, runTrackedJob resolves with exitStatus=1, stored state.json shows status="failed", but the prior code emitted {phase:"completed", exitCode:0}. After fix: {phase:"failed", exitCode:1, errorMessage:"Task did not complete successfully ..."}. 2. normalize coverage gaps (forward-compat / readability) codex 0.131 emits three notification methods + three item types the prior switch did not recognize, leaving them as phase:"unknown" with generic fallback messages: - method `thread/status/changed` with status.type in {"idle"} → phase:"idle", message "Thread idle.". Post-turn quiescence signal. - method `thread/tokenUsage/updated` → phase:"metering" (new word in the phase vocab). Streaming token-usage source. This is the real event main-loop Claude should poll to detect context-budget pressure before turn/completed surfaces final usage. NOTE: codex 0.131 payload shape for the usage object is not documented; normalize tries {inputTokens,outputTokens,cachedInputTokens} and several aliases; when no recognized keys are found, falls back to a stable label and preserves raw. Concrete schema discovery is a Phase 3 follow-up. - item.type `agentMessage` / `assistantMessage` / `reasoning` → phase:"thinking". `agentMessage` is codex's final-reply item; describeStartedItem/describeCompletedItem now use a new extractItemText helper to surface a content preview ("Codex replied: pong") so main-loop Claude can recognize the answer from the event stream without fetching /codex:result. 3. Single E2E rerun verification Same prompt with --effort low (avoids the minimal+web_search constraint): 13 events emitted, phase distribution {thinking:8, completed:2, idle:1, metering:1, warning:1, unknown:0}. The prior normalize generated 5 unknown out of 9 events (56%); this commit brings it to 0/13 on the happy path. Tests: +6 new normalize unit tests (status idle, tokenUsage with usage, tokenUsage without usage, agentMessage with text, agentMessage with content[].text, schema fallback). 128/128 pass total. The job/exited fix is exercised in the existing fake-codex fixture indirectly; a direct unit test would need a fake runTrackedJob that simulates "resolved with non-zero exitStatus" which is non-trivial to mock — for now the E2E repro and the inline reasoning in the comment are the documented evidence.

Closes the "operate codex like a Claude subagent" half of the design contract. Phase 1+2 gave the main loop an event stream; Phase 3 gives it the slash-command surface and the protocol-native recovery path. In lib/codex.mjs: - New exported compactAppServerThread(cwd, {threadId}). Wraps codex app-server's thread/compact/start RPC. This is the protocol-native recovery for "prompt too long" — main-loop Claude calls it after a turn fails with context-overflow, then resumes via /codex:rescue --resume <amended prompt>. Fire-and-return: the call awaits the app-server's ack but does not consume the streaming response. The broker recognizes thread/compact/start as STREAMING_METHOD but routes the stream to whoever owns it at that moment; compaction itself completes on the codex side regardless of consumer presence. Uses reuseExistingBroker: true so it can punch through to an already-running broker if one exists. The exact success payload shape is undocumented in codex CLI 0.131; result is preserved verbatim under .result for forward-compat. In codex-companion.mjs: - New `compact <thread-id> [--json]` subcommand wrapping compactAppServerThread. Plain output prints the operation result and hints the resume flow; --json returns the full structured report. Smoke-tested against real codex 0.131: bogus thread id correctly returns attempted:true, compacted:false, with codex's own error ("invalid thread id") preserved under .detail. In agents/codex-rescue.md + commands/rescue.md: - Default execution mode flipped from foreground to background. The prior heuristic ("small bounded => foreground; complex => background") was the deadlock root cause: a small task that stalls is still a deadlock, and "small" is unknowable in advance. Background is the safe default — the main loop polls /codex:events <job-id> for progress instead of blocking on the synchronous Bash call. --wait is honored when the user explicitly asks for foreground. In commands/: - New events.md: slash command surfacing the per-job event stream. Documents the {type:"job/exited"} terminal-state contract, the phase:"stuck" watchdog signal, the phase:"metering" token-usage source, and the --after-seq incremental polling pattern. - New compact.md: slash command for the recovery sequence. Documents the typical "cancel → compact → resume with amended prompt" idiom main-loop Claude should follow when codex hits context overflow. In tests/commands.test.mjs: - Updated assertions to reflect the new background default and to include events.md + compact.md in the commands/ file-list invariant. The prior assertions hard-coded the prose "default to foreground" and the file list of 7 commands; both needed to track this change. Tests: 128/128 pass. No new tests added — the compact path is exercised indirectly via a smoke check (bogus thread id returns the expected structured error against the real app-server). A full E2E for compact needs a real turn first to obtain a valid thread id; that lands in Phase 4 alongside version bump + CHANGELOG.

Cuts the minor-version release for the observability rework (feat/event-stream-foundation). Adds /codex:events, /codex:compact, rescue defaults to --background, stall watchdog, terminal job/exited events, top-level token usage, and broader codex CLI 0.131 notification coverage. No breaking changes — all existing commands keep the same signatures and outputs; the new event stream is additive.

coderabbitai · 2026-05-19T10:01:38Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 14eabfba-cbe4-4900-ad63-ec0ab50c6728

📥 Commits

Reviewing files that changed from the base of the PR and between 807e03a and f9f36fb.

⛔ Files ignored due to path filters (1)

package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (16)

.claude-plugin/marketplace.json
package.json
plugins/codex/.claude-plugin/plugin.json
plugins/codex/CHANGELOG.md
plugins/codex/agents/codex-rescue.md
plugins/codex/commands/compact.md
plugins/codex/commands/events.md
plugins/codex/commands/rescue.md
plugins/codex/scripts/codex-companion.mjs
plugins/codex/scripts/lib/codex.mjs
plugins/codex/scripts/lib/state.mjs
tests/commands.test.mjs
tests/events-command.test.mjs
tests/helpers.mjs
tests/normalize-notification.test.mjs
tests/state.test.mjs

📝 Walkthrough

Walkthrough

This PR implements Codex plugin v1.1.0, introducing event-driven job observability via NDJSON streaming, context compaction for "prompt too long" recovery, and changes rescue defaults to non-blocking background execution with polling support. The release bumps manifests, adds event storage and normalization infrastructure, wires notifications through task execution, and exposes new CLI commands for monitoring and recovery.

Changes

Codex Plugin v1.1.0: Event Streaming, Context Compaction, and Job Lifecycle Observability

Layer / File(s)	Summary
Release versioning and changelog `.claude-plugin/marketplace.json`, `package.json`, `plugins/codex/.claude-plugin/plugin.json`, `plugins/codex/CHANGELOG.md`	Version bumped from 1.0.4 to 1.1.0 across all manifests. Changelog documents new observability features, background defaults, token usage export, and test isolation improvements.
Job event storage and NDJSON persistence `plugins/codex/scripts/lib/state.mjs`, `tests/state.test.mjs`	Implements per-job NDJSON event log with atomic single-line writes capped at 4KB, automatic truncation of oversized payloads, filtering by sequence number or timestamp, and full test coverage for append/read/filter operations.
Notification normalization contract and item descriptors `plugins/codex/scripts/lib/codex.mjs`, `tests/normalize-notification.test.mjs`	Exports `normalizeNotification()` to convert app-server notifications into stable event records. Updates item descriptors to recognize `userMessage`, `assistantMessage`, `agentMessage`, and `reasoning` types with rich progress messages. Includes 332 lines of tests covering all notification shapes and edge cases.
Notification dispatch in turn capture and token usage export `plugins/codex/scripts/lib/codex.mjs`	Extends `captureTurn()` to emit normalized notifications via optional `onNotification` callback before returning. Wires callback through `runAppServerTurn()`, adds top-level `usage` field to turn result, and exports `compactAppServerThread()` for context recovery requests to app-server.
Worker task runtime with event streaming, stall detection, and terminal events `plugins/codex/scripts/codex-companion.mjs`	Passes `onNotification` into task execution, maintains per-job event stream with sequence counting and `lastEventAt` timestamp tracking. Stall watchdog emits `{type:"watchdog", phase:"stuck"}` after 60s inactivity; terminal `{type:"job/exited"}` event based on exit status (not exceptions) signals completion to listeners.
Events and compact command handlers `plugins/codex/scripts/codex-companion.mjs`	Implements `handleEvents()` supporting `--since`, `--after-seq`, and `--limit` filtering with JSON or formatted line output. Implements `handleCompact()` for context recovery with resume instructions. Both routed via CLI dispatcher.
Command documentation and rescue default behavior `plugins/codex/agents/codex-rescue.md`, `plugins/codex/commands/compact.md`, `plugins/codex/commands/events.md`, `plugins/codex/commands/rescue.md`	Documents `/codex:events` with NDJSON field semantics and phase meanings. Documents `/codex:compact` with recovery sequence and output formats. Updates rescue and codex-rescue to default to `--background` execution with instructions to poll via events endpoint.
Test isolation, command tests, and events command coverage `tests/helpers.mjs`, `tests/commands.test.mjs`, `tests/events-command.test.mjs`	Removes host-injected `CLAUDE_PLUGIN_DATA` and `CODEX_COMPANION_SESSION_ID` from test environment. Updates command tests to expect new `compact.md` and `events.md` and background defaults. Adds 121-line events command test suite covering missing job-id, unknown jobs, incremental filtering, output formats, and result limits.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 A rabbit hops through event logs,
Watching jobs stream NDJSON fog,
When prompts grow long and Codex stalls,
We compact threads and poll through calls—
Observability for all! 🎉

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/event-stream-foundation

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

The 1.1.0 design doc's "可测验证 #2" calls for an integration test that dispatches a slow codex task and asserts the events stream surfaces `phase:"stuck"`. The behavior was previously verified only by manual E2E (stallMs=8540 ms in a real codex run); this commit adds CI-gated coverage against regressions. Approach: spawn task-worker against a fake codex that responds to thread/start + turn/start then deliberately stops sending notifications. With CODEX_COMPANION_STALL_SECONDS=1 the worker should emit a watchdog event within ~5-7 seconds (setInterval ticks every 5s). The test polls events.ndjson via readJobEvents (not the CLI, to bypass any rendering layer), and asserts the stuck record's stallMs meets the configured threshold. The job is explicitly cancelled at teardown and the broker is torn down via SessionEnd. Extends tests/fake-codex-fixture.mjs with a hang-after-turn-start behavior so future watchdog / timeout tests can reuse the same hang pattern.

bit-star added 7 commits May 19, 2026 15:59

suminerProxy merged commit c0a6926 into main May 19, 2026
1 check was pending

suminerProxy deleted the feat/event-stream-foundation branch May 19, 2026 10:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: per-job event stream + watchdog + compact recovery (1.1.0)#2

feat: per-job event stream + watchdog + compact recovery (1.1.0)#2
suminerProxy merged 7 commits into
mainfrom
feat/event-stream-foundation

suminerProxy commented May 19, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 19, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

suminerProxy commented May 19, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

What ships in 1.1.0

Includes #1 (test isolation) as a prerequisite

Verification

Open schema unknowns (do not block merge)

Build status

Summary by CodeRabbit

Release Notes v1.1.0

Uh oh!

coderabbitai Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

suminerProxy commented May 19, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 19, 2026 •

edited

Loading