@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-05-18
@@ -0,0 +1,30 @@
## Why

Real-fleet telemetry from the 2026-05-18 `marketing-content-waves` bringup against the recodee repo surfaced seven concrete dispatch-path defects that block workers from claiming Colony tasks even when the fleet is "physically up." Symptoms:

1. Stale dead panes from prior fleet runs linger in the overview chrome — workers terminated by `signal 15` show `Pane is dead` for hours and operators get no surfacing signal.
2. `cap-probe` cache TTL is stale across bringups: first run found 5/6 healthy accounts, fresh `--no-cap-cache` rerun ~5min later found 8/8 — the cache outlived the actual quota recovery.
3. The `wake-prompt` window stays blank on bringup completion — never auto-fires, so workers idle at default Codex placeholder prompts (`"Implement {feature}"`, `"Find and fix a bug in @filename"`).
4. `plan-watcher.sh` re-validates plan.json on each tick *without* passing `--allow-waves`, so any plan with `depends_on` fails hard, plan-watcher skips dispatch, and `force-claim` silently falls back to whatever plan is next in queue (we observed our priority plan being skipped while `trading-edge-foundations-pt2` got dispatched instead).
5. `force-claim` send-keys hits "not in a mode" on non-idle Codex panes and silently drops the dispatch — no retry, no backoff, no operator signal.
6. Even when send-keys reaches the input box, Codex's auto-submit doesn't fire — the prompt sits typed but never gets submitted. Context % drops (so the keys arrived) but no Colony claim is recorded.
7. **(observed live on FLEET_ID=3 bringup)** Bringup creates per-account CODEX_HOMEs under `/tmp/codex-fleet/<account>-<host>`. On Codex CLI first launch in a fresh home, three interactive prompts block the worker before it ever reaches the input box: `Do you trust the contents of this directory?` → `External agent config detected (Proceed with selected)` → optional `Press enter to continue`. **All 8 workers of FLEET_ID=3 stalled on these three-stage prompts**; force-claim, plan-watcher, and auto-wake all become no-ops because Codex itself hasn't reached its REPL yet. The operator currently has to click through every pane by hand.

These bugs compound: (4) blocks dispatch for plans with deps, (5) blocks dispatch for busy panes, (6) blocks dispatch *even when send-keys lands in the input box*, and **(7) prevents the input box from existing in the first place**. The net effect is that a freshly-bootstrapped fleet looks healthy in tmux but performs zero work.

## What Changes

- **F1 — surface dead panes**: `scripts/codex-fleet/show-fleet.sh` and the rust overview renderer add a `dead_panes` count; alert when any pane has `dead==1` for >60s.
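- **F2 — cap-probe cache TTL**: lower the cache file age threshold from the current (much higher) default to 60s, with a `CODEX_FLEET_CAP_CACHE_TTL` env override; invalidate the cache immediately whenever a bringup-failure marker (`/tmp/claude-viz/bringup-failure.marker`) exists.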
- **F3 — auto-wake on bringup**: new `CODEX_FLEET_AUTO_WAKE` env (default `1`) that fires `wake-prompt.sh` once at the end of `full-bringup.sh`, before the `DONE.` banner. Existing wake-prompt window continues handling subsequent ticks.
- **F4 — plan-watcher inherits --allow-waves**: pass `--allow-waves` to `lib/plan-validator.sh` from `plan-watcher.sh:run_plan_validator()`. Optional env `CODEX_FLEET_PLAN_VALIDATOR_FLAGS` for operator override.
- **F5 — worker-ready signal + retry**: `force-claim.sh` checks each worker pane's mode (via `tmux display-message -p -t <pane> '#{pane_in_mode}'` plus a Codex-specific input-state heuristic) before send-keys; if not ready, backoff and retry on next tick rather than emit "not in a mode".
- **F6 — Codex auto-submit**: investigate whether send-keys needs a different terminator (e.g., `Enter Enter`) or a different delivery path (e.g., loading the text with `set-buffer` and pasting it with `paste-buffer -p` as a bracketed paste instead of raw send-keys). Add a smoke test in `scripts/codex-fleet/test/` that scripts a 1-pane fleet through claim → execute → status on a no-op plan, asserting the worker actually starts.
- **F7 — Codex first-launch prompt auto-bypass**: ship `scripts/codex-fleet/codex-first-launch-supervisor.sh` (already seeded in this branch) that polls each worker pane and auto-answers the three first-launch prompts (`Do you trust …` → Enter; `External agent config detected` → key `1`; `Press enter to continue` → Enter). Wire it into `full-bringup.sh` as the second-to-last step (before F3's auto-wake, after the chrome verify), gated by `CODEX_FLEET_AUTO_BYPASS=1` default-on. Idempotent; safe to re-run.
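The F7 answer table reduces to a pattern match over the pane's visible text. The sketch below is hypothetical and is not the seeded `codex-first-launch-supervisor.sh`: `first_launch_key` and `bypass_pane` are illustrative names, and it assumes the three prompt strings appear verbatim in `tmux capture-pane -p` output.

```bash
# Hypothetical sketch of the F7 prompt -> key table.

# first_launch_key SCREEN_TEXT: print the tmux key that answers the
# first-launch prompt visible in SCREEN_TEXT, or nothing if none matches.
first_launch_key() {
  case $1 in
    *"Do you trust"*) echo Enter ;;
    *"External agent config detected"*) echo 1 ;;
    *"Press enter to continue"*) echo Enter ;;
    *) ;;  # already at the REPL (or an unknown screen): send nothing
  esac
}

# bypass_pane PANE: answer at most one prompt per call; returns 1 when
# nothing was sent, so a caller can poll until every pane returns 1.
bypass_pane() {
  local pane=$1 screen key
  screen=$(tmux capture-pane -p -t "$pane") || return 1
  key=$(first_launch_key "$screen")
  [ -n "$key" ] || return 1
  tmux send-keys -t "$pane" "$key"
}
```

A supervisor would call `bypass_pane` on every worker pane in a short polling loop until each one returns 1. A pane that is already at the Codex REPL gets no keys, which is what keeps re-runs harmless.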

## Impact

- **Risk**: medium. Changes touch the dispatch hot path; a regression could prevent dispatch globally. Each subtask is confined to its own disjoint file_scope (one or two files), so subtasks can roll back independently.
- **Surfaces affected**: `scripts/codex-fleet/show-fleet.sh` and the Rust overview renderer (F1), `scripts/codex-fleet/cap-probe.sh` and `scripts/codex-fleet/cap-probe-cache.sh` (F2), `scripts/codex-fleet/full-bringup.sh`, `scripts/codex-fleet/plan-watcher.sh`, `scripts/codex-fleet/force-claim.sh`, `scripts/codex-fleet/test/` (new smoke tests), **`scripts/codex-fleet/codex-first-launch-supervisor.sh`** (new), and `docs/fleet-telemetry-cases.md`. No Colony / recodee changes.
- **Rollout**: F1, F2, and F4 are low-risk observability, cache, and flag-inheritance fixes and ship default-on. F3 (auto-wake) and F7 (auto-bypass) also ship default-on, with opt-outs `CODEX_FLEET_AUTO_WAKE=0` and `CODEX_FLEET_AUTO_BYPASS=0`. F5 (ready signal) and F6 (auto-submit) are gated behind `CODEX_FLEET_DISPATCH_V2=1` for one cycle of operator testing before the default flips.
- **Telemetry**: each subtask must also append one example JSONL entry to `docs/fleet-telemetry-cases.md` so future regressions are catchable.
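The rollout rules mix default-on opt-outs (`CODEX_FLEET_AUTO_BYPASS`, `CODEX_FLEET_AUTO_WAKE`) with a default-off opt-in (`CODEX_FLEET_DISPATCH_V2`). A minimal sketch of how the tail of `full-bringup.sh` could honor them, assuming hypothetical helpers `fleet_flag_on` and `bringup_tail` and that both scripts are invoked directly from `scripts/codex-fleet/`; the actual sub-2 wiring may differ.

```bash
# fleet_flag_on NAME DEFAULT: succeed when env var NAME equals 1, using
# DEFAULT when NAME is unset or empty. (Hypothetical helper.)
fleet_flag_on() {
  local val
  eval "val=\${$1:-$2}"
  [ "$val" = 1 ]
}

# Assumed location of the fleet scripts; overridable for testing.
FLEET_DIR=${FLEET_DIR:-scripts/codex-fleet}

# bringup_tail: hypothetical last steps of full-bringup.sh, run after the
# chrome verify.
bringup_tail() {
  # F7 first: wake-prompt only helps once every pane has reached the Codex REPL.
  if fleet_flag_on CODEX_FLEET_AUTO_BYPASS 1; then
    "$FLEET_DIR/codex-first-launch-supervisor.sh" ||
      echo "warn: first-launch bypass failed" >&2
  fi
  # F3 second: one wake; the wake-prompt window keeps handling later ticks.
  if fleet_flag_on CODEX_FLEET_AUTO_WAKE 1; then
    "$FLEET_DIR/wake-prompt.sh" || echo "warn: auto-wake failed" >&2
  fi
  echo "DONE."
}
```

The same helper covers the opt-in gate: F5/F6 code paths would check `fleet_flag_on CODEX_FLEET_DISPATCH_V2 0`, which stays off until an operator sets it to 1. Failures in either step only warn here, so a broken bypass still lets the bringup print `DONE.`; whether that is the right trade-off is a decision for sub-2.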
@@ -0,0 +1,9 @@
## ADDED Requirements

### Requirement: cfui-dispatch-improvements-zzz-2026-05-18 behavior
The system SHALL enforce cfui-dispatch-improvements-zzz-2026-05-18 behavior as defined by this change.

#### Scenario: Baseline acceptance
- **WHEN** cfui-dispatch-improvements-zzz-2026-05-18 behavior is exercised
- **THEN** the expected outcome is produced
- **AND** regressions are covered by tests.
@@ -0,0 +1,43 @@
## Definition of Done

This change is complete only when **all** of the following are true:

- Every checkbox below is checked.
- The agent branch reaches `MERGED` state on `origin` and the PR URL + state are recorded in the completion handoff.
- If any step blocks (test failure, conflict, ambiguous result), append a `BLOCKED:` line under section 4 explaining the blocker and **STOP**. Do not tick remaining cleanup boxes; do not silently skip the cleanup pipeline.

## Handoff

- Handoff: change=`agent-claude-cfui-dispatch-improvements-zzz-2026-05-1-2026-05-18-14-03`; branch=`agent/<your-name>/<branch-slug>`; scope=`TODO`; action=`continue this sandbox or finish cleanup after a usage-limit/manual takeover`.
- Copy prompt: Continue `agent-claude-cfui-dispatch-improvements-zzz-2026-05-1-2026-05-18-14-03` on branch `agent/<your-name>/<branch-slug>`. Work inside the existing sandbox, review `openspec/changes/agent-claude-cfui-dispatch-improvements-zzz-2026-05-1-2026-05-18-14-03/tasks.md`, continue from the current state instead of creating a new sandbox, and when the work is done run `gx branch finish --branch agent/<your-name>/<branch-slug> --base dev --via-pr --wait-for-merge --cleanup`.

## 1. Specification

- [x] 1.1 Proposal scope and acceptance criteria captured in `proposal.md` (7 findings F1–F7 with reproduction evidence from the 2026-05-18 marketing-content-waves fleet run).
- [ ] 1.2 Define normative requirements in `specs/cfui-dispatch-improvements-zzz-2026-05-18/spec.md` (one per finding, with response-shape / state-machine contract).

## 2. Implementation

Owned by 7 fleet subtasks (sub-0 through sub-6) in `openspec/plans/fleet-dispatch-fixes-2026-05-18/plan.json`. Disjoint file_scope, parallel-ready.

- [ ] 2.1 **F1 — Dead pane surfacing**: `show-fleet.sh` + rust overview emit `dead_panes` count; alert at age >60s.
- [ ] 2.2 **F2 — Cap-probe cache TTL**: 60s default; invalidate on bringup-failure marker.
- [ ] 2.3 **F3 — Auto-wake on bringup**: `CODEX_FLEET_AUTO_WAKE=1` default; fires `wake-prompt.sh` once before `DONE.`
- [ ] 2.4 **F4 — plan-watcher inherits --allow-waves**: pass flag from `run_plan_validator()`; env override.
- [ ] 2.5 **F5 — Worker-ready signal + retry**: `force-claim.sh` reads pane input-mode before send-keys; backoff on not-ready.
- [ ] 2.6 **F6 — Codex auto-submit smoke test + fix**: script a 1-pane fleet through claim→execute→status; assert worker starts.
- [x] 2.7 **F7 — Codex first-launch prompt auto-bypass**: `scripts/codex-fleet/codex-first-launch-supervisor.sh` seeded in this branch. Wiring it into `full-bringup.sh` is fleet subtask sub-2 (together with F3), and the stuck-prompt smoke test is sub-6, both in `openspec/plans/fleet-dispatch-fixes-2026-05-18/plan.json`.
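Task 2.2's freshness rule fits in one predicate. A minimal sketch, assuming the cache is a single file whose mtime marks the last probe; the marker path and `CODEX_FLEET_CAP_CACHE_TTL` come from the sub-1 description, while `cap_cache_fresh` is a hypothetical name.

```bash
# cap_cache_fresh CACHE_FILE [MARKER] (hypothetical): succeed when the
# cap-probe cache is younger than the TTL and no bringup-failure marker
# exists; on failure the caller should re-probe.
cap_cache_fresh() {
  local cache=$1
  local marker=${2:-/tmp/claude-viz/bringup-failure.marker}
  local ttl=${CODEX_FLEET_CAP_CACHE_TTL:-60}
  local now mtime
  [ -f "$cache" ] || return 1
  [ -e "$marker" ] && return 1
  now=$(date +%s)
  # GNU stat first, BSD/macOS stat as a fallback.
  mtime=$(stat -c %Y "$cache" 2>/dev/null || stat -f %m "$cache") || return 1
  [ $((now - mtime)) -lt "$ttl" ]
}
```

`cap-probe.sh` would re-probe whenever `cap_cache_fresh "$cache_file"` fails. The sketch only reads the marker; who clears it after a successful bringup is left open.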

## 3. Verification

- [ ] 3.1 Each subtask ships a focused test under `scripts/codex-fleet/test/<finding>-test.sh` that reproduces the original symptom and asserts the fix.
- [ ] 3.2 Run `openspec validate agent-claude-cfui-dispatch-improvements-zzz-2026-05-1-2026-05-18-14-03 --type change --strict`.
- [ ] 3.3 Run `openspec validate --specs`.
- [ ] 3.4 Integration: run a fresh `full-bringup.sh --plan-slug fleet-dispatch-fixes-2026-05-18 --n 4 --auto-fleet-id --no-cap-cache` against this very change's plan workspace and assert at least 4 Colony task claims land within 90 seconds of `DONE.` (vs the current 0).
- [ ] 3.5 Capture `/tmp/codex-fleet-telemetry-dispatch-fixes.jsonl` and attach the last 30 lines to the integration PR.
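Item 3.4 and the sub-5/sub-6 smoke tests all reduce to polling a condition until it holds or a deadline passes. A hypothetical helper for that; this change does not say how Colony claims are counted, so the condition is passed in as a command.

```bash
# wait_until TIMEOUT_S INTERVAL_S CMD [ARGS...] (hypothetical helper)
# Re-run CMD every INTERVAL_S seconds until it succeeds (return 0) or
# TIMEOUT_S seconds have passed since the first attempt (return 1).
wait_until() {
  local timeout=$1 interval=$2 deadline
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  while :; do
    "$@" && return 0
    [ "$(date +%s)" -ge "$deadline" ] && return 1
    sleep "$interval"
  done
}
```

For 3.4 this would look like `wait_until 90 5 claims_at_least 4`, where `claims_at_least` is a hypothetical wrapper around whatever query reports Colony claims.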

## 4. Cleanup (mandatory; run before claiming completion)

- [ ] 4.1 Run the cleanup pipeline: `gx branch finish --branch agent/<your-name>/<branch-slug> --base dev --via-pr --wait-for-merge --cleanup`. This handles commit -> push -> PR create -> merge wait -> worktree prune in one invocation.
- [ ] 4.2 Record the PR URL and final merge state (`MERGED`) in the completion handoff.
- [ ] 4.3 Confirm the sandbox worktree is gone (`git worktree list` no longer shows the agent path; `git branch -a` shows no surviving local/remote refs for the branch).
13 changes: 13 additions & 0 deletions openspec/plans/fleet-dispatch-fixes-2026-05-18/architect.md
@@ -0,0 +1,13 @@
# Architect

Plan: `fleet-dispatch-fixes-2026-05-18`

## Responsibility

Check that each subtask touches only its own file_scope. Verify F5 and F6 land behind CODEX_FLEET_DISPATCH_V2=1 in code (gated rollout).
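One way F5's readiness check and an F6 submit candidate could sit behind `CODEX_FLEET_DISPATCH_V2=1`. This is a sketch under assumptions: the `›` prompt heuristic comes from sub-4, bracketed paste is only one of the sub-5 candidates (the smoke test decides), the legacy branch is a stand-in for whatever `force-claim.sh` does today, and `pane_ready`, `dispatch_prompt`, and the `fleet-dispatch` buffer name are hypothetical.

```bash
# pane_ready PANE: succeed only when the pane is not in a tmux mode (copy
# mode etc.) and its last non-empty line contains the Codex "›" marker.
pane_ready() {
  local pane=$1 in_mode last
  in_mode=$(tmux display-message -p -t "$pane" '#{pane_in_mode}') || return 1
  [ "$in_mode" = 0 ] || return 1
  last=$(tmux capture-pane -p -t "$pane" | awk 'NF { line = $0 } END { print line }')
  case $last in
    *"›"*) return 0 ;;
    *) return 1 ;;
  esac
}

# dispatch_prompt PANE TEXT: returns 2 (without sending anything) when the
# V2 path finds the pane not ready, so the caller can defer to the next tick.
dispatch_prompt() {
  local pane=$1 text=$2
  if [ "${CODEX_FLEET_DISPATCH_V2:-0}" != 1 ]; then
    # Stand-in for the current (pre-V2) force-claim path.
    tmux send-keys -t "$pane" "$text" Enter
    return
  fi
  if ! pane_ready "$pane"; then
    echo "pane $pane not-ready; deferring" >&2
    return 2
  fi
  # F6 candidate: bracketed paste of the text, then a separate Enter to submit.
  tmux set-buffer -b fleet-dispatch "$text"
  tmux paste-buffer -p -d -b fleet-dispatch -t "$pane"
  tmux send-keys -t "$pane" Enter
}
```

Returning 2 rather than failing lets the caller leave the subtask unclaimed and apply its own backoff before the next tick, in line with the sub-4 requirement not to consume the Colony claim.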

## Checkpoints

- [ ] Read `plan.md`, `tasks.md`, and `checkpoints.md`.
- [ ] Record decisions or blockers in the plan workspace before handoff.
- [ ] Keep task-thread status aligned with local files.
18 changes: 18 additions & 0 deletions openspec/plans/fleet-dispatch-fixes-2026-05-18/checkpoints.md
@@ -0,0 +1,18 @@
# Checkpoints

## Rollup

- available: 7
- claimed: 0
- completed: 0
- blocked: 0

## Subtasks

- [ ] sub-0 F1 — Surface dead panes in show-fleet.sh + rust overview [available]
- [ ] sub-1 F2 — Cap-probe cache TTL hardening [available]
- [ ] sub-2 F3+F7 wire-in — auto-wake + auto-bypass at tail of full-bringup [available]
- [ ] sub-3 F4 — plan-watcher inherits --allow-waves [available]
- [ ] sub-4 F5 — Worker-ready signal + retry in force-claim [available]
- [ ] sub-5 F6 — Codex auto-submit smoke test + fix [available]
- [ ] sub-6 F7-test — Smoke test that no panes stay stuck on first-launch prompts [available]
13 changes: 13 additions & 0 deletions openspec/plans/fleet-dispatch-fixes-2026-05-18/critic.md
@@ -0,0 +1,13 @@
# Critic

Plan: `fleet-dispatch-fixes-2026-05-18`

## Responsibility

Adversarial review: does F4 break plans WITHOUT depends_on? Does F2 cause re-probing that flaps account health? Does F3 double-wake when wake-prompt window also fires?
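On the double-wake question, one option is for both the F3 auto-wake and the wake-prompt window to go through a shared guard that uses `mkdir` as an atomic test-and-set on a per-fleet stamp directory. The stamp path, the window length, and `wake_once` are assumptions for illustration, and the stale-stamp takeover is not race-free: it suppresses an immediate double fire, nothing stronger.

```bash
# wake_once STAMP_DIR WINDOW_S CMD [ARGS...]
# Hypothetical guard: run CMD unless another caller claimed STAMP_DIR less
# than WINDOW_S seconds ago. mkdir is the atomic test-and-set.
wake_once() {
  local stamp=$1 window=$2 now stamped
  shift 2
  now=$(date +%s)
  if ! mkdir "$stamp" 2>/dev/null; then
    # Stamp already exists: skip while it is younger than the window.
    stamped=$(cat "$stamp/at" 2>/dev/null || echo 0)
    [ $((now - stamped)) -lt "$window" ] && return 0
    # Otherwise the stamp is stale and this caller takes it over.
  fi
  echo "$now" > "$stamp/at"
  "$@"
}
```

For example, `wake_once "/tmp/codex-fleet/wake-stamp-$FLEET_ID" 60 scripts/codex-fleet/wake-prompt.sh`, with a hypothetical stamp path, would let whichever caller arrives first fire and make the other a no-op for 60 seconds.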

## Checkpoints

- [ ] Read `plan.md`, `tasks.md`, and `checkpoints.md`.
- [ ] Record decisions or blockers in the plan workspace before handoff.
- [ ] Keep task-thread status aligned with local files.
13 changes: 13 additions & 0 deletions openspec/plans/fleet-dispatch-fixes-2026-05-18/executor.md
@@ -0,0 +1,13 @@
# Executor

Plan: `fleet-dispatch-fixes-2026-05-18`

## Responsibility

Implement claimed subtasks inside declared file_scope. Each fix ships with at least one assertion in scripts/codex-fleet/test/ that would have caught the original bug.
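For F4, an assertion that would have caught the original bug can be as small as checking the argv the validator receives. The sketch assumes `run_plan_validator` is handed the validator and plan paths as arguments (its real signature in `plan-watcher.sh` is not shown in this change) and that `CODEX_FLEET_PLAN_VALIDATOR_FLAGS`, when set, replaces the default flags instead of appending to them.

```bash
# Hypothetical sketch of plan-watcher.sh:run_plan_validator() after F4.
# Default flags; CODEX_FLEET_PLAN_VALIDATOR_FLAGS replaces them when set
# (an empty value means "no extra flags").
default_validator_flags="--allow-waves"

run_plan_validator() {
  local validator=$1 plan_json=$2 flags
  flags=${CODEX_FLEET_PLAN_VALIDATOR_FLAGS-$default_validator_flags}
  # $flags is left unquoted on purpose so several flags split into words.
  # shellcheck disable=SC2086
  "$validator" $flags "$plan_json" 2>/dev/null
}
```

Inside the watcher this would replace the current call with `summary=$(run_plan_validator "$validator" "$plan_json")`. Appending instead of replacing is the other reasonable reading of "override"; replacing lets an operator drop `--allow-waves` by exporting an empty value.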

## Checkpoints

- [ ] Read `plan.md`, `tasks.md`, and `checkpoints.md`.
- [ ] Record decisions or blockers in the plan workspace before handoff.
- [ ] Keep task-thread status aligned with local files.
114 changes: 114 additions & 0 deletions openspec/plans/fleet-dispatch-fixes-2026-05-18/plan.json
@@ -0,0 +1,114 @@
{
"schema_version": 1,
"plan_slug": "fleet-dispatch-fixes-2026-05-18",
"title": "Fix codex-fleetui dispatch path: dead panes, cap cache, auto-wake, plan-watcher waves, ready signal, Codex auto-submit, first-launch prompt bypass",
"problem": "Real-fleet telemetry from the 2026-05-18 marketing-content-waves bringup (recodee repo) surfaced seven concrete dispatch-path defects that block workers from claiming Colony tasks even when the fleet is physically up. F1: stale dead panes linger silently. F2: cap-probe cache outlived quota recovery. F3: wake-prompt window stays blank \u2014 workers idle at default Codex placeholders. F4: plan-watcher re-validates without --allow-waves and silently falls back to other plans when our priority plan has depends_on. F5: force-claim send-keys hits 'not in a mode' on non-idle panes and drops dispatch with no retry. F6: even when send-keys lands, Codex's auto-submit doesn't fire \u2014 context drops but Colony never sees a claim. F7: on first launch in a fresh per-account CODEX_HOME, Codex blocks every worker on the trust / external-agent-config / press-enter prompts before its input box exists. Net: a healthy-looking tmux fleet performs zero work.",
"acceptance_criteria": [
"show-fleet.sh and rust overview surface dead_panes count; alert fires when any pane has dead==1 for >60s",
"cap-probe cache invalidates after 60s default; invalidates immediately on bringup-failure marker",
"CODEX_FLEET_AUTO_WAKE=1 default fires wake-prompt.sh once at end of full-bringup.sh, before DONE banner; can be disabled with =0",
"plan-watcher.sh inherits --allow-waves when invoking lib/plan-validator.sh; operator override via CODEX_FLEET_PLAN_VALIDATOR_FLAGS env",
"force-claim.sh skips panes that fail input-ready check; retries on next tick with backoff instead of emitting 'not in a mode' silently",
"Smoke test scripts/codex-fleet/test/codex-auto-submit-test.sh asserts a 1-pane fleet on a no-op plan reaches at least one Colony claim within 90s; current behavior would fail (zero claims)",
"Integration test: full-bringup.sh --plan-slug fleet-dispatch-fixes-2026-05-18 --n 4 --auto-fleet-id --no-cap-cache against THIS plan results in >=4 Colony claims within 90s of DONE banner",
"openspec validate agent-claude-cfui-dispatch-improvements-zzz-2026-05-1-2026-05-18-14-03 --type change --strict passes; openspec validate --specs passes",
"Each subtask appends one example JSONL entry to docs/fleet-telemetry-cases.md documenting the original failure mode + the new test assertion",
"scripts/codex-fleet/codex-first-launch-supervisor.sh fires once at the tail of full-bringup.sh (after iOS chrome verify, before DONE banner), gated by CODEX_FLEET_AUTO_BYPASS env (default 1); a fresh fleet bringup shows zero panes stuck on 'Do you trust' within 30s of DONE"
],
"roles": [
"planner",
"architect",
"critic",
"executor",
"writer",
"verifier"
],
"tasks": [
{
"subtask_index": 0,
"title": "F1 \u2014 Surface dead panes in show-fleet.sh + rust overview",
"description": "Read tmux #{pane_dead} via list-panes -F. Emit dead_panes count in scripts/codex-fleet/show-fleet.sh JSON output. Add alert when any pane has dead==1 for >60s (read pane_dead_status_changed timestamp from /tmp/claude-viz/fleet-state.json if present, else first-detection time). Add a one-line example to docs/fleet-telemetry-cases.md.",
"file_scope": [
"scripts/codex-fleet/show-fleet.sh",
"docs/fleet-telemetry-cases.md"
],
"depends_on": [],
"spec_row_id": null,
"capability_hint": "doc_work",
"status": "available"
},
{
"subtask_index": 1,
"title": "F2 \u2014 Cap-probe cache TTL hardening",
"description": "Lower cap-probe cache TTL default to 60 seconds (current is much higher \u2014 first run found 5/6, fresh probe 5min later found 8/8). Invalidate cache when /tmp/claude-viz/bringup-failure.marker exists. Add a CODEX_FLEET_CAP_CACHE_TTL env override. Touch ONLY the file_scope files (cap-probe.sh, cap-probe-cache.sh); document the new env in scripts/codex-fleet/README.md or equivalent.",
"file_scope": [
"scripts/codex-fleet/cap-probe.sh",
"scripts/codex-fleet/cap-probe-cache.sh"
],
"depends_on": [],
"spec_row_id": null,
"capability_hint": "test_work",
"status": "available"
},
{
"subtask_index": 2,
"title": "F3+F7 wire-in \u2014 auto-wake + auto-bypass at tail of full-bringup",
"description": "Wire two end-of-bringup steps into scripts/codex-fleet/full-bringup.sh, both AFTER 'iOS chrome verified' and BEFORE 'DONE.' banner: (F7) call scripts/codex-fleet/codex-first-launch-supervisor.sh (already seeded in this branch) to drain Codex first-launch prompts, gated by CODEX_FLEET_AUTO_BYPASS=1 default; (F3) call scripts/codex-fleet/wake-prompt.sh once to wake workers, gated by CODEX_FLEET_AUTO_WAKE=1 default. Auto-bypass must run BEFORE auto-wake (workers need to be at Codex idle prompt before wake-prompt fires). Both gates default-on; operator opts out via env=0. Touch ONLY full-bringup.sh.",
"file_scope": [
"scripts/codex-fleet/full-bringup.sh"
],
"depends_on": [],
"spec_row_id": null,
"capability_hint": "api_work",
"status": "available"
},
{
"subtask_index": 3,
"title": "F4 \u2014 plan-watcher inherits --allow-waves",
"description": "In scripts/codex-fleet/plan-watcher.sh:run_plan_validator(), pass --allow-waves to the validator invocation (around line 187-189 where we see summary=`\"$validator\" \"$plan_json\" 2>/dev/null`). Add CODEX_FLEET_PLAN_VALIDATOR_FLAGS env override for operators who need to inject other flags. Touch ONLY plan-watcher.sh.",
"file_scope": [
"scripts/codex-fleet/plan-watcher.sh"
],
"depends_on": [],
"spec_row_id": null,
"capability_hint": "frontend_work",
"status": "available"
},
{
"subtask_index": 4,
"title": "F5 \u2014 Worker-ready signal + retry in force-claim",
"description": "Before send-keys, force-claim.sh checks pane input-mode via `tmux display-message -p -t <pane> '#{pane_in_mode}'` AND a Codex-input-state heuristic (capture last line, look for `\u203a` prompt marker). If not ready, log 'pane <id> not-ready; deferring' and skip \u2014 DO NOT emit 'not in a mode' nor consume the Colony claim. The deferred subtask returns to ready state on the next tick. Touch ONLY force-claim.sh.",
"file_scope": [
"scripts/codex-fleet/force-claim.sh"
],
"depends_on": [],
"spec_row_id": null,
"capability_hint": "frontend_work",
"status": "available"
},
{
"subtask_index": 5,
"title": "F6 \u2014 Codex auto-submit smoke test + fix",
"description": "Write scripts/codex-fleet/test/codex-auto-submit-test.sh: spin up a 1-pane fleet against a no-op plan, send-keys a wake prompt, assert Colony shows >=1 claim within 90s. The current dispatch path will fail this test (zero claims). Then fix: experiment with `tmux send-keys ... Enter Enter`, `set-buffer` + `paste-buffer -p` (bracketed paste), or `Tab Enter` until the smoke test passes. Update force-claim.sh OR worker-prompt.md (whichever owns the submit step) with the working pattern. Document in docs/fleet-telemetry-cases.md.",
"file_scope": [
"scripts/codex-fleet/test/codex-auto-submit-test.sh"
],
"depends_on": [],
"spec_row_id": null,
"capability_hint": "test_work",
"status": "available"
},
{
"subtask_index": 6,
"title": "F7-test \u2014 Smoke test that no panes stay stuck on first-launch prompts",
"description": "Write scripts/codex-fleet/test/first-launch-bypass-test.sh that boots a 1-pane fleet against a no-op plan and asserts: within 30s of DONE banner, the worker pane shows zero matches for 'Do you trust', 'External agent config detected', or 'Press enter to continue'. Skips if CODEX_FLEET_AUTO_BYPASS=0 (operator opt-out). This test will fail today (proving the bug) and pass after F3+F7 wire-in lands.",
"file_scope": [
"scripts/codex-fleet/test/first-launch-bypass-test.sh"
],
"depends_on": [],
"spec_row_id": null,
"capability_hint": "test_work",
"status": "available"
}
]
}