
feat(coda): three-mode framework — narrow coda_run to replay-only + add coda_interactive#67

Open: datasciencemonkey wants to merge 93 commits into main from feat/coda-mcp-interactive-handoff

datasciencemonkey commented May 28, 2026

Summary

Two changes that together establish the three-mode framework for the CoDA MCP server:

  1. Narrow coda_run to replay-only URLs — its returned viewer_url now always serves a static transcript from disk, never a live PTY attach. Drops the 5-minute grace machinery introduced in feat: CoDA MCP live session URL — watch hermes execute live + replay #66 (which was actually never wired in production — see commit 193c9a3 for the dead-code analysis).
  2. Add new coda_interactive MCP tool — for human handoff from upstream MCP clients (Genie Code, Claude Desktop, Cursor). Caller passes a Databricks Workspace Git Folder path, optional branch, and a kickoff prompt. Coda exports the file tree, launches the chosen agent (claude default; also hermes/codex/gemini/opencode), auto-types the prompt, and returns a viewer_url for the human to attach to.

The three-mode framework now:

| Mode | Tool | URL semantics | PTY lifecycle |
|---|---|---|---|
| 1. Direct launch | Web UI tab | n/a (no external URL) | 24h idle / WS-heartbeat extends |
| 2. coda_interactive (new) | MCP tool | Live attach | 24h idle / WS extends |
| 3. coda_run (narrowed) | MCP tool | Replay only | Immediate teardown on hermes exit |

Why Workspace Git Folders (not GitHub clone)

coda_interactive materializes project files via the Databricks Workspace API — uses Coda's existing DATABRICKS_TOKEN, no new credentials needed. Trade-off: git history is unavailable inside the session (files-only export). If history matters for a session, the MCP caller can include a git log summary in the prompt string.

Pre-existing security fix bundled

mcp_create_pty_session was stripping only 5 env vars; the HTTP create_session path was stripping NPM_TOKEN, UV_*, and npm_config_//* patterns too. Refactored to share _build_terminal_shell_env (commit ef15ef7). Closes a latent registry-credential leak into MCP-created PTYs.
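
As a rough illustration of the shared-helper idea, here is a minimal sketch of an environment filter that both session-creation paths could call. It assumes nothing about the real `_build_terminal_shell_env` beyond the patterns named above: the function name `build_shell_env_sketch`, the exact-name set, and the prefix tuple are all illustrative, and the five variables the MCP path already stripped are not listed in this description, so they are left out.

```python
import os
from typing import Mapping, Optional

# Assumed for illustration: names and prefixes taken from the description above.
STRIP_EXACT = frozenset({"NPM_TOKEN"})
STRIP_PREFIXES = ("UV_", "npm_config_//")


def build_shell_env_sketch(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment without registry-credential variables."""
    source = os.environ if base is None else base
    return {
        key: value
        for key, value in source.items()
        # str.startswith accepts a tuple, so one call covers every prefix.
        if key not in STRIP_EXACT and not key.startswith(STRIP_PREFIXES)
    }
```

Routing both the HTTP path and the MCP path through one function like this is what keeps the two strip lists from drifting apart again.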

Subsumes PR #66

PR #66 introduced live-attach + 5-min grace on coda_run. This PR keeps the viewer URL feature but narrows it to replay-only (since live attach moves to coda_interactive). Close #66 if this lands.

Spec + Plan artifacts (in the diff)

  • Todo 1 (Mode 3 narrowing): docs/superpowers/specs/2026-05-28-coda-run-replay-only-design.md + plans/2026-05-28-coda-run-replay-only.md
  • Todo 2 (Mode 2 addition): docs/superpowers/specs/2026-05-28-coda-interactive-mcp-tool-design.md + plans/2026-05-28-coda-interactive-mcp-tool.md
  • Both passed independent critic-agent reviews at spec, plan, and per-task stages.

Test plan

  • Local: 551 passed, 21 skipped — confirmed (pytest tests/ --ignore=tests/e2e)
  • PTY-gated tests skip cleanly on Mac dev environment; should pass on Linux CI/deployed app
  • Manual smoke against deployed CoDA: invoke coda_run from Genie Code; confirm viewer URL is replay-only
  • Manual smoke against deployed CoDA: invoke coda_interactive with a Workspace Git Folder; confirm agent launches in exported project dir with prompt typed
  • Confirm coda_inbox does NOT show interactive sessions; coda_get_result returns nothing for them
  • Verify _build_terminal_shell_env strip on deployed PTY — env | grep -E 'NPM_TOKEN|UV_' should be empty

Follow-up: broadened source contract (commits 326e19a..a555602)

coda_interactive no longer requires the workspace_path to be a Databricks Workspace Git Folder. Any Workspace directory (Git Folder or plain Workspace folder) is accepted. The branch parameter has been removed — callers manage Git Folder branch state themselves before calling.

API change (no shipped consumers — safe):

  • coda_interactive(prompt, workspace_path, branch=..., agent=..., email=...) → coda_interactive(prompt, workspace_path, agent=..., email=...)
  • Return shape: "branch" key dropped.

Validation: replaced repos.list + exact-match filter + optional repos.update with a single workspace.get_status call + directory-type check (_is_directory from workspace_export.py). Clean errors for "path not found" and "path is not a directory" — both return before any PTY allocation.
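
The validate-before-allocate flow can be sketched as below. This is a hedged illustration, not the code in the diff: `get_status` is a hypothetical stand-in for the Workspace API lookup (it returns an object-type string, or None for a missing path), and treating both "DIRECTORY" and "REPO" as directory-like is an assumption about what `_is_directory` accepts.

```python
from typing import Callable, Optional

# Assumption: object types that count as "a directory" for this check.
DIRECTORY_TYPES = frozenset({"DIRECTORY", "REPO"})


def validate_workspace_dir(
    path: str, get_status: Callable[[str], Optional[str]]
) -> Optional[dict]:
    """Return an error dict, or None when the path is an acceptable directory.

    The caller is expected to run this before allocating a PTY, so both
    failure cases return without creating any resources.
    """
    object_type = get_status(path)
    if object_type is None:
        return {"status": "error", "error": f"Workspace path not found ({path})"}
    if object_type not in DIRECTORY_TYPES:
        return {"status": "error", "error": f"Workspace path is not a directory ({path})"}
    return None
```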

Server-level instructions string rewritten to:

  • Tell callers that plain Workspace folders work.
  • Surface the upload-then-handoff pattern explicitly (workspace.import first if files aren't in the Workspace yet) so an upstream LLM knows the tool doesn't accept inline file payloads.

Quality fix bundled: hoisted the _app_send_input is None guard above PTY creation so an unwired send-hook can no longer orphan a PTY + project dir.

Test delta: −3 / +4. All 13 tests in tests/test_coda_interactive.py pass.

Artifacts:

  • Spec: docs/superpowers/specs/2026-05-28-coda-interactive-broaden-source-design.md
  • Plan: docs/superpowers/plans/2026-05-28-coda-interactive-broaden-source.md
  • Original spec marked as Amended by: for traceability.

Follow-up #2: Workflow protocol + Databricks orientation (commits 6ff6a9b..77321dc)

coda_run now injects two new sections into prompt.txt:

  • CAPABILITIES — tells hermes about the Databricks CLI (pre-authed), the 16 Databricks skills under ~/.claude/skills/, and the DeepWiki / Exa / CoDA MCP servers.
  • WORKFLOW PROTOCOL — imposes a 3-phase pipeline (PLAN → EXECUTE → SYNTHESIZE) with a critique step after each phase (self-review or sub-agent — agent's choice). Max 2 iterations per phase to keep token cost bounded.

New terminal result.json status "info_needed" with a required feedback field gives the calling client a structured iteration loop when the agent is blocked. The existing "needs_approval" status is preserved with explicit disambiguation in the protocol: info_needed = "caller must add context"; needs_approval = "caller must approve a destructive action".
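
A calling client might check a result.json payload along these lines. This is a minimal sketch under stated assumptions: the helper name `validate_result` is hypothetical, the four status values are the ones this PR enumerates for prompt.txt, and `feedback` is assumed to be a non-empty string.

```python
from typing import Any, List, Mapping

VALID_STATUSES = frozenset({"completed", "failed", "info_needed", "needs_approval"})


def validate_result(result: Mapping[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the result is acceptable."""
    problems = []
    status = result.get("status")
    if status not in VALID_STATUSES:
        problems.append(f"unknown status: {status!r}")
    if status == "info_needed":
        # info_needed means "caller must add context", so feedback is required.
        feedback = result.get("feedback")
        if not isinstance(feedback, str) or not feedback.strip():
            problems.append("info_needed requires a non-empty 'feedback' string")
    return problems
```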

Three upstream-facing surfaces updated so calling LLMs know about the new statuses:

  • coda_inbox counts dict gains info_needed and needs_approval keys.
  • coda_get_result docstring lists all four valid statuses + the new feedback field.
  • FastMCP server-level instructions gain an INFO_NEEDED HANDOFF paragraph teaching upstream LLMs to read feedback and resubmit with previous_session_id.

Flag: coda_run(..., workflow_protocol=True) is the default. Set False to skip both new sections for non-Databricks tasks.
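
The effect of the flag on prompt.txt can be sketched as follows. The section order (CAPABILITIES and WORKFLOW PROTOCOL between TASK and INSTRUCTIONS) follows the wrap_prompt commit message in this PR; the `== TITLE ==` header format, the placeholder bodies, and the name `wrap_prompt_sketch` are assumptions made for illustration.

```python
# Placeholder bodies; the real section text lives in the builders added by this PR.
CAPABILITIES_TEXT = "(capabilities text goes here)"
WORKFLOW_PROTOCOL_TEXT = "(workflow protocol text goes here)"


def wrap_prompt_sketch(task: str, instructions: str, workflow_protocol: bool = True) -> str:
    """Assemble prompt sections, inserting the two new ones only when enabled."""
    sections = [("TASK", task)]
    if workflow_protocol:
        sections.append(("CAPABILITIES", CAPABILITIES_TEXT))
        sections.append(("WORKFLOW PROTOCOL", WORKFLOW_PROTOCOL_TEXT))
    sections.append(("INSTRUCTIONS", instructions))
    return "\n\n".join(f"== {title} ==\n{body}" for title, body in sections)
```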

Artifacts:

  • Spec: docs/superpowers/specs/2026-05-28-coda-run-workflow-protocol-design.md
  • Plan: docs/superpowers/plans/2026-05-28-coda-run-workflow-protocol.md

Discipline gates run:

  • Spec critic → APPROVE-WITH-FIXES → 5 fixes applied (counts dict, needs_approval disambig, MCP instructions, canonical skill test, token estimate)
  • Plan critic → APPROVE-WITH-FIXES → 2 fixes applied (correct _write_json stub, expanded regression coverage)
  • Per task (1-4): spec + code-quality reviews, all approvals; minor fixes applied (section_noise dead code, JSON union-syntax placeholder)

Follow-up #3: coda_interactive pulls files in the terminal (fixes empty-session bug)

Bug: Calling coda_interactive launched the agent over an empty directory — the agent had no idea about the user's Workspace files.

Root cause: The MCP server's WorkspaceClient() resolves to the app's service principal (app-167dcd …), which can get_status the user's /Users/<user>/… folder (so the tool reported "launched") but cannot list/export its contents. workspace_export.py swallowed those errors → empty dir + misleading success. Verified: the CoDA terminal's CLI runs as the user (databricks current-user me → the user), and databricks workspace list/export of the folder works as the user via REST.

Fix: Stop exporting server-side. coda_interactive now types cd <project_dir> && databricks workspace export-dir <source> ./<name> && cd <name> into the PTY (authenticated as the user), waits for the pull to settle, then does a server-side filesystem post-check (identity-independent — it stats the local disk the terminal wrote). If files landed it launches the agent and seeds the prompt with a context line naming the source; if nothing landed it returns a real status=error and never launches.

Design highlights:

  • Split waits — wait for the pull to go idle (reliable stabilization), THEN the existing agent-ready wait after launch. Avoids pasting the prompt into a half-initialized agent.
  • databricks workspace export-dir natively handles notebook extensions, replacing the hand-rolled workspace_export.py (deleted, with its tests).
  • _wait_for_agent_ready is now a thin wrapper over a generalized _wait_for_output_stable(pty, max_wait, stability); coda_run is unchanged.
  • New helpers _safe_dirname (sanitized basename, rejects ./..) and _normalize_workspace_path (drops the /Workspace FUSE prefix).
  • MCP instructions wording updated; no more "server-side snapshot".
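
The two path helpers named above might look roughly like this. It is a sketch, not the shipped code: the allow-list regex is an assumption, and the function names drop the leading underscore to mark them as illustrative. The fallback to 'workspace' for '', '.', and '..' follows the path-traversal hardening commit in this PR.

```python
import posixpath
import re

_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")  # assumed allow-list


def normalize_workspace_path(path: str) -> str:
    """Drop a leading /Workspace prefix so the CLI sees a Workspace API path."""
    path = path.rstrip("/")
    if path == "/Workspace":
        return "/"
    if path.startswith("/Workspace/"):
        return path[len("/Workspace"):]
    return path or "/"


def safe_dirname(path: str) -> str:
    """Return a sanitized basename that is safe to use as ./<name>."""
    name = posixpath.basename(path.rstrip("/"))
    name = _UNSAFE_CHARS.sub("_", name)
    # '.' and '..' pass the allow-list but would alias the project dir or its parent.
    if name in {"", ".", ".."}:
        return "workspace"
    return name
```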

Tests: test_coda_interactive.py rewritten to the pull contract (pull-first, FS-check failure → no launch, prompt context line, agent matrix); _safe_dirname/_normalize_workspace_path + the wait-wrapper covered in test_mcp_server.py. Suite: 117 passed (only the documented PTY-fd flake fails in multi-file runs; passes in isolation).

Artifacts:

  • Spec: docs/superpowers/specs/2026-05-28-coda-interactive-terminal-pull-design.md
  • Plan: docs/superpowers/plans/2026-05-28-coda-interactive-terminal-pull.md

Discipline gates: brainstorm → design critic (SOUND-WITH-FIXES, folded in) → spec → plan → plan critic (SOUND-WITH-FIXES, folded in) → TDD implementation → final critic (SHIP-WITH-FOLLOWUPS; the one Important finding — .. path-traversal in _safe_dirname — fixed before push).

This branch also merges the latest main (Dependabot deps bump #68: urllib3 2.7.0 / gitpython 3.1.50 / idna 3.16).

Mounts an MCP server at `/mcp` so Databricks Genie Code (and other MCP
clients like Claude Desktop, Cursor) can delegate coding tasks to the
existing Hermes Agent infrastructure. Exposes three high-level tools
following the v2 background-execution pattern:

  - coda_run        — submit a coding task, returns task_id immediately
  - coda_inbox      — poll all task statuses (24h window)
  - coda_get_result — fetch structured output of a completed task

Plus internal helpers (`coda_create_session`, `coda_get_status`,
`coda_close_session`). Sessions and task state are persisted to disk
under `~/.coda/sessions/` so tasks survive worker restarts.

Architecture
------------
- Native MCP SDK transport (`FastMCP.streamable_http_app()`) — required
  by Genie Code's Custom MCP server picker (custom JSON-RPC handlers
  don't work).
- `stateless_http=True`, `json_response=True`. DNS-rebinding protection
  disabled (proxy handles auth, workspace origin allowed via CORS
  middleware).
- Switches the production entrypoint from gunicorn → uvicorn so we can
  serve both the MCP ASGI app and the existing Flask UI side-by-side
  (Flask mounted via WSGIMiddleware). WebSocket falls back to HTTP
  polling under uvicorn — acceptable per the design doc; the Web Worker
  poller is already in place.
- Skips CSP/security headers on the `/mcp` path (CSP interfered with
  Genie Code's transport).
- Hermes is always the agent invoked; it routes to sub-agents
  internally.
- Adds a stdio MCP bridge (`tools/coda-bridge.py`) for Claude Code's
  OAuth-based auth flow.

Repository reshuffles
---------------------
- New `coda_mcp/` package: `mcp_server`, `mcp_endpoint`, `mcp_asgi`,
  `task_manager`.
- `setup_*.py` moved from repo root to `setup/`.
- `install_*.sh` moved from repo root to `scripts/`.
- Tests: new coverage for the MCP server, integration flow, task
  manager, content filter proxy, sync_to_workspace, _run_step.
- Docs: `docs/mcp-client-setup.md`, `docs/mcp-v2-background-execution.md`,
  and the full implementation plan at
  `docs/plans/2026-05-01-coda-mcp-server.md`.

Safety guardrails
-----------------
The CODA-TASK prompt envelope explicitly forbids destructive operations
(DROP/DELETE/TRUNCATE, CLI deletes, permission changes) at the prompt
level, in line with the CoDA Constitution.

Tested as `mcp-test-coda` on workspace `fevm-serverless-9cefok`
(profile `9cefok`). App name must start with `mcp-` to appear in the
Genie Code Custom MCP server picker.

Provenance
----------
Squashed from 40 commits originally on
`datasciencemonkey/coding-agents-databricks-apps#156`, last working
tip `1ce86bf`. Full commit-by-commit history preserved locally on the
tag `coda-mcp-backup-2026-05-25`.

Conflict resolutions during the squash:
  - README.md MLflow section: kept main's Claude+Codex unified switch
    (newer than coda-mcp's Claude-only state).
  - setup/setup_claude.py: combined main's enterprise installer URL
    handling with coda-mcp's `SKIP_CLAUDE_INSTALL` test escape hatch.

Surfaces a doc audit pass against the squash:

- README: replace the gunicorn+Flask architecture diagram with the
  actual uvicorn ASGI stack (socketio.ASGIApp → /mcp + WSGI(Flask)).
  Update the startup-flow narrative, the "Server" config section
  (was "Gunicorn"), the project-structure annotations for app.yaml
  and gunicorn.conf.py (legacy, retained for WSGI-only dev), and the
  Technologies list.
- app.yaml: prepend a comment block explaining why the entrypoint is
  uvicorn (FastMCP.streamable_http_app is native ASGI; gunicorn WSGI
  cannot serve it). Notes the polling-fallback behaviour and the
  retained-but-unused gunicorn.conf.py.
- docs/plans/2026-05-01-coda-mcp-server.md: prepend a SUPERSEDED
  banner. The shipped implementation is the v2 design in
  docs/mcp-v2-background-execution.md (3 tools on uvicorn+ASGI), not
  the 5-tool gunicorn+WSGI plan in this file. Kept for design-
  evolution archaeology.
- coda_mcp/mcp_endpoint.py: docstring now clearly states this module
  is a Flask Blueprint fallback for WSGI runtimes (gunicorn local dev,
  Flask test client). Production routes through coda_mcp.mcp_asgi.

Closes two coverage gaps surfaced by a pre-merge test audit. Both
files exercise surfaces that production traffic actually hits, and
neither had a dedicated test file before this commit.

tests/test_mcp_endpoint.py (9 tests, all pass)
- Pin the Flask Blueprint's JSON-RPC contract: initialize, tools/list,
  ping, tools/call (unknown), unknown method, CORS preflight,
  jsonrpc id echo, non-JSON body resilience, tool schema presence.
- Asserts the tool surface is exactly {coda_run, coda_inbox,
  coda_get_result}. Drift from the v2 contract fails loudly.

tests/test_coda_bridge.py (3 pass + 1 documented skip)
- Verify the bridge injects the Databricks Bearer token mounted via
  `databricks auth token` into Authorization on every forwarded
  request (regression guard — a silent drop would 401 every Genie
  Code call against a deployed app).
- Verify it surfaces server response bodies and refuses to run
  without CODA_MCP_URL configured.
- Skip and document the stdout-capture variant for a follow-up.

Full suite: 490 passed, 2 skipped (was 478/1 before this PR). No
regressions.
…nt /socket.io middleware gap, split _doReplay from _doAttach
…arded-Host

Adds AppUrlCaptureMiddleware to mcp_asgi.py that captures X-Forwarded-Host
(falling back to Host) from every inbound HTTP request and populates
url_builder._app_url_cache. Also hardens capture_from_headers to strip
accidental https:// / http:// scheme prefixes before caching, preventing
double-scheme URLs in build_viewer_url output.
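
The header-capture rule described here can be sketched as a small pure function. The name `host_from_headers` is hypothetical and the real middleware also writes into `url_builder._app_url_cache`, which is omitted; this only illustrates the X-Forwarded-Host fallback and the scheme stripping.

```python
from typing import Mapping, Optional


def host_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Pick X-Forwarded-Host (else Host) and strip an accidental scheme prefix."""
    lowered = {key.lower(): value for key, value in headers.items()}
    host = (lowered.get("x-forwarded-host") or lowered.get("host") or "").strip()
    for scheme in ("https://", "http://"):
        if host.lower().startswith(scheme):
            # Without this, a URL builder that prepends "https://" would
            # produce a double scheme such as https://https://host.
            host = host[len(scheme):]
            break
    return host or None
```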
…ering

Adds _initFromQueryString() boot-time URL parse, _doReplay() for static
transcript rendering in 64KB RAF-yielded chunks, _renderExpiredPage() for
404 fallback, and history.replaceState hygiene on pane/tab close.

Appends test_end_to_end_grace_and_replay to test_mcp_integration.py.
Exercises the full coda_run flow with real Flask PTY hooks: create PTY,
send input, write result.json, trigger _schedule_deferred_close, verify
grace state and deferred PTY teardown, confirm transcript persists, and
validate find_task_dir_by_pty_session resolves correctly. Guarded by
_pty_skip so headless CI without PTY allocators skips cleanly.

Initialize pty_id, sess_id, and task_id to None before the try/finally in
test_end_to_end_grace_and_replay so that an early exception (e.g.,
coda_run or _read_session raising) doesn't trigger UnboundLocalError on
"if pty_id in sessions", which would mask the original exception. The
finally now guards with "if pty_id and pty_id in sessions".

The watcher thread spawned by coda_run polls for result.json every 5s
and, when it finds one, calls complete_task + _schedule_deferred_close
itself. The E2E test does that orchestration manually so it can assert
on intermediate state. With both drivers active, the watcher races
the test body and produces SessionNotFoundError plus flaky assertion
failures.

Monkeypatch coda_mcp.mcp_server._watch_task to a no-op for this
specific test so the manual orchestration is the sole driver.

Splits coda_run (Mode 3: replay-only) from forthcoming coda_interactive
(Mode 2: live attach). Rips out the 5-min grace machinery — never wired
in prod and obsolete under the three-mode framework. Critique gate cleared.

11 TDD tasks + pre-flight + final verification. Plan critique gate
cleared — three issues caught and fixed (locking deadlock in Task 8,
TDD violation in Task 5, missing PTY-skip guard in Task 10).

Backward-compatible default (False). Stored in session dict for later
attach-time enforcement.

Pure refactor; no behavior change. Helper is also used by the new
replay_only short-circuit in the next commit.
…session

Replay-only sessions always serve the on-disk transcript regardless of
whether the PTY is still alive. Used by coda_run (wired in the next commit).

Mode 3 in the three-mode framework. The viewer_url returned by coda_run
now always resolves to a transcript-from-disk replay.

Bind pty_id from session.get() before entering try block so the finally
cleanup runs even if the session row is malformed.
… close

Pure call-site swap. Behavior change: PTY teardown is immediate rather
than 5-minute-deferred. _schedule_deferred_close becomes dead code,
ripped out in a follow-up commit.

Replaces the Repos API lookup (repos.list + repos.update) with a single
workspace.get_status check. Caller is now responsible for managing
Git Folder branch state. Workspace path can be a Git Folder or a plain
Workspace folder; either works.

Both _app_create_session and _app_send_input are set together by
set_app_hooks(); validating them together before PTY creation closes a
resource-leak path where the PTY was created but the send-hook check
fired and returned, orphaning the PTY and project_dir.

Also: parenthesize the path in 'Workspace path not found' error for
readability.

Tells upstream LLM callers that workspace_path can be either a Git Folder
or a plain Workspace folder, and surfaces the upload-then-handoff pattern
(workspace.import before calling). Drops the 'commit and push' admonition
that only applied to Git Folders. New pinned test guards the contract.
…, counts dict, MCP instructions, canonical skill test, token estimate
…n), add test_mcp_server + test_replay_only_flag regression coverage
…OL builders

Two pure-function builders for the new prompt envelope sections plus the
canonical Databricks skill list. Tests pin the skill list against CLAUDE.md
to catch drift in either direction, and pin both sections to token budgets.
…ap_prompt

The flag defaults to True. When set, wrap_prompt inserts CAPABILITIES and
WORKFLOW PROTOCOL sections between TASK and INSTRUCTIONS in prompt.txt.
Callers can opt out via workflow_protocol=False on coda_run for purely
non-Databricks tasks.
…CTIONS

The INSTRUCTIONS section of prompt.txt now enumerates the four allowed
result.json status values (completed, failed, info_needed, needs_approval),
describes when to use each, and lists the canonical status.jsonl step
labels emitted by the workflow protocol.
…t doc, MCP instructions

Three surfaces updated so calling LLMs and dashboards know about the
two soft terminal statuses:
- coda_inbox counts dict gains info_needed and needs_approval keys.
- coda_get_result docstring lists all four valid statuses and the
  feedback field that accompanies info_needed.
- FastMCP server-level instructions gain an INFO_NEEDED HANDOFF
  paragraph teaching upstream LLMs to read 'feedback' and resubmit
  with previous_session_id for the chained context.

Existing test_mcp_server.py inbox-counts assertions updated to match
the new 5-key shape.
…s server-side export)

Root cause: deployed app runs as its own service principal which can't
list/export the user's Workspace folder; workspace_export.py swallows the
error -> empty dir. Fix: pull files in the terminal (authed as the user)
via 'databricks workspace export-dir', with split waits + a server-side
filesystem post-check. Folds in all design-critic SOUND-WITH-FIXES items.
…elper

_wait_for_output_stable(pty, max_wait, stability) is the parametrized poller;
_wait_for_agent_ready becomes a thin wrapper preserving the 5.0/1.0 budget so
coda_run is unaffected. Adds _EXPORT_MAX_WAIT_S/_EXPORT_STABILITY_S for the
terminal-side pull wait.
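
A parametrized stability poller of this kind can be sketched as below. The real helper takes a `pty` object; since its interface is not shown here, this version takes a hypothetical `get_output_len` callable instead, and the injectable `clock` and `sleep` parameters are additions for testability. Reading "5.0/1.0" as max_wait=5.0 and stability=1.0 is an assumption.

```python
import time
from typing import Callable


def wait_for_output_stable(
    get_output_len: Callable[[], int],
    max_wait: float,
    stability: float,
    poll_interval: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once output has not grown for `stability` seconds.

    Returns False if `max_wait` elapses first.
    """
    start = clock()
    last_len = get_output_len()
    last_change = start
    while True:
        now = clock()
        current = get_output_len()
        if current != last_len:
            last_len, last_change = current, now  # output still arriving
        elif now - last_change >= stability:
            return True
        if now - start >= max_wait:
            return False
        sleep(poll_interval)


def wait_for_agent_ready(get_output_len: Callable[[], int]) -> bool:
    # Thin wrapper keeping the assumed 5.0 s / 1.0 s budget.
    return wait_for_output_stable(get_output_len, max_wait=5.0, stability=1.0)
```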
…ver-side

Root cause of the empty-session bug: the MCP server's WorkspaceClient runs as
the app service principal, which can't list/export the user's Workspace folder,
and workspace_export.py swallowed the error -> empty dir + a misleading
'launched'. Now the tool types 'databricks workspace export-dir' into the PTY
(authed as the user), waits for the pull to settle, verifies files landed on
disk (server-side FS check, identity-independent), then launches the agent and
seeds the prompt with a context line. Deletes workspace_export.py and the
server-side WorkspaceClient/get_status path; refreshes the MCP instructions
wording. Also drops two pre-existing unused-import lints in test_replay_only_flag.
…ersal hardening)

A basename of '..' would pass the regex (dots allowed) and make ./<name>
alias the project dir's parent. Reject '.'/'..'/'' -> 'workspace'.
…(CLI cold-start race)

Confirmed in the deployed app: 'databricks workspace export-dir' works (~2.15s)
but cold-starts SILENTLY before its output burst, so the prior 'wait until output
is quiet for 1.5s' heuristic declared done during that silence and the disk check
found no files -> 'No files were pulled'. Now the pull command appends
&& echo OK || echo FAIL (tokens split across string literals so the contiguous
form never appears in the shell's echo of the typed command), and _wait_for_pull
polls the PTY output for that marker, confirming files on disk for OK. Fast, exit-
status-aware failure detection; immune to cold-start silence.
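
The split-marker trick can be sketched as follows. The marker strings PULL_OK and PULL_FAIL and both function names are hypothetical; the point being illustrated is only that the typed command contains each marker as two adjacent quoted shell strings, so the terminal's echo of the command never matches while the command's actual output does.

```python
import shlex
from typing import Optional

# Hypothetical marker names chosen for this sketch.
OK_MARKER = "PULL_OK"
FAIL_MARKER = "PULL_FAIL"


def build_pull_command(source: str, name: str) -> str:
    """Build a pull command whose typed text never contains a whole marker."""
    pull = f"databricks workspace export-dir {shlex.quote(source)} ./{shlex.quote(name)}"
    # In a POSIX shell, 'PULL''_OK' is two adjacent strings that print as PULL_OK.
    return f"{pull} && echo 'PULL''_OK' || echo 'PULL''_FAIL'"


def classify_pull_output(output: str) -> Optional[str]:
    """Return 'ok', 'fail', or None while the marker has not appeared yet."""
    if FAIL_MARKER in output:
        return "fail"
    if OK_MARKER in output:
        return "ok"
    return None
```

A poller would call `classify_pull_output` on the accumulated PTY output until it returns a value or a timeout expires, then confirm files on disk in the 'ok' case.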
…lder-trust gate)

coda_interactive launches claude in a fresh per-session dir, which tripped
Claude's per-directory folder-trust/permission dialog and swallowed the typed
prompt. Now claude (and any agent in the new _AGENT_AUTO_LAUNCH map) launches in
ONE atomic command — 'claude --enable-auto-mode <prompt>' — so no trust dialog
blocks the handoff and the prompt isn't subject to TUI cold-start timing. Context
prefix is single-line so it's safe as a quoted CLI arg. Other agents keep the
launch -> wait -> type fallback.
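
Building the one-shot launch command might look like the sketch below. The map shape, the function name, and the wording of the context prefix are assumptions; the `--enable-auto-mode` flag is quoted from this commit message and has not been checked against the Claude CLI here.

```python
import shlex
from typing import Optional

# Assumed shape of the auto-launch map; only the claude entry comes from the text.
AGENT_AUTO_LAUNCH = {"claude": ["claude", "--enable-auto-mode"]}


def build_launch_command(agent: str, prompt: str, source: str) -> Optional[str]:
    """Return a single shell command, or None to fall back to launch -> wait -> type."""
    argv = AGENT_AUTO_LAUNCH.get(agent)
    if argv is None:
        return None
    # Keep the context prefix on one line so it is safe inside a quoted argument.
    context = " ".join(f"Project files were pulled from {source}.".split())
    return " ".join(argv) + " " + shlex.quote(f"{context} {prompt}")
```

Passing the prompt as a quoted argument means the shell, not the agent's TUI, receives the text, which is why the launch no longer depends on TUI startup timing.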
… coda_interactive

coda_interactive runs inside CoDA on Databricks and can only read Workspace
paths — not the calling (local) agent's filesystem. The MCP instructions and the
tool docstring now state this explicitly and give the concrete command
(databricks workspace import-dir <local> /Workspace/Users/<you>/<proj>) so a
local Claude Code / Codex knows to upload the project first, then call
coda_interactive with the resulting Workspace path. Tests pin both surfaces.
…le HTML)

Self-contained explainer for the CoDA MCP server: the four tools (coda_run,
coda_interactive, coda_inbox, coda_get_result), the three usage modes, two
end-to-end flow diagrams (interactive handoff + autonomous run), the workflow
protocol, the identity/file round-trip, result statuses, and architecture.
Databricks Light palette (Lava #FF3621, Navy #1B3139, Oat backgrounds) + DM Sans.
Verified rendering via headless browser.
…nCode

setup_proxy.py was moved into setup/ (fec2152, R100 rename) but kept
resolving content_filter_proxy.py relative to its own directory, so it
launched a nonexistent setup/content_filter_proxy.py. The proxy never
came up, and OpenCode — the only agent routed through 127.0.0.1:4000 —
failed with 'Cannot connect to API'. Other agents talk to the gateway
directly and were unaffected.

- setup_proxy.py: extract resolve_proxy_script_path() and guard the body
  under main() so the path logic is importable; resolve from the repo
  root (parent of setup/), where content_filter_proxy.py actually lives.
- tests/test_setup_proxy.py: pin the resolved path to an existing
  repo-root file so a future move can't silently regress it again.
- setup_opencode.py: register databricks-claude-opus-4-7 (the deployed
  default) in the OpenCode model map, drop a duplicate gemini-2-5-pro
  key, and align the script default with the other agents (opus-4-7).
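
The path fix amounts to resolving from the parent of setup/ rather than from the script's own directory. A minimal sketch, with the hypothetical helper name `resolve_repo_root_file` and the assumption that the script sits directly in <repo root>/setup/:

```python
from pathlib import Path


def resolve_repo_root_file(script_file: str, resource: str) -> Path:
    """Resolve `resource` from the repo root, given a script inside <root>/setup/."""
    setup_dir = Path(script_file).resolve().parent
    # Before the fix the lookup stopped at setup_dir, which no longer holds the file.
    return setup_dir.parent / resource
```

Inside setup_proxy.py this would be called as `resolve_repo_root_file(__file__, "content_filter_proxy.py")`, and a test can pin the result to a file that exists.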
…ts + codex catalog silently missing

Same fec2152 regression class as the content-filter proxy: setup_claude.py
and setup_codex.py moved into setup/ but kept resolving sibling resources
via Path(__file__).parent, which now points at setup/ instead of the repo root.

- setup_claude.py: agents/ is at the repo root, so the subagent copy hit the
  'No agents directory found' branch — the TDD subagents (build-feature,
  prd-writer, test-generator, implementer) were never installed into
  ~/.claude/agents.
- setup_codex.py: .codex/databricks-models.json is at the repo root, so the
  model catalog was never copied into ~/.codex while config.toml still
  referenced model_catalog_json.

Both now resolve from the repo root via small extracted resolvers
(resolve_agents_src / resolve_codex_catalog_src). tests/test_setup_resource_paths.py
AST-extracts and execs just those resolvers (avoiding the scripts' import-time
side effects) and pins each to an existing resource so a future move can't
silently regress it again.
…per-test hook isolation

The full unit suite failed intermittently (a different set each run) and
test_replay_only_flag failed deterministically in-suite. Two distinct causes:

1. terminate_session double-closed master_fd (production bug). Both the
   explicit close path (mcp_close_pty_session) and the read-thread exit path
   (read_pty_output) call it for the same session, but the kill/os.close block
   ran unconditionally — so the second os.close() could land on a since-reused
   fd (e.g. an asyncio loop's self-pipe allocated by a later test), surfacing
   as intermittent 'OSError: [Errno 9] Bad file descriptor'. Now claims the
   session atomically (sessions.pop) and closes exactly once. Covered by
   tests/test_terminate_session_idempotent.py.

2. app-hook leak across test files. mcp_server's _app_create/send/close hooks
   are process-wide globals; test_mcp_server._reset_hooks set them to None in
   teardown (and test_mcp_integration does set_app_hooks(None,...)), leaking
   None into later files. coda_run in test_replay_only_flag then created no PTY
   (pty_id is None) — but only in full-suite runs (in isolation, app's import
   re-wired the hooks). Added a conftest autouse fixture re-establishing app's
   real hooks after each test, making hook state independent of file order.
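
The claim-then-close pattern from item 1 can be sketched as below. This is an illustration under assumptions: the session dict shape and the lock are hypothetical, and killing the child process is omitted so the example stays focused on the file descriptor.

```python
import os
import threading

sessions = {}
_sessions_lock = threading.Lock()


def terminate_session_sketch(session_id: str) -> bool:
    """Close the session's master fd exactly once; return True if this call did it."""
    with _sessions_lock:
        # pop() is the atomic claim: only one caller ever receives the session.
        session = sessions.pop(session_id, None)
    if session is None:
        return False  # already claimed, so never touch a possibly reused fd
    os.close(session["master_fd"])
    return True
```

Because the second caller sees None and returns early, a later os.close() can no longer land on a descriptor number that the OS has handed to something else.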

Also: TestNpmVersionLive now SKIPS when the npm registry is unreachable (the
skipif probe was raising TimeoutExpired as a collection error, and the body
asserted on a None result) instead of erroring/failing offline.

Full unit suite green across 3 consecutive runs: 613 passed, 2 skipped.