feat(coda): three-mode framework — narrow coda_run to replay-only + add coda_interactive by datasciencemonkey · Pull Request #67 · databrickslabs/coding-agents-databricks-apps

datasciencemonkey · 2026-05-28T18:54:49Z

Summary

Two changes that together establish the three-mode framework for the CoDA MCP server:

Narrow coda_run to replay-only URLs: its returned viewer_url now always serves a static transcript from disk, never a live PTY attach. This drops the 5-minute grace machinery introduced in #66 (CoDA MCP live session URL: watch hermes execute live + replay), which was never actually wired in production; see commit 193c9a3 for the dead-code analysis.
Add a new coda_interactive MCP tool for human handoff from upstream MCP clients (Genie Code, Claude Desktop, Cursor). The caller passes a Databricks Workspace Git Folder path, an optional branch, and a kickoff prompt. Coda exports the file tree, launches the chosen agent (claude by default; hermes, codex, gemini, and opencode are also supported), auto-types the prompt, and returns a viewer_url for the human to attach to.

The three-mode framework is now:

| Mode | Tool | URL semantics | PTY lifecycle |
|---|---|---|---|
| 1. Direct launch | Web UI tab | n/a (no external URL) | 24h idle (WS heartbeat extends) |
| 2. `coda_interactive` (new) | MCP tool | Live attach | 24h idle (WS extends) |
| 3. `coda_run` (narrowed) | MCP tool | Replay only | Immediate teardown on hermes exit |

Why Workspace Git Folders (not GitHub clone)

coda_interactive materializes project files via the Databricks Workspace API, using Coda's existing DATABRICKS_TOKEN, so no new credentials are needed. Trade-off: git history is unavailable inside the session (files-only export). If history matters for a session, the MCP caller can include a git log summary in the prompt string.

Pre-existing security fix bundled

mcp_create_pty_session stripped only 5 env vars, while the HTTP create_session path also stripped NPM_TOKEN, UV_*, and npm_config_//* patterns. Both paths now share _build_terminal_shell_env (commit ef15ef7), which closes a latent registry-credential leak into MCP-created PTYs.
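A minimal sketch of what a shared env builder could look like. The name mirrors _build_terminal_shell_env, but the body, the `extra_exact` parameter, and the strip list are assumptions: the PR only names NPM_TOKEN, UV_*, and npm_config_//*, not the other variables the real helper removes.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# Only these patterns are named in the PR; the real strip list may be longer.
_STRIP_EXACT = frozenset({"NPM_TOKEN"})
_STRIP_PREFIXES = ("UV_", "npm_config_//")


def build_terminal_shell_env(
    base: Optional[Mapping[str, str]] = None,
    extra_exact: Iterable[str] = (),
) -> Dict[str, str]:
    """Copy `base` (default: os.environ), dropping registry-credential variables."""
    source = os.environ if base is None else base
    exact = _STRIP_EXACT | frozenset(extra_exact)
    return {
        key: value
        for key, value in source.items()
        if key not in exact and not key.startswith(_STRIP_PREFIXES)
    }
```

Because both PTY-creation paths would call the same function, a strip rule added once applies to HTTP-created and MCP-created terminals alike.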

Subsumes PR #66

PR #66 introduced live-attach + 5-min grace on coda_run. This PR keeps the viewer URL feature but narrows it to replay-only (since live attach moves to coda_interactive). Close #66 if this lands.

Spec + Plan artifacts (in the diff)

Todo 1 (Mode 3 narrowing): docs/superpowers/specs/2026-05-28-coda-run-replay-only-design.md + plans/2026-05-28-coda-run-replay-only.md
Todo 2 (Mode 2 addition): docs/superpowers/specs/2026-05-28-coda-interactive-mcp-tool-design.md + plans/2026-05-28-coda-interactive-mcp-tool.md
Both passed independent critic-agent reviews at spec, plan, and per-task stages.

Test plan

Local: 551 passed, 21 skipped — confirmed (pytest tests/ --ignore=tests/e2e)
PTY-gated tests skip cleanly in the macOS dev environment; they should pass on Linux CI and in the deployed app
Manual smoke against deployed CoDA: invoke coda_run from Genie Code; confirm viewer URL is replay-only
Manual smoke against deployed CoDA: invoke coda_interactive with a Workspace Git Folder; confirm agent launches in exported project dir with prompt typed
Confirm coda_inbox does NOT show interactive sessions; coda_get_result returns nothing for them
Verify _build_terminal_shell_env strip on deployed PTY — env | grep -E 'NPM_TOKEN|UV_' should be empty

Follow-up: broadened source contract (commits `326e19a..a555602`)

coda_interactive no longer requires the workspace_path to be a Databricks Workspace Git Folder. Any Workspace directory (Git Folder or plain Workspace folder) is accepted. The branch parameter has been removed — callers manage Git Folder branch state themselves before calling.

API change (no shipped consumers — safe):

coda_interactive(prompt, workspace_path, branch=..., agent=..., email=...) → coda_interactive(prompt, workspace_path, agent=..., email=...)
Return shape: "branch" key dropped.

Validation: replaced repos.list + exact-match filter + optional repos.update with a single workspace.get_status call + directory-type check (_is_directory from workspace_export.py). Clean errors for "path not found" and "path is not a directory" — both return before any PTY allocation.
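The validation order can be sketched as below, assuming a `get_status(path)` callable that returns an object with an `object_type` attribute. `ObjectType` and `PathNotFound` are hypothetical stand-ins for the Databricks SDK's workspace object-type enum and not-found error, so the sketch runs without the SDK. Treating REPO (Git Folder) as a directory is also an assumption, chosen because the PR accepts both Git Folders and plain folders. The "not found" message format follows the parenthesized style described in a later commit; the "not a directory" wording is illustrative.

```python
from enum import Enum
from typing import Any, Callable, Optional


class ObjectType(Enum):
    # Hypothetical stand-in for the SDK's workspace object-type enum.
    DIRECTORY = "DIRECTORY"
    REPO = "REPO"
    NOTEBOOK = "NOTEBOOK"
    FILE = "FILE"


class PathNotFound(Exception):
    """Hypothetical stand-in for the SDK's not-found error."""


def is_directory(object_type: Any) -> bool:
    # Accept an enum member or a raw string; Git Folders (REPO) count as directories.
    name = getattr(object_type, "value", object_type)
    return name in ("DIRECTORY", "REPO")


def validate_workspace_path(path: str, get_status: Callable[[str], Any]) -> Optional[dict]:
    """Return an error dict, or None if `path` is an existing Workspace directory.

    Runs before any PTY is allocated, so a bad path never leaks a terminal.
    """
    try:
        info = get_status(path)
    except PathNotFound:
        return {"status": "error", "error": f"Workspace path not found ({path})"}
    if not is_directory(info.object_type):
        return {"status": "error", "error": f"Workspace path is not a directory ({path})"}
    return None
```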

Server-level instructions string rewritten to:

Tell callers that plain Workspace folders work.
Surface the upload-then-handoff pattern explicitly (workspace.import first if files aren't in the Workspace yet) so an upstream LLM knows the tool doesn't accept inline file payloads.

Quality fix bundled: hoisted the _app_send_input is None guard above PTY creation so an unwired send-hook can no longer orphan a PTY + project dir.
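The ordering fix amounts to "check every prerequisite, then allocate". A minimal sketch, with hypothetical module-level hooks standing in for `_app_create_session` / `_app_send_input` and a hypothetical `start_interactive` entry point:

```python
from typing import Callable, Optional

# Hypothetical hooks, set together the way set_app_hooks() wires the real ones.
_app_create_session: Optional[Callable[[], str]] = None
_app_send_input: Optional[Callable[[str, str], None]] = None


def set_app_hooks(create, send) -> None:
    global _app_create_session, _app_send_input
    _app_create_session, _app_send_input = create, send


def start_interactive(prompt: str) -> dict:
    # Validate BOTH hooks before allocating anything: if the send-hook were
    # checked after creation, an early return would orphan the new PTY.
    if _app_create_session is None or _app_send_input is None:
        return {"status": "error", "error": "terminal hooks not wired"}
    pty_id = _app_create_session()
    _app_send_input(pty_id, prompt)
    return {"status": "launched", "pty_id": pty_id}
```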

Test delta: −3 / +4. All 13 tests in tests/test_coda_interactive.py pass.

Artifacts:

Spec: docs/superpowers/specs/2026-05-28-coda-interactive-broaden-source-design.md
Plan: docs/superpowers/plans/2026-05-28-coda-interactive-broaden-source.md
The original coda_interactive spec is marked as amended by this spec (an "Amended by:" note) for traceability.

Follow-up #2: Workflow protocol + Databricks orientation (commits `6ff6a9b..77321dc`)

coda_run now injects two new sections into prompt.txt:

CAPABILITIES — tells hermes about the Databricks CLI (pre-authed), the 16 Databricks skills under ~/.claude/skills/, and the DeepWiki / Exa / CoDA MCP servers.
WORKFLOW PROTOCOL — imposes a 3-phase pipeline (PLAN → EXECUTE → SYNTHESIZE) with a critique step after each phase (self-review or sub-agent — agent's choice). Max 2 iterations per phase to keep token cost bounded.

A new terminal result.json status, "info_needed", with a required feedback field, gives the calling client a structured iteration loop when the agent is blocked. The existing "needs_approval" status is preserved, and the protocol explicitly disambiguates the two: info_needed means "the caller must add context"; needs_approval means "the caller must approve a destructive action".
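A caller-side sketch of that contract. Only the four status names and the `feedback` field come from this PR; `parse_result`, `next_action`, and the error handling are hypothetical illustrations of how an upstream client might branch on them.

```python
import json

VALID_STATUSES = {"completed", "failed", "info_needed", "needs_approval"}


def parse_result(raw: str) -> dict:
    """Parse result.json text and enforce the status/feedback contract."""
    result = json.loads(raw)
    status = result.get("status")
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    feedback = result.get("feedback")
    # info_needed is only actionable if the agent says what context is missing.
    if status == "info_needed" and not (isinstance(feedback, str) and feedback.strip()):
        raise ValueError("info_needed requires a non-empty 'feedback' field")
    return result


def next_action(result: dict) -> str:
    """Map a terminal status to what the calling client should do next."""
    return {
        "completed": "read the output",
        "failed": "inspect the error",
        "info_needed": "add the context named in 'feedback' and resubmit with previous_session_id",
        "needs_approval": "ask a human to approve the destructive action",
    }[result["status"]]
```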

Three upstream-facing surfaces updated so calling LLMs know about the new statuses:

coda_inbox counts dict gains info_needed and needs_approval keys.
coda_get_result docstring lists all four valid statuses + the new feedback field.
FastMCP server-level instructions gain an INFO_NEEDED HANDOFF paragraph teaching upstream LLMs to read feedback and resubmit with previous_session_id.

Flag: coda_run(..., workflow_protocol=True) is the default. Set False to skip both new sections for non-Databricks tasks.
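A sketch of the section ordering controlled by the flag, assuming a plain-text envelope. The function name, keyword arguments, and the `## NAME` header style are hypothetical; what comes from the PR is that CAPABILITIES and WORKFLOW PROTOCOL sit between TASK and INSTRUCTIONS and that `workflow_protocol=False` skips both.

```python
from typing import List, Tuple


def wrap_prompt_sketch(task: str, instructions: str, *, workflow_protocol: bool = True,
                       capabilities: str = "", protocol: str = "") -> str:
    """Assemble prompt.txt sections in the order the PR describes."""
    sections: List[Tuple[str, str]] = [("TASK", task)]
    if workflow_protocol:
        # Both new sections go between TASK and INSTRUCTIONS; False omits both.
        sections += [("CAPABILITIES", capabilities), ("WORKFLOW PROTOCOL", protocol)]
    sections.append(("INSTRUCTIONS", instructions))
    # The "## NAME" header style is an assumption, not the real envelope format.
    return "\n\n".join(f"## {name}\n{body}" for name, body in sections)
```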

Artifacts:

Spec: docs/superpowers/specs/2026-05-28-coda-run-workflow-protocol-design.md
Plan: docs/superpowers/plans/2026-05-28-coda-run-workflow-protocol.md

Discipline gates run:

Spec critic → APPROVE-WITH-FIXES → 5 fixes applied (counts dict, needs_approval disambig, MCP instructions, canonical skill test, token estimate)
Plan critic → APPROVE-WITH-FIXES → 2 fixes applied (correct _write_json stub, expanded regression coverage)
Per task (1-4): spec + code-quality reviews, all approvals; minor fixes applied (section_noise dead code, JSON union-syntax placeholder)

Follow-up #3: `coda_interactive` pulls files in the terminal (fixes empty-session bug)

Bug: Calling coda_interactive launched the agent over an empty directory — the agent had no idea about the user's Workspace files.

Root cause: The MCP server's WorkspaceClient() resolves to the app's service principal (app-167dcd …), which can get_status the user's /Users/<user>/… folder (so the tool reported "launched") but cannot list/export its contents. workspace_export.py swallowed those errors → empty dir + misleading success. Verified: the CoDA terminal's CLI runs as the user (databricks current-user me → the user), and databricks workspace list/export of the folder works as the user via REST.

Fix: Stop exporting server-side. coda_interactive now types cd <project_dir> && databricks workspace export-dir <source> ./<name> && cd <name> into the PTY (authenticated as the user), waits for the pull to settle, then does a server-side filesystem post-check (identity-independent — it stats the local disk the terminal wrote). If files landed it launches the agent and seeds the prompt with a context line naming the source; if nothing landed it returns a real status=error and never launches.
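A sketch of the two server-side pieces: composing the line typed into the user's PTY, and the identity-independent disk check afterwards. The command shape follows the description above; the `shlex.quote` quoting and both function names are assumptions added for illustration.

```python
import os
import shlex


def build_pull_command(project_dir: str, source: str, name: str) -> str:
    """Compose the line typed into the PTY, where the CLI runs as the user."""
    q = shlex.quote
    return (
        f"cd {q(project_dir)} && "
        f"databricks workspace export-dir {q(source)} ./{q(name)} && "
        f"cd {q(name)}"
    )


def files_landed(project_dir: str, name: str) -> bool:
    """Post-check on local disk: did at least one file arrive under project_dir/name?"""
    target = os.path.join(project_dir, name)
    if not os.path.isdir(target):
        return False
    for _root, _dirs, files in os.walk(target):
        if files:
            return True
    return False
```

If `files_landed` is False the tool would return `status=error` and skip the agent launch, rather than reporting success over an empty directory.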

Design highlights:

Split waits — wait for the pull to go idle (reliable stabilization), THEN the existing agent-ready wait after launch. Avoids pasting the prompt into a half-initialized agent.
databricks workspace export-dir natively handles notebook extensions, replacing the hand-rolled workspace_export.py (deleted, with its tests).
_wait_for_agent_ready is now a thin wrapper over a generalized _wait_for_output_stable(pty, max_wait, stability); coda_run is unchanged.
New helpers _safe_dirname (sanitized basename, rejects ./..) and _normalize_workspace_path (drops the /Workspace FUSE prefix).
MCP instructions wording updated; no more "server-side snapshot".
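The two path helpers can be sketched as follows. The allowed-character regex and the lack of leading underscores on the names are assumptions; the behaviors taken from the PR are "sanitized basename", "reject `.`/`..`/empty in favor of `workspace`", and "drop the `/Workspace` FUSE prefix".

```python
import posixpath
import re

_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")  # assumed allow-list; dots are permitted


def safe_dirname(workspace_path: str) -> str:
    """Sanitized basename for the local pull target; never '', '.' or '..'."""
    base = posixpath.basename(workspace_path.rstrip("/"))
    name = _UNSAFE.sub("_", base)
    # The regex allows dots, so '.'/'..' must be rejected explicitly or
    # ./<name> would alias the project dir or its parent.
    return "workspace" if name in ("", ".", "..") else name


def normalize_workspace_path(path: str) -> str:
    """Drop the /Workspace FUSE prefix: /Workspace/Users/x -> /Users/x."""
    if path == "/Workspace":
        return "/"
    if path.startswith("/Workspace/"):
        return path[len("/Workspace"):]
    return path
```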

Tests: test_coda_interactive.py rewritten to the pull contract (pull-first, FS-check failure → no launch, prompt context line, agent matrix); _safe_dirname/_normalize_workspace_path + the wait-wrapper covered in test_mcp_server.py. Suite: 117 passed (only the documented PTY-fd flake fails in multi-file runs; passes in isolation).

Artifacts:

Spec: docs/superpowers/specs/2026-05-28-coda-interactive-terminal-pull-design.md
Plan: docs/superpowers/plans/2026-05-28-coda-interactive-terminal-pull.md

Discipline gates: brainstorm → design critic (SOUND-WITH-FIXES, folded in) → spec → plan → plan critic (SOUND-WITH-FIXES, folded in) → TDD implementation → final critic (SHIP-WITH-FOLLOWUPS; the one Important finding — .. path-traversal in _safe_dirname — fixed before push).

This branch also merges the latest main (Dependabot deps bump #68: urllib3 2.7.0 / gitpython 3.1.50 / idna 3.16).

Mounts an MCP server at `/mcp` so Databricks Genie Code (and other MCP clients such as Claude Desktop and Cursor) can delegate coding tasks to the existing Hermes Agent infrastructure. Exposes three high-level tools following the v2 background-execution pattern:

- coda_run: submit a coding task; returns a task_id immediately
- coda_inbox: poll all task statuses (24h window)
- coda_get_result: fetch the structured output of a completed task

Plus internal helpers (`coda_create_session`, `coda_get_status`, `coda_close_session`). Sessions and task state are persisted to disk under `~/.coda/sessions/` so tasks survive worker restarts.

Architecture

- Native MCP SDK transport (`FastMCP.streamable_http_app()`), which Genie Code's Custom MCP server picker requires (custom JSON-RPC handlers don't work).
- `stateless_http=True`, `json_response=True`. DNS-rebinding protection is disabled (the proxy handles auth; the workspace origin is allowed via CORS middleware).
- Switches the production entrypoint from gunicorn to uvicorn so the MCP ASGI app and the existing Flask UI can be served side by side (Flask mounted via WSGIMiddleware). WebSocket falls back to HTTP polling under uvicorn, which is acceptable per the design doc; the Web Worker poller is already in place.
- Skips CSP/security headers on the `/mcp` path (CSP interfered with Genie Code's transport).
- Hermes is always the agent invoked; it routes to sub-agents internally.
- Adds a stdio MCP bridge (`tools/coda-bridge.py`) for Claude Code's OAuth-based auth flow.

Repository reshuffles

- New `coda_mcp/` package: `mcp_server`, `mcp_endpoint`, `mcp_asgi`, `task_manager`.
- `setup_*.py` moved from the repo root to `setup/`.
- `install_*.sh` moved from the repo root to `scripts/`.
- Tests: new coverage for the MCP server, integration flow, task manager, content filter proxy, sync_to_workspace, and _run_step.
- Docs: `docs/mcp-client-setup.md`, `docs/mcp-v2-background-execution.md`, and the full implementation plan at `docs/plans/2026-05-01-coda-mcp-server.md`.

Safety guardrails

The CODA-TASK prompt envelope explicitly forbids destructive operations (DROP/DELETE/TRUNCATE, CLI deletes, permission changes) at the prompt level, in line with the CoDA Constitution.

Tested as `mcp-test-coda` on workspace `fevm-serverless-9cefok` (profile `9cefok`). The app name must start with `mcp-` to appear in the Genie Code Custom MCP server picker.

Provenance

Squashed from 40 commits originally on `datasciencemonkey/coding-agents-databricks-apps#156`, last working tip `1ce86bf`. The full commit-by-commit history is preserved locally on the tag `coda-mcp-backup-2026-05-25`. Conflict resolutions during the squash:

- README.md MLflow section: kept main's Claude+Codex unified switch (newer than coda-mcp's Claude-only state).
- setup/setup_claude.py: combined main's enterprise installer URL handling with coda-mcp's `SKIP_CLAUDE_INSTALL` test escape hatch.

Surfaces a doc audit pass against the squash:

- README: replace the gunicorn+Flask architecture diagram with the actual uvicorn ASGI stack (socketio.ASGIApp → /mcp + WSGI(Flask)). Update the startup-flow narrative, the "Server" config section (was "Gunicorn"), the project-structure annotations for app.yaml and gunicorn.conf.py (legacy, retained for WSGI-only dev), and the Technologies list.
- app.yaml: prepend a comment block explaining why the entrypoint is uvicorn (FastMCP.streamable_http_app is native ASGI; gunicorn WSGI cannot serve it). Notes the polling-fallback behaviour and the retained-but-unused gunicorn.conf.py.
- docs/plans/2026-05-01-coda-mcp-server.md: prepend a SUPERSEDED banner. The shipped implementation is the v2 design in docs/mcp-v2-background-execution.md (3 tools on uvicorn+ASGI), not the 5-tool gunicorn+WSGI plan in this file. Kept for design-evolution archaeology.
- coda_mcp/mcp_endpoint.py: docstring now clearly states this module is a Flask Blueprint fallback for WSGI runtimes (gunicorn local dev, Flask test client). Production routes through coda_mcp.mcp_asgi.

Closes two coverage gaps surfaced by a pre-merge test audit. Both files exercise surfaces that production traffic actually hits, and neither surface had a dedicated test file before this commit.

tests/test_mcp_endpoint.py (9 tests, all pass)

- Pin the Flask Blueprint's JSON-RPC contract: initialize, tools/list, ping, tools/call (unknown), unknown method, CORS preflight, jsonrpc id echo, non-JSON body resilience, tool schema presence.
- Assert the tool surface is exactly {coda_run, coda_inbox, coda_get_result}. Drift from the v2 contract fails loudly.

tests/test_coda_bridge.py (3 pass + 1 documented skip)

- Verify the bridge injects the Databricks Bearer token obtained via `databricks auth token` into Authorization on every forwarded request (regression guard: a silent drop would 401 every Genie Code call against a deployed app).
- Verify it surfaces server response bodies and refuses to run without CODA_MCP_URL configured.
- Skip and document the stdout-capture variant for a follow-up.

Full suite: 490 passed, 2 skipped (was 478/1 before this PR). No regressions.

…arded-Host Adds AppUrlCaptureMiddleware to mcp_asgi.py that captures X-Forwarded-Host (falling back to Host) from every inbound HTTP request and populates url_builder._app_url_cache. Also hardens capture_from_headers to strip accidental https:// / http:// scheme prefixes before caching, preventing double-scheme URLs in build_viewer_url output.
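A pure-ASGI sketch of the capture described above. The module-level `app_url_cache` dict is a hypothetical stand-in for `url_builder._app_url_cache`, and the single `"host"` key is an assumption; the header precedence (X-Forwarded-Host, else Host) and the scheme stripping follow the commit message.

```python
from typing import Any, Awaitable, Callable, Dict, MutableMapping

Scope = MutableMapping[str, Any]
ASGIApp = Callable[[Scope, Callable, Callable], Awaitable[None]]

# Hypothetical stand-in for url_builder._app_url_cache.
app_url_cache: Dict[str, str] = {}


def capture_from_headers(host: str) -> None:
    """Cache the app host, stripping an accidental scheme to avoid https://https:// URLs."""
    for scheme in ("https://", "http://"):
        if host.startswith(scheme):
            host = host[len(scheme):]
    if host:
        app_url_cache["host"] = host


class AppUrlCaptureMiddleware:
    """Record X-Forwarded-Host (falling back to Host) on every HTTP request."""

    def __init__(self, app: ASGIApp) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Callable, send: Callable) -> None:
        if scope.get("type") == "http":
            # ASGI headers are a list of (bytes, bytes) pairs with lowercase names.
            headers = {k.decode("latin-1").lower(): v.decode("latin-1")
                       for k, v in scope.get("headers", [])}
            capture_from_headers(headers.get("x-forwarded-host") or headers.get("host", ""))
        await self.app(scope, receive, send)
```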

…ering Adds _initFromQueryString() boot-time URL parse, _doReplay() for static transcript rendering in 64KB RAF-yielded chunks, _renderExpiredPage() for 404 fallback, and history.replaceState hygiene on pane/tab close.

Appends test_end_to_end_grace_and_replay to test_mcp_integration.py. Exercises the full coda_run flow with real Flask PTY hooks: create PTY, send input, write result.json, trigger _schedule_deferred_close, verify grace state and deferred PTY teardown, confirm transcript persists, and validate find_task_dir_by_pty_session resolves correctly. Guarded by _pty_skip so headless CI without PTY allocators skips cleanly.

Initialize pty_id, sess_id, and task_id to None before the try/finally in test_end_to_end_grace_and_replay so that an early exception (e.g., coda_run or _read_session raising) doesn't trigger UnboundLocalError on "if pty_id in sessions", which would mask the original exception. The finally now guards with "if pty_id and pty_id in sessions".

The watcher thread spawned by coda_run polls for result.json every 5s and, when it finds one, calls complete_task + _schedule_deferred_close itself. The E2E test does that orchestration manually so it can assert on intermediate state. With both drivers active, the watcher races the test body and produces SessionNotFoundError plus flaky assertion failures. Monkeypatch coda_mcp.mcp_server._watch_task to a no-op for this specific test so the manual orchestration is the sole driver.

Replaces the Repos API lookup (repos.list + repos.update) with a single workspace.get_status check. Caller is now responsible for managing Git Folder branch state. Workspace path can be a Git Folder or a plain Workspace folder — either works.

Both _app_create_session and _app_send_input are set together by set_app_hooks(); validating them together before PTY creation closes a resource-leak path where the PTY was created but the send-hook check fired and returned, orphaning the PTY and project_dir. Also: parenthesize the path in 'Workspace path not found' error for readability.

Tells upstream LLM callers that workspace_path can be either a Git Folder or a plain Workspace folder, and surfaces the upload-then-handoff pattern (workspace.import before calling). Drops the 'commit and push' admonition that only applied to Git Folders. New pinned test guards the contract.

…OL builders Two pure-function builders for the new prompt envelope sections plus the canonical Databricks skill list. Tests pin the skill list against CLAUDE.md to catch drift in either direction, and pin both sections to token budgets.

…ap_prompt The flag defaults to True. When set, wrap_prompt inserts CAPABILITIES and WORKFLOW PROTOCOL sections between TASK and INSTRUCTIONS in prompt.txt. Callers can opt out via workflow_protocol=False on coda_run for purely non-Databricks tasks.

…CTIONS The INSTRUCTIONS section of prompt.txt now enumerates the four allowed result.json status values (completed, failed, info_needed, needs_approval), describes when to use each, and lists the canonical status.jsonl step labels emitted by the workflow protocol.

…t doc, MCP instructions Three surfaces updated so calling LLMs and dashboards know about the two soft terminal statuses:

- coda_inbox counts dict gains info_needed and needs_approval keys.
- coda_get_result docstring lists all four valid statuses and the feedback field that accompanies info_needed.
- FastMCP server-level instructions gain an INFO_NEEDED HANDOFF paragraph teaching upstream LLMs to read 'feedback' and resubmit with previous_session_id for the chained context.

Existing test_mcp_server.py inbox-counts assertions updated to match the new 5-key shape.

…s server-side export) Root cause: deployed app runs as its own service principal which can't list/export the user's Workspace folder; workspace_export.py swallows the error -> empty dir. Fix: pull files in the terminal (authed as the user) via 'databricks workspace export-dir', with split waits + a server-side filesystem post-check. Folds in all design-critic SOUND-WITH-FIXES items.

…elper _wait_for_output_stable(pty, max_wait, stability) is the parametrized poller; _wait_for_agent_ready becomes a thin wrapper preserving the 5.0/1.0 budget so coda_run is unaffected. Adds _EXPORT_MAX_WAIT_S/_EXPORT_STABILITY_S for the terminal-side pull wait.
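A sketch of the parametrized poller and its thin wrapper. The real helper takes a PTY object; here a `read_output` callable returning the accumulated output is assumed instead, and treating the 5.0/1.0 budget as (max_wait, stability) is an inference from the parameter order.

```python
import time
from typing import Callable


def wait_for_output_stable(read_output: Callable[[], str], max_wait: float,
                           stability: float, poll: float = 0.05) -> bool:
    """True once output is unchanged for `stability` seconds; False after `max_wait`."""
    deadline = time.monotonic() + max_wait
    last = read_output()
    last_change = time.monotonic()
    while time.monotonic() < deadline:
        time.sleep(poll)
        current = read_output()
        now = time.monotonic()
        if current != last:
            last, last_change = current, now
        elif now - last_change >= stability:
            return True
    return False


def wait_for_agent_ready(read_output: Callable[[], str]) -> bool:
    # Thin wrapper keeping the 5.0 / 1.0 budget, so coda_run timing is unchanged.
    return wait_for_output_stable(read_output, max_wait=5.0, stability=1.0)
```

A quiet-period heuristic like this cannot distinguish "finished" from "has not started printing yet", which is the weakness the CLI cold-start fix addresses with an explicit completion marker.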

…ver-side Root cause of the empty-session bug: the MCP server's WorkspaceClient runs as the app service principal, which can't list/export the user's Workspace folder, and workspace_export.py swallowed the error -> empty dir + a misleading 'launched'. Now the tool types 'databricks workspace export-dir' into the PTY (authed as the user), waits for the pull to settle, verifies files landed on disk (server-side FS check, identity-independent), then launches the agent and seeds the prompt with a context line. Deletes workspace_export.py and the server-side WorkspaceClient/get_status path; refreshes the MCP instructions wording. Also drops two pre-existing unused-import lints in test_replay_only_flag.

…(CLI cold-start race) Confirmed in the deployed app: 'databricks workspace export-dir' works (~2.15s) but cold-starts SILENTLY before its output burst, so the prior 'wait until output is quiet for 1.5s' heuristic declared done during that silence and the disk check found no files -> 'No files were pulled'. Now the pull command appends && echo OK || echo FAIL (tokens split across string literals so the contiguous form never appears in the shell's echo of the typed command), and _wait_for_pull polls the PTY output for that marker, confirming files on disk for OK. Fast, exit-status-aware failure detection; immune to cold-start silence.
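A sketch of the marker technique. The marker text and both function names are hypothetical. The trick: writing `echo __CODA_PULL_''OK__` makes the PTY's echo of the typed line contain `__CODA_PULL_''OK__`, while the shell joins the empty `''` literal away and prints `__CODA_PULL_OK__`, so only real command output matches.

```python
import time
from typing import Callable

# Hypothetical marker; the '' split keeps the contiguous token out of the
# terminal's echo of the typed command.
_MARK = "__CODA_PULL_"
OK_TOKEN = _MARK + "OK__"
FAIL_TOKEN = _MARK + "FAIL__"


def with_status_marker(command: str) -> str:
    return f"{command} && echo {_MARK}''OK__ || echo {_MARK}''FAIL__"


def wait_for_pull(read_output: Callable[[], str], max_wait: float, poll: float = 0.05) -> str:
    """Poll PTY output for the marker; return 'ok', 'fail', or 'timeout'."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        out = read_output()
        if OK_TOKEN in out:
            return "ok"
        if FAIL_TOKEN in out:
            return "fail"
        time.sleep(poll)
    return "timeout"
```

On "ok" the caller would still confirm that files exist on disk before launching the agent; silence during CLI cold start simply keeps the loop polling.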

…lder-trust gate) coda_interactive launches claude in a fresh per-session dir, which tripped Claude's per-directory folder-trust/permission dialog and swallowed the typed prompt. Now claude (and any agent in the new _AGENT_AUTO_LAUNCH map) launches in ONE atomic command — 'claude --enable-auto-mode <prompt>' — so no trust dialog blocks the handoff and the prompt isn't subject to TUI cold-start timing. Context prefix is single-line so it's safe as a quoted CLI arg. Other agents keep the launch -> wait -> type fallback.
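A sketch of the launch decision. The `claude --enable-auto-mode <prompt>` form is taken verbatim from the commit message; the map's shape, `plan_launch`, and the whitespace-collapsing of the context prefix are hypothetical illustrations.

```python
import shlex
from typing import Dict, List, Tuple

# Agents that accept the prompt as a CLI argument (per the commit message).
_AGENT_AUTO_LAUNCH: Dict[str, List[str]] = {
    "claude": ["claude", "--enable-auto-mode"],
}


def plan_launch(agent: str, prompt: str, context_line: str) -> Tuple[str, str]:
    """Return (command to type now, prompt to type once the agent is ready)."""
    # Collapse whitespace so the context prefix stays single-line.
    prefix = " ".join(context_line.split())
    full_prompt = f"{prefix} {prompt}" if prefix else prompt
    argv = _AGENT_AUTO_LAUNCH.get(agent)
    if argv is not None:
        # One atomic command: the prompt travels as an argument instead of
        # being typed into a TUI that may still be starting up.
        return " ".join(shlex.quote(arg) for arg in argv + [full_prompt]), ""
    # Fallback for other agents: launch, wait for readiness, then type.
    return shlex.quote(agent), full_prompt
```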

… coda_interactive coda_interactive runs inside CoDA on Databricks and can only read Workspace paths — not the calling (local) agent's filesystem. The MCP instructions and the tool docstring now state this explicitly and give the concrete command (databricks workspace import-dir <local> /Workspace/Users/<you>/<proj>) so a local Claude Code / Codex knows to upload the project first, then call coda_interactive with the resulting Workspace path. Tests pin both surfaces.

…le HTML) Self-contained explainer for the CoDA MCP server: the four tools (coda_run, coda_interactive, coda_inbox, coda_get_result), the three usage modes, two end-to-end flow diagrams (interactive handoff + autonomous run), the workflow protocol, the identity/file round-trip, result statuses, and architecture. Databricks Light palette (Lava #FF3621, Navy #1B3139, Oat backgrounds) + DM Sans. Verified rendering via headless browser.

…nCode setup_proxy.py was moved into setup/ (fec2152, R100 rename) but kept resolving content_filter_proxy.py relative to its own directory, so it launched a nonexistent setup/content_filter_proxy.py. The proxy never came up, and OpenCode (the only agent routed through 127.0.0.1:4000) failed with 'Cannot connect to API'. Other agents talk to the gateway directly and were unaffected.

- setup_proxy.py: extract resolve_proxy_script_path() and guard the body under main() so the path logic is importable; resolve from the repo root (parent of setup/), where content_filter_proxy.py actually lives.
- tests/test_setup_proxy.py: pin the resolved path to an existing repo-root file so a future move can't silently regress it again.
- setup_opencode.py: register databricks-claude-opus-4-7 (the deployed default) in the OpenCode model map, drop a duplicate gemini-2-5-pro key, and align the script default with the other agents (opus-4-7).

…ts + codex catalog silently missing Same fec2152 regression class as the content-filter proxy: setup_claude.py and setup_codex.py moved into setup/ but kept resolving sibling resources via Path(__file__).parent, which now points at setup/ instead of the repo root.

- setup_claude.py: agents/ is at the repo root, so the subagent copy hit the 'No agents directory found' branch; the TDD subagents (build-feature, prd-writer, test-generator, implementer) were never installed into ~/.claude/agents.
- setup_codex.py: .codex/databricks-models.json is at the repo root, so the model catalog was never copied into ~/.codex while config.toml still referenced model_catalog_json.

Both now resolve from the repo root via small extracted resolvers (resolve_agents_src / resolve_codex_catalog_src). tests/test_setup_resource_paths.py AST-extracts and execs just those resolvers (avoiding the scripts' import-time side effects) and pins each to an existing resource so a future move can't silently regress it again.
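The fix pattern shared by these resolvers can be sketched as one generic helper. `resolve_from_repo_root` is hypothetical; the real code uses separate resolvers (resolve_proxy_script_path, resolve_agents_src, resolve_codex_catalog_src) whose bodies may differ. Taking the script path as a parameter, rather than reading `__file__` inside, keeps the helper testable.

```python
from pathlib import Path


def resolve_from_repo_root(script_file: str, *parts: str) -> Path:
    """Resolve a resource from the repo root (parent of setup/), not setup/ itself."""
    # Path(script_file).parent alone is setup/ after the move; go one level up.
    repo_root = Path(script_file).resolve().parent.parent
    return repo_root.joinpath(*parts)


# Hypothetical call sites inside setup/*.py:
#   resolve_from_repo_root(__file__, ".codex", "databricks-models.json")
#   resolve_from_repo_root(__file__, "agents")
#   resolve_from_repo_root(__file__, "content_filter_proxy.py")
```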

…per-test hook isolation The full unit suite failed intermittently (a different set each run) and test_replay_only_flag failed deterministically in-suite. Two distinct causes:

1. terminate_session double-closed master_fd (production bug). Both the explicit close path (mcp_close_pty_session) and the read-thread exit path (read_pty_output) call it for the same session, but the kill/os.close block ran unconditionally, so the second os.close() could land on a since-reused fd (e.g. an asyncio loop's self-pipe allocated by a later test), surfacing as intermittent 'OSError: [Errno 9] Bad file descriptor'. It now claims the session atomically (sessions.pop) and closes exactly once. Covered by tests/test_terminate_session_idempotent.py.
2. App-hook leak across test files. mcp_server's _app_create/send/close hooks are process-wide globals; test_mcp_server._reset_hooks set them to None in teardown (and test_mcp_integration does set_app_hooks(None,...)), leaking None into later files. coda_run in test_replay_only_flag then created no PTY (pty_id is None), but only in full-suite runs (in isolation, app's import re-wired the hooks). Added a conftest autouse fixture re-establishing app's real hooks after each test, making hook state independent of file order.

Also: TestNpmVersionLive now SKIPS when the npm registry is unreachable (the skipif probe was raising TimeoutExpired as a collection error, and the body asserted on a None result) instead of erroring/failing offline. Full unit suite green across 3 consecutive runs: 613 passed, 2 skipped.
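A minimal sketch of the claim-then-close idea, assuming a module-level `sessions` dict keyed by session id with a `master_fd` entry. The explicit lock is an addition for clarity (a single `dict.pop` is already atomic in CPython), and the child-process kill from the real code is omitted.

```python
import os
import threading
from typing import Dict

sessions: Dict[str, dict] = {}
_sessions_lock = threading.Lock()


def terminate_session(session_id: str) -> bool:
    """Close a session's master fd exactly once, however many callers race here."""
    with _sessions_lock:
        # Only the first caller gets the session back; later callers get None.
        session = sessions.pop(session_id, None)
    if session is None:
        return False  # Already terminated: never touch an fd number that may be reused.
    os.close(session["master_fd"])
    return True
```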

datasciencemonkey added 30 commits May 25, 2026 19:01

docs: design spec for CoDA MCP live session URL + replay

b2b06e3

docs(spec): incorporate architect review — lock transcript_fh, docume…

02431c8

…nt /socket.io middleware gap, split _doReplay from _doAttach

docs(plan): implementation plan for CoDA MCP live session URL

3a882eb

feat(coda-mcp): url_builder module for viewer_url resolution

ac3a8c7

feat(coda-mcp): find_task_dir_by_pty_session lookup with TTL cache

5becd59

feat: tee PTY output to transcript.log with lock-guarded writes

fc6a4a6

feat: open transcript handle in mcp_create_pty_session; close in term…

a3b5f9a

…inate_session

feat: harden transcript open against fd leak; add PTY skip guard on t…

c5b0c70

…ests

feat: exempt grace-period PTYs from MAX_CONCURRENT_SESSIONS

67a6e02

feat(coda-mcp): defer PTY close by GRACE_PERIOD_S via threading.Timer

85a6901

feat(coda-mcp): return viewer_url from coda_run/inbox/get_result + tr…

d800b3e

…anscript wiring

feat: attach_session replay fallback reads transcript.log when PTY is…

856c54c

… gone

fix(spa): deep-link panes own their own input wiring and join_session

5532719

fix(spa): use textContent in expired page (XSS); rename _close_pty_fo…

db4948f

…r_session for clarity

docs: spec for coda_run replay-only URL + scratchpad todos

572d64c

Splits coda_run (Mode 3: replay-only) from forthcoming coda_interactive (Mode 2: live attach). Rips out the 5-min grace machinery — never wired in prod and obsolete under the three-mode framework. Critique gate cleared.

docs: implementation plan for coda_run replay-only URL

738e8ef

11 TDD tasks + pre-flight + final verification. Plan critique gate cleared — three issues caught and fixed (locking deadlock in Task 8, TDD violation in Task 5, missing PTY-skip guard in Task 10).

feat: add replay_only param to mcp_create_pty_session

c75e9df

Backward-compatible default (False). Stored in session dict for later attach-time enforcement.

refactor: extract _serve_transcript_replay helper from attach_session

cd47dd6

Pure refactor — no behavior change. Helper is also used by the new replay_only short-circuit in the next commit.

feat: replay_only PTY sessions short-circuit to transcript in attach_…

cb97347

…session Replay-only sessions always serve the on-disk transcript regardless of whether the PTY is still alive. Used by coda_run (wired in the next commit).
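The resulting attach decision can be sketched as below. The signature and the injected callables are hypothetical; the behaviors come from these commits: `replay_only` defaults to False, replay-only sessions always get the on-disk transcript, and other sessions fall back to the transcript once the PTY is gone.

```python
from typing import Callable, Dict


def attach_session(session: Dict, pty_alive: Callable[[str], bool],
                   serve_replay: Callable[[str], str],
                   attach_live: Callable[[str], str]) -> str:
    """Decide how a viewer_url attach is served."""
    pty_id = session["pty_id"]
    # Mode 3 (coda_run): always the transcript, even while the PTY still runs.
    if session.get("replay_only", False):
        return serve_replay(pty_id)
    # Otherwise attach live when possible; fall back to the transcript once the PTY is gone.
    if pty_alive(pty_id):
        return attach_live(pty_id)
    return serve_replay(pty_id)
```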

feat: coda_run creates PTY sessions with replay_only=True

5cabbcf

Mode 3 in the three-mode framework. The viewer_url returned by coda_run now always resolves to a transcript-from-disk replay.

test: harden Task 4 test against PTY leak on _read_session failure

2eb1238

Bind pty_id from session.get() before entering try block so the finally cleanup runs even if the session row is malformed.

refactor: _watch_task uses _close_pty_immediately instead of deferred…

062122f

… close Pure call-site swap. Behavior change: PTY teardown is immediate rather than 5-minute-deferred. _schedule_deferred_close becomes dead code, ripped out in a follow-up commit.

datasciencemonkey added 30 commits May 28, 2026 16:14

test: address code-quality review — drop unused pytest import, exerci…

4cf06fe

…se enum branch of _is_directory

test: tighten instructions assertion — drop loose 'post' substring pe…

3b638cd

…r code review

docs: mark original coda_interactive spec as amended by broaden-sourc…

a555602

…e spec

docs: spec for coda_run workflow protocol + Databricks orientation pr…

6ff6a9b

…eamble

docs: address workflow-protocol spec critic — needs_approval disambig…

6b47abe

…, counts dict, MCP instructions, canonical skill test, token estimate

docs: plan for coda_run workflow protocol implementation

613604e

docs: address plan critic — fix _write_task_meta stub (use _write_jso…

6c348da

…n), add test_mcp_server + test_replay_only_flag regression coverage

test: drop dead section_noise allowlist per code review — regex alrea…

8eedc1c

…dy excludes uppercase

fix: clarify status placeholder is non-literal in INSTRUCTIONS JSON e…

4b92bbd

…xample per code review

Merge remote-tracking branch 'origin/main' into feat/coda-mcp-interac…

197cb21

…tive-handoff

docs: implementation plan for coda_interactive terminal-side pull

ab2181b

docs: address plan critic — update stale mcp.instructions wording + h…

9de7897

…arden prompt-seed test

fix: _safe_dirname rejects '.'/'..' basenames (final-critic path-trav…

0911d8b

…ersal hardening) A basename of '..' would pass the regex (dots allowed) and make ./<name> alias the project dir's parent. Reject '.'/'..'/'' -> 'workspace'.

datasciencemonkey wants to merge 93 commits into main from feat/coda-mcp-interactive-handoff
