feat: CoDA MCP live session URL — watch hermes execute live + replay #66

Open
datasciencemonkey wants to merge 22 commits into main from feat/coda-mcp-live-session-url


@datasciencemonkey
Collaborator

Summary

Adds a viewer_url field to every CoDA MCP tool response (coda_run, coda_inbox, coda_get_result) so the calling user can open a browser to watch hermes execute live in the existing terminal UI. A 5-minute grace period keeps the PTY alive after task completion for tail viewing; after that, a static replay reads bytes from an on-disk transcript.log (10 MB soft cap, mode 0600).
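
A minimal sketch of the transcript tee summarized above, assuming a plain append-mode file created with mode 0600 and a byte counter that stops writing once the cap is reached. `TranscriptTee`, `SOFT_CAP_BYTES`, and the truncate-at-cap behaviour are illustrative assumptions, not the code in app.py.

```python
import os
import threading

SOFT_CAP_BYTES = 10 * 1024 * 1024  # 10 MB cap from the summary


class TranscriptTee:
    """Hypothetical helper: appends PTY bytes to a 0600 transcript file."""

    def __init__(self, path, cap=SOFT_CAP_BYTES):
        # Create with mode 0o600 so the file is never world-readable;
        # fchmod covers a pre-existing file or an unusual umask.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        os.fchmod(fd, 0o600)
        self._written = os.fstat(fd).st_size
        self._fh = os.fdopen(fd, "ab")
        self._lock = threading.Lock()
        self._cap = cap

    def write(self, data: bytes) -> int:
        """Write as much of data as fits under the cap; return bytes written."""
        with self._lock:
            if self._fh is None:
                return 0  # already closed
            room = max(self._cap - self._written, 0)
            chunk = data[:room]
            if chunk:
                self._fh.write(chunk)
                self._fh.flush()
                self._written += len(chunk)
            return len(chunk)

    def close(self):
        # Swap-to-None: take the handle under the lock, close it outside.
        with self._lock:
            fh, self._fh = self._fh, None
        if fh is not None:
            fh.close()
```

In this sketch the PTY reader would call `write()` for every chunk it forwards to the browser, and session teardown would call `close()`; late writes after close are dropped rather than raising.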

Relationship to #64

This PR subsumes #64 if accepted. It contains everything in "feat: CoDA MCP server for Genie Code integration" (the 3 commits from #64) plus 19 additional commits implementing the live-trace feature on top. Reviewers can choose:

Architecture highlights

  • Tee PTY bytes to disk in app.py::read_pty_output under session["lock"]. The file handle is owned by the PTY reader and closed with a swap-to-None pattern in terminate_session.
  • threading.Timer(GRACE_PERIOD_S=300, _app_close_session) replaces immediate PTY close in _watch_task on both completion and timeout paths. Daemon thread so uvicorn shutdown isn't blocked.
  • Grace-period PTYs are exempt from MAX_CONCURRENT_SESSIONS via a new sess["grace"] flag and an active = sum(...) check that does not count grace sessions.
  • Base URL detection via CODA_APP_URL env override → AppUrlCaptureMiddleware capturing X-Forwarded-Host (officially provided by Databricks Apps).
  • /api/session/attach replay fallback: when PTY is gone, look up transcript.log via new task_manager.find_task_dir_by_pty_session (60s TTL cache) and serve bytes with replay: true.
  • SPA deep-link at /?session=<pty_id> — boot-time URL parse routes to existing _doAttach (live) or new _doReplay (chunked rAF-yielding term.write for replay). Replay-mode panes are genuinely read-only (no term.onData wiring, no join_session emit).
  • MCP instructions updated to nudge the calling model to share the URL with the user.
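
The grace-period timer and the session-cap exemption from the list above can be sketched as follows. This is an illustration under assumptions: the `sessions` dict shape, the `MAX_CONCURRENT_SESSIONS` value, and the helper names `close_session`, `schedule_deferred_close`, and `can_start_session` are hypothetical stand-ins, not the functions in app.py or mcp_server.

```python
import threading

GRACE_PERIOD_S = 300           # 5-minute grace period from the summary
MAX_CONCURRENT_SESSIONS = 5    # assumed value, for illustration only

sessions = {}                  # pty_id -> session dict (hypothetical shape)
sessions_lock = threading.Lock()


def close_session(pty_id):
    """Stand-in for _app_close_session: drop the session if it still exists."""
    with sessions_lock:
        sessions.pop(pty_id, None)


def schedule_deferred_close(pty_id, delay=GRACE_PERIOD_S):
    """Mark the session as in grace and close it after `delay` seconds."""
    with sessions_lock:
        sess = sessions.get(pty_id)
        if sess is None:
            return None
        sess["grace"] = True
    timer = threading.Timer(delay, close_session, args=(pty_id,))
    timer.daemon = True  # a pending timer must not block server shutdown
    timer.start()
    return timer


def can_start_session():
    """Count only sessions that are not in their grace period."""
    with sessions_lock:
        active = sum(1 for s in sessions.values() if not s.get("grace"))
    return active < MAX_CONCURRENT_SESSIONS
```

Excluding grace sessions from the count means a finished task that is only being kept alive for tail viewing does not prevent a new task from starting.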

Design + plan docs

  • Spec: docs/superpowers/specs/2026-05-27-coda-mcp-live-session-url-design.md
  • Plan: docs/superpowers/plans/2026-05-27-coda-mcp-live-session-url.md

Both architect-reviewed and approved. Three review iterations folded in (lock discipline, replay rendering separation from _doAttach, XSS hardening in expired page).

Test plan

  • 527 passed, 11 skipped across full non-e2e suite (PTY-gated tests skip without TTY allocator).
  • E2E test_end_to_end_grace_and_replay exercises real PTY + real file I/O + grace timer + transcript replay.
  • Final whole-branch code review passed YELLOW → fixes folded → GREEN.
  • Manual smoke in Databricks Genie Code: deploy → submit coda_run → click viewer_url → confirm live stream + grace + replay → confirm chmod 0600 on ~/.coda/sessions/*/tasks/*/transcript.log.

Out of scope (deferred)

  • Configurable agent selection (hermes vs claude-code vs codex) — separate spec.
  • Asciinema-style timed replay — bytes-only is sufficient for this round.

Mounts an MCP server at `/mcp` so Databricks Genie Code (and other MCP
clients like Claude Desktop, Cursor) can delegate coding tasks to the
existing Hermes Agent infrastructure. Exposes three high-level tools
following the v2 background-execution pattern:

  - coda_run        — submit a coding task, returns task_id immediately
  - coda_inbox      — poll all task statuses (24h window)
  - coda_get_result — fetch structured output of a completed task

Plus internal helpers (`coda_create_session`, `coda_get_status`,
`coda_close_session`). Sessions and task state are persisted to disk
under `~/.coda/sessions/` so tasks survive worker restarts.
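
A rough sketch of the submit / poll / fetch pattern with disk-backed state, assuming one directory per task that holds a `task.json` record and, once finished, a `result.json`. The `task.json` name, the record fields, and the function names are assumptions for illustration; the actual layout lives in `task_manager`.

```python
import json
import time
import uuid
from pathlib import Path


def submit_task(root: Path, prompt: str) -> str:
    """Persist a new task record and return its id immediately."""
    task_id = uuid.uuid4().hex
    task_dir = root / "tasks" / task_id
    task_dir.mkdir(parents=True)
    record = {"task_id": task_id, "prompt": prompt,
              "status": "running", "created_at": time.time()}
    (task_dir / "task.json").write_text(json.dumps(record))
    return task_id


def inbox(root: Path, window_s: float = 24 * 3600) -> list:
    """List tasks created within the window, reading state from disk only."""
    cutoff = time.time() - window_s
    out = []
    for f in sorted((root / "tasks").glob("*/task.json")):
        rec = json.loads(f.read_text())
        if rec["created_at"] >= cutoff:
            # A result.json next to the record marks completion.
            if (f.parent / "result.json").exists():
                rec["status"] = "completed"
            out.append(rec)
    return out


def get_result(root: Path, task_id: str):
    """Return the parsed result, or None while the task is still running."""
    path = root / "tasks" / task_id / "result.json"
    return json.loads(path.read_text()) if path.exists() else None
```

Because every call re-reads the directory, a restarted worker sees the same tasks as the one that accepted them.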

Architecture
------------
- Native MCP SDK transport (`FastMCP.streamable_http_app()`) — required
  by Genie Code's Custom MCP server picker (custom JSON-RPC handlers
  don't work).
- `stateless_http=True`, `json_response=True`. DNS-rebinding protection
  disabled (proxy handles auth, workspace origin allowed via CORS
  middleware).
- Switches the production entrypoint from gunicorn → uvicorn so we can
  serve both the MCP ASGI app and the existing Flask UI side-by-side
  (Flask mounted via WSGIMiddleware). WebSocket falls back to HTTP
  polling under uvicorn — acceptable per the design doc; the Web Worker
  poller is already in place.
- Skips CSP/security headers on the `/mcp` path (CSP interfered with
  Genie Code's transport).
- Hermes is always the agent invoked; it routes to sub-agents
  internally.
- Adds a stdio MCP bridge (`tools/coda-bridge.py`) for Claude Code's
  OAuth-based auth flow.

Repository reshuffles
---------------------
- New `coda_mcp/` package: `mcp_server`, `mcp_endpoint`, `mcp_asgi`,
  `task_manager`.
- `setup_*.py` moved from repo root to `setup/`.
- `install_*.sh` moved from repo root to `scripts/`.
- Tests: new coverage for the MCP server, integration flow, task
  manager, content filter proxy, sync_to_workspace, _run_step.
- Docs: `docs/mcp-client-setup.md`, `docs/mcp-v2-background-execution.md`,
  and the full implementation plan at
  `docs/plans/2026-05-01-coda-mcp-server.md`.

Safety guardrails
-----------------
The CODA-TASK prompt envelope explicitly forbids destructive operations
(DROP/DELETE/TRUNCATE, CLI deletes, permission changes) at the prompt
level, in line with the CoDA Constitution.

Tested as `mcp-test-coda` on workspace `fevm-serverless-9cefok`
(profile `9cefok`). App name must start with `mcp-` to appear in the
Genie Code Custom MCP server picker.

Provenance
----------
Squashed from 40 commits originally on
`datasciencemonkey/coding-agents-databricks-apps#156`, last working
tip `1ce86bf`. Full commit-by-commit history preserved locally on the
tag `coda-mcp-backup-2026-05-25`.

Conflict resolutions during the squash:
  - README.md MLflow section: kept main's Claude+Codex unified switch
    (newer than coda-mcp's Claude-only state).
  - setup/setup_claude.py: combined main's enterprise installer URL
    handling with coda-mcp's `SKIP_CLAUDE_INSTALL` test escape hatch.

Surfaces a doc audit pass against the squash:

- README: replace the gunicorn+Flask architecture diagram with the
  actual uvicorn ASGI stack (socketio.ASGIApp → /mcp + WSGI(Flask)).
  Update the startup-flow narrative, the "Server" config section
  (was "Gunicorn"), the project-structure annotations for app.yaml
  and gunicorn.conf.py (legacy, retained for WSGI-only dev), and the
  Technologies list.
- app.yaml: prepend a comment block explaining why the entrypoint is
  uvicorn (FastMCP.streamable_http_app is native ASGI; gunicorn WSGI
  cannot serve it). Notes the polling-fallback behaviour and the
  retained-but-unused gunicorn.conf.py.
- docs/plans/2026-05-01-coda-mcp-server.md: prepend a SUPERSEDED
  banner. The shipped implementation is the v2 design in
  docs/mcp-v2-background-execution.md (3 tools on uvicorn+ASGI), not
  the 5-tool gunicorn+WSGI plan in this file. Kept for design-
  evolution archaeology.
- coda_mcp/mcp_endpoint.py: docstring now clearly states this module
  is a Flask Blueprint fallback for WSGI runtimes (gunicorn local dev,
  Flask test client). Production routes through coda_mcp.mcp_asgi.

Closes two coverage gaps surfaced by a pre-merge test audit. Both
files exercise surfaces that production traffic actually hits, and
neither had a dedicated test file before this commit.

tests/test_mcp_endpoint.py (9 tests, all pass)
- Pin the Flask Blueprint's JSON-RPC contract: initialize, tools/list,
  ping, tools/call (unknown), unknown method, CORS preflight,
  jsonrpc id echo, non-JSON body resilience, tool schema presence.
- Asserts the tool surface is exactly {coda_run, coda_inbox,
  coda_get_result}. Drift from the v2 contract fails loudly.
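
The contract these tests pin can be illustrated with a small dispatcher. This is a sketch, not the Blueprint's code: `handle`, `_ok`, and `_err` are hypothetical names, the error codes are the standard JSON-RPC 2.0 ones, and using -32602 for an unknown tool is an assumption.

```python
import json

TOOLS = ("coda_run", "coda_inbox", "coda_get_result")


def _ok(rid, result):
    return {"jsonrpc": "2.0", "id": rid, "result": result}


def _err(rid, code, message):
    return {"jsonrpc": "2.0", "id": rid,
            "error": {"code": code, "message": message}}


def handle(raw: bytes) -> dict:
    """Dispatch one JSON-RPC request body and always return a response dict."""
    try:
        req = json.loads(raw)
    except ValueError:
        # Non-JSON body: answer with a parse error instead of raising.
        return _err(None, -32700, "Parse error")
    if not isinstance(req, dict):
        return _err(None, -32600, "Invalid Request")
    rid = req.get("id")  # echoed back unchanged in every response
    method = req.get("method")
    if method == "ping":
        return _ok(rid, {})
    if method == "tools/list":
        return _ok(rid, {"tools": [{"name": n} for n in TOOLS]})
    if method == "tools/call":
        params = req.get("params")
        name = params.get("name") if isinstance(params, dict) else None
        if name not in TOOLS:
            return _err(rid, -32602, f"Unknown tool: {name}")
        return _ok(rid, {"content": []})  # real tool dispatch omitted
    return _err(rid, -32601, "Method not found")
```

Comparing the listed tool names against a fixed set, as the test file does, turns any accidental addition or removal of a tool into a test failure.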

tests/test_coda_bridge.py (3 pass + 1 documented skip)
- Verify the bridge injects the Databricks Bearer token mounted via
  `databricks auth token` into Authorization on every forwarded
  request (regression guard — a silent drop would 401 every Genie
  Code call against a deployed app).
- Verify it surfaces server response bodies and refuses to run
  without CODA_MCP_URL configured.
- Skip and document the stdout-capture variant for a follow-up.
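
The header-injection behaviour under test can be sketched as below. `build_forward_request` is a hypothetical helper, and the token is passed in as an argument here so the sketch does not assume how the bridge invokes the Databricks CLI.

```python
import json
import urllib.request


def build_forward_request(body: dict, token: str, env: dict) -> urllib.request.Request:
    """Build the upstream POST with the bearer token attached."""
    url = env.get("CODA_MCP_URL")
    if not url:
        # Refuse to run rather than forward to an unknown endpoint.
        raise RuntimeError("CODA_MCP_URL is not configured")
    if not token:
        raise RuntimeError("no bearer token available")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Building the request in one place gives a regression test a single object to inspect, so a dropped Authorization header is caught before it shows up as a 401 on a deployed app.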

Full suite: 490 passed, 2 skipped (was 478/1 before this PR). No
regressions.
…nt /socket.io middleware gap, split _doReplay from _doAttach
…arded-Host

Adds AppUrlCaptureMiddleware to mcp_asgi.py that captures X-Forwarded-Host
(falling back to Host) from every inbound HTTP request and populates
url_builder._app_url_cache. Also hardens capture_from_headers to strip
accidental https:// / http:// scheme prefixes before caching, preventing
double-scheme URLs in build_viewer_url output.
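
A sketch of the capture-and-build flow described here, assuming a module-level cache, an https scheme for captured hosts, and the `/?session=<pty_id>` deep link from the PR description. The function bodies are illustrative and may differ from url_builder.

```python
import os
from urllib.parse import quote

_app_url_cache = None  # mirrors the url_builder._app_url_cache name


def capture_from_headers(headers: dict):
    """Cache the public host from X-Forwarded-Host, falling back to Host."""
    global _app_url_cache
    lowered = {k.lower(): v for k, v in headers.items()}
    host = (lowered.get("x-forwarded-host") or lowered.get("host") or "").strip()
    # Strip an accidental scheme so the builder never emits https://https://...
    for prefix in ("https://", "http://"):
        if host.lower().startswith(prefix):
            host = host[len(prefix):]
    host = host.rstrip("/")
    if host:
        _app_url_cache = host
    return _app_url_cache


def build_viewer_url(pty_id: str, env=None):
    """Prefer the CODA_APP_URL override, then the captured host, else None."""
    env = os.environ if env is None else env
    override = env.get("CODA_APP_URL", "").rstrip("/")
    if override:
        base = override if "://" in override else f"https://{override}"
    elif _app_url_cache:
        base = f"https://{_app_url_cache}"
    else:
        return None
    return f"{base}/?session={quote(pty_id, safe='')}"
```

Returning None when no base is known lets the caller omit viewer_url instead of handing the user a broken link.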
…ering

Adds _initFromQueryString() boot-time URL parse, _doReplay() for static
transcript rendering in 64KB RAF-yielded chunks, _renderExpiredPage() for
404 fallback, and history.replaceState hygiene on pane/tab close.

Appends test_end_to_end_grace_and_replay to test_mcp_integration.py.
Exercises the full coda_run flow with real Flask PTY hooks: create PTY,
send input, write result.json, trigger _schedule_deferred_close, verify
grace state and deferred PTY teardown, confirm transcript persists, and
validate find_task_dir_by_pty_session resolves correctly. Guarded by
_pty_skip so headless CI without PTY allocators skips cleanly.
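
The PR description says find_task_dir_by_pty_session sits behind a 60s TTL cache. A minimal sketch of such a cache, assuming a monotonic clock and that misses are not cached; `TTLCache` and `get_or_load` are hypothetical names, not the task_manager API.

```python
import time


class TTLCache:
    """Tiny TTL cache: entries expire `ttl` seconds after they are stored."""

    def __init__(self, ttl=60.0, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._data = {}  # key -> (stored_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        hit = self._data.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = loader(key)
        # Misses are not cached, so a transcript that appears later is found
        # on the next lookup instead of after the TTL expires.
        if value is not None:
            self._data[key] = (now, value)
        return value
```

The injectable `clock` is there so a test can advance time without sleeping.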
Initialize pty_id, sess_id, and task_id to None before the try/finally in
test_end_to_end_grace_and_replay so that an early exception (e.g.,
coda_run or _read_session raising) doesn't trigger UnboundLocalError on
"if pty_id in sessions", which would mask the original exception. The
finally now guards with "if pty_id and pty_id in sessions".
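
The fix pattern in isolation, as a self-contained sketch; `run_flow` and the `sessions` dict here are stand-ins for the test body and the app's session registry.

```python
sessions = {}


def run_flow(fail_early: bool):
    # Bind the name before try so the finally block can always read it.
    pty_id = None
    try:
        if fail_early:
            raise ValueError("coda_run failed")  # stands in for the real call
        pty_id = "pty-1"
        sessions[pty_id] = {"alive": True}
    finally:
        # Check truthiness first: nothing to clean up if setup never ran.
        if pty_id and pty_id in sessions:
            del sessions[pty_id]
```

Without the `pty_id = None` line, the early-failure path would raise UnboundLocalError inside `finally`, and that error would be the one reported instead of the original ValueError.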
The watcher thread spawned by coda_run polls for result.json every 5s
and, when it finds one, calls complete_task + _schedule_deferred_close
itself. The E2E test does that orchestration manually so it can assert
on intermediate state. With both drivers active, the watcher races
the test body and produces SessionNotFoundError plus flaky assertion
failures.

Monkeypatch coda_mcp.mcp_server._watch_task to a no-op for this
specific test so the manual orchestration is the sole driver.
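
A sketch of the no-op patch, using `unittest.mock.patch.object` on a stand-in namespace so it runs without the real package; in the actual test the same effect comes from pytest's `monkeypatch.setattr` on `coda_mcp.mcp_server`. The `server` object, `coda_run`, and `run_without_watcher` below are hypothetical.

```python
import threading
import types
from unittest import mock

calls = []


def _watch_task(task_id):
    """Stand-in for the real watcher, which would poll for result.json."""
    calls.append(task_id)


# Hypothetical stand-in for the coda_mcp.mcp_server module.
server = types.SimpleNamespace(_watch_task=_watch_task)


def coda_run(task_id):
    # Look the watcher up on the module at call time so a patch takes effect.
    t = threading.Thread(target=server._watch_task, args=(task_id,), daemon=True)
    t.start()
    t.join()
    return task_id


def run_without_watcher(task_id):
    # Replace the watcher with a no-op for the duration of the call only.
    with mock.patch.object(server, "_watch_task", lambda *a, **k: None):
        return coda_run(task_id)
```

The patch only works because the watcher is resolved through the module attribute when the thread is created; a reference captured at import time would bypass it.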