feat: CoDA MCP live session URL — watch hermes execute live + replay#66
Open
datasciencemonkey wants to merge 22 commits into
Mounts an MCP server at `/mcp` so Databricks Genie Code (and other MCP clients like Claude Desktop, Cursor) can delegate coding tasks to the existing Hermes Agent infrastructure.

Exposes three high-level tools following the v2 background-execution pattern:
- coda_run: submit a coding task, returns task_id immediately
- coda_inbox: poll all task statuses (24h window)
- coda_get_result: fetch structured output of a completed task

Plus internal helpers (`coda_create_session`, `coda_get_status`, `coda_close_session`). Sessions and task state are persisted to disk under `~/.coda/sessions/` so tasks survive worker restarts.

Architecture
------------
- Native MCP SDK transport (`FastMCP.streamable_http_app()`), required by Genie Code's Custom MCP server picker (custom JSON-RPC handlers don't work).
- `stateless_http=True`, `json_response=True`. DNS-rebinding protection disabled (proxy handles auth, workspace origin allowed via CORS middleware).
- Switches the production entrypoint from gunicorn → uvicorn so we can serve both the MCP ASGI app and the existing Flask UI side-by-side (Flask mounted via WSGIMiddleware). WebSocket falls back to HTTP polling under uvicorn, which is acceptable per the design doc; the Web Worker poller is already in place.
- Skips CSP/security headers on the `/mcp` path (CSP interfered with Genie Code's transport).
- Hermes is always the agent invoked; it routes to sub-agents internally.
- Adds a stdio MCP bridge (`tools/coda-bridge.py`) for Claude Code's OAuth-based auth flow.

Repository reshuffles
---------------------
- New `coda_mcp/` package: `mcp_server`, `mcp_endpoint`, `mcp_asgi`, `task_manager`.
- `setup_*.py` moved from repo root to `setup/`.
- `install_*.sh` moved from repo root to `scripts/`.
- Tests: new coverage for the MCP server, integration flow, task manager, content filter proxy, sync_to_workspace, _run_step.
- Docs: `docs/mcp-client-setup.md`, `docs/mcp-v2-background-execution.md`, and the full implementation plan at `docs/plans/2026-05-01-coda-mcp-server.md`.

Safety guardrails
-----------------
The CODA-TASK prompt envelope explicitly forbids destructive operations (DROP/DELETE/TRUNCATE, CLI deletes, permission changes) at the prompt level, in line with the CoDA Constitution.

Tested as `mcp-test-coda` on workspace `fevm-serverless-9cefok` (profile `9cefok`). App name must start with `mcp-` to appear in the Genie Code Custom MCP server picker.

Provenance
----------
Squashed from 40 commits originally on `datasciencemonkey/coding-agents-databricks-apps#156`, last working tip `1ce86bf`. Full commit-by-commit history preserved locally on the tag `coda-mcp-backup-2026-05-25`.

Conflict resolutions during the squash:
- README.md MLflow section: kept main's Claude+Codex unified switch (newer than coda-mcp's Claude-only state).
- setup/setup_claude.py: combined main's enterprise installer URL handling with coda-mcp's `SKIP_CLAUDE_INSTALL` test escape hatch.
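The background-execution pattern described above (submit, poll, fetch, with state kept on disk) can be pictured with a small file-backed store. This is only an illustrative sketch: the `TaskStore` class, its method names, the record fields, and the one-JSON-file-per-task layout are assumptions made for the example, not the layout used by `task_manager`.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path
from typing import List, Optional

WINDOW_S = 24 * 60 * 60  # the inbox looks back 24 hours


class TaskStore:
    """Toy file-backed task store (hypothetical): one JSON file per task."""

    def __init__(self, root: Path):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, task_id: str) -> Path:
        return self.root / f"{task_id}.json"

    def _write(self, record: dict) -> None:
        self._path(record["task_id"]).write_text(json.dumps(record))

    def run(self, prompt: str) -> str:
        """Submit a task and return its id immediately, without waiting."""
        record = {
            "task_id": uuid.uuid4().hex,
            "prompt": prompt,
            "status": "running",
            "created_at": time.time(),
            "result": None,
        }
        self._write(record)
        return record["task_id"]

    def complete(self, task_id: str, result: dict) -> None:
        """Called by whatever executes the task once its output is ready."""
        record = json.loads(self._path(task_id).read_text())
        record.update(status="completed", result=result)
        self._write(record)

    def inbox(self, now: Optional[float] = None) -> List[dict]:
        """Statuses of all tasks created within the last 24 hours."""
        now = time.time() if now is None else now
        records = [json.loads(p.read_text()) for p in self.root.glob("*.json")]
        return [
            {"task_id": r["task_id"], "status": r["status"]}
            for r in records
            if now - r["created_at"] <= WINDOW_S
        ]

    def get_result(self, task_id: str) -> Optional[dict]:
        """Structured output of a completed task, or None while it runs."""
        record = json.loads(self._path(task_id).read_text())
        return record["result"] if record["status"] == "completed" else None
```

Because every state change is written straight to disk, a second `TaskStore` opened on the same directory (standing in for a restarted worker) sees the same tasks.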
Surfaces a doc audit pass against the squash:
- README: replace the gunicorn+Flask architecture diagram with the actual uvicorn ASGI stack (socketio.ASGIApp → /mcp + WSGI(Flask)). Update the startup-flow narrative, the "Server" config section (was "Gunicorn"), the project-structure annotations for app.yaml and gunicorn.conf.py (legacy, retained for WSGI-only dev), and the Technologies list.
- app.yaml: prepend a comment block explaining why the entrypoint is uvicorn (FastMCP.streamable_http_app is native ASGI; gunicorn WSGI cannot serve it). Notes the polling-fallback behaviour and the retained-but-unused gunicorn.conf.py.
- docs/plans/2026-05-01-coda-mcp-server.md: prepend a SUPERSEDED banner. The shipped implementation is the v2 design in docs/mcp-v2-background-execution.md (3 tools on uvicorn+ASGI), not the 5-tool gunicorn+WSGI plan in this file. Kept for design-evolution archaeology.
- coda_mcp/mcp_endpoint.py: docstring now clearly states this module is a Flask Blueprint fallback for WSGI runtimes (gunicorn local dev, Flask test client). Production routes through coda_mcp.mcp_asgi.
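The stack described in the README update routes `/mcp` to one ASGI app and everything else to the Flask UI. The sketch below shows that kind of path dispatch using only the standard library. It is not the code in `coda_mcp.mcp_asgi`: `make_text_app`, `dispatch_by_path`, and `call` are hypothetical helpers, and the two toy apps merely stand in for the MCP app and the WSGI-wrapped Flask app.

```python
import asyncio


def make_text_app(body: bytes):
    """Build a trivial ASGI app that answers every HTTP request with `body`."""
    async def app(scope, receive, send):
        await send({
            "type": "http.response.start",
            "status": 200,
            "headers": [(b"content-type", b"text/plain")],
        })
        await send({"type": "http.response.body", "body": body})
    return app


def dispatch_by_path(mcp_app, ui_app, prefix="/mcp"):
    """Send `prefix` and anything below it to mcp_app, the rest to ui_app."""
    async def app(scope, receive, send):
        path = scope.get("path", "")
        is_mcp = path == prefix or path.startswith(prefix + "/")
        target = mcp_app if is_mcp else ui_app
        await target(scope, receive, send)
    return app


async def call(app, path):
    """Drive an ASGI app with one GET request and return the response body."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    await app(scope, receive, send)
    return b"".join(m.get("body", b"") for m in sent)


# Stand-ins for the MCP ASGI app and the WSGI-wrapped Flask UI.
stack = dispatch_by_path(make_text_app(b"mcp"), make_text_app(b"flask-ui"))
```

Matching on `prefix` or `prefix + "/"` rather than a bare `startswith(prefix)` keeps an unrelated path such as `/mcpx` on the UI side.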
Closes two coverage gaps surfaced by a pre-merge test audit. Both
files exercise surfaces that production traffic actually hits, and
neither had a dedicated test file before this commit.
tests/test_mcp_endpoint.py (9 tests, all pass)
- Pin the Flask Blueprint's JSON-RPC contract: initialize, tools/list,
ping, tools/call (unknown), unknown method, CORS preflight,
jsonrpc id echo, non-JSON body resilience, tool schema presence.
- Asserts the tool surface is exactly {coda_run, coda_inbox,
coda_get_result}. Drift from the v2 contract fails loudly.
tests/test_coda_bridge.py (3 pass + 1 documented skip)
- Verify the bridge injects the Databricks Bearer token mounted via
`databricks auth token` into Authorization on every forwarded
request (regression guard — a silent drop would 401 every Genie
Code call against a deployed app).
- Verify it surfaces server response bodies and refuses to run
without CODA_MCP_URL configured.
- Skip and document the stdout-capture variant for a follow-up.
Full suite: 490 passed, 2 skipped (was 478/1 before this PR). No
regressions.
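The two bridge behaviours these tests guard (always attaching the Bearer token, and refusing to start without `CODA_MCP_URL`) could look roughly like the following. This is a sketch under assumptions: `resolve_mcp_url` and `forward_headers` are hypothetical names, not functions taken from `tools/coda-bridge.py`, and the token is passed in as a plain string rather than fetched from the CLI.

```python
from typing import Mapping, Optional


def resolve_mcp_url(env: Mapping[str, str]) -> str:
    """Return the configured MCP URL, or fail fast when it is missing."""
    url = env.get("CODA_MCP_URL", "").strip()
    if not url:
        raise RuntimeError("CODA_MCP_URL is not configured")
    return url


def forward_headers(token: str,
                    extra: Optional[Mapping[str, str]] = None) -> dict:
    """Build headers for a forwarded request, always setting Authorization."""
    if not token:
        raise ValueError("refusing to forward a request without a token")
    # Drop any caller-supplied Authorization header, whatever its casing,
    # so the bridge's token is the only credential that is sent.
    headers = {
        k: v for k, v in (extra or {}).items() if k.lower() != "authorization"
    }
    headers["Authorization"] = f"Bearer {token}"
    headers.setdefault("Content-Type", "application/json")
    return headers
```

Raising on an empty token, rather than sending the request anyway, turns the silent-drop case into an immediate local error instead of a 401 from the deployed app.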
…nt /socket.io middleware gap, split _doReplay from _doAttach
…arded-Host

Adds AppUrlCaptureMiddleware to mcp_asgi.py that captures X-Forwarded-Host (falling back to Host) from every inbound HTTP request and populates url_builder._app_url_cache. Also hardens capture_from_headers to strip accidental https:// / http:// scheme prefixes before caching, preventing double-scheme URLs in build_viewer_url output.
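A minimal sketch of the capture-and-strip behaviour described in this commit, assuming a module-level cache and a fixed `https://` scheme. The function names mirror those mentioned above, but the bodies, the `_strip_scheme` helper, and the exact viewer URL shape are guesses for illustration rather than the code in `url_builder`.

```python
from typing import Mapping, Optional
from urllib.parse import quote

# Hypothetical stand-in for url_builder._app_url_cache.
_app_url_cache: Optional[str] = None


def _strip_scheme(host: str) -> str:
    """Remove an accidental http:// or https:// prefix from a host value."""
    for prefix in ("https://", "http://"):
        if host.lower().startswith(prefix):
            return host[len(prefix):]
    return host


def capture_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Cache the app host, preferring X-Forwarded-Host over Host."""
    global _app_url_cache
    lowered = {k.lower(): v for k, v in headers.items()}
    raw = lowered.get("x-forwarded-host") or lowered.get("host") or ""
    host = _strip_scheme(raw.strip()).rstrip("/")
    if host:
        _app_url_cache = host
    return _app_url_cache


def build_viewer_url(pty_id: str) -> Optional[str]:
    """Build a viewer URL from the cached host, or None if none was seen."""
    if not _app_url_cache:
        return None
    # The https scheme is an assumption of this sketch.
    return f"https://{_app_url_cache}/?session={quote(pty_id)}"
```

Stripping the scheme at capture time means the builder can always prepend exactly one scheme, which is what prevents `https://https://...` output.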
…ering

Adds _initFromQueryString() boot-time URL parse, _doReplay() for static transcript rendering in 64KB RAF-yielded chunks, _renderExpiredPage() for 404 fallback, and history.replaceState hygiene on pane/tab close.
Appends test_end_to_end_grace_and_replay to test_mcp_integration.py. Exercises the full coda_run flow with real Flask PTY hooks: create PTY, send input, write result.json, trigger _schedule_deferred_close, verify grace state and deferred PTY teardown, confirm transcript persists, and validate find_task_dir_by_pty_session resolves correctly. Guarded by _pty_skip so headless CI without PTY allocators skips cleanly.
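The grace state and deferred teardown this test checks can be sketched with `threading.Timer`. The 300-second period and the `grace` flag come from the PR description; the function names, the module-level `sessions` dict, and the `MAX_CONCURRENT_SESSIONS` value of 2 are assumptions for the example, not the real implementation.

```python
import threading

GRACE_PERIOD_S = 300          # 5-minute grace period from the PR description
MAX_CONCURRENT_SESSIONS = 2   # hypothetical value for this sketch

sessions = {}                 # pty_id -> session dict (toy registry)
sessions_lock = threading.Lock()


def close_session(pty_id):
    """Tear the session down; a real version would also close the PTY."""
    with sessions_lock:
        sessions.pop(pty_id, None)


def schedule_deferred_close(pty_id, delay_s=GRACE_PERIOD_S):
    """Mark the session as in grace and close it after `delay_s` seconds."""
    with sessions_lock:
        if pty_id not in sessions:
            return None
        sessions[pty_id]["grace"] = True
    timer = threading.Timer(delay_s, close_session, args=(pty_id,))
    timer.daemon = True  # a pending timer must not block process shutdown
    timer.start()
    return timer


def can_start_session():
    """Sessions in their grace period do not count against the limit."""
    with sessions_lock:
        active = sum(1 for s in sessions.values() if not s.get("grace"))
    return active < MAX_CONCURRENT_SESSIONS
```

Passing a short `delay_s` in tests keeps the timer path exercised without waiting the full five minutes.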
Initialize pty_id, sess_id, and task_id to None before the try/finally in test_end_to_end_grace_and_replay so that an early exception (e.g., coda_run or _read_session raising) doesn't trigger UnboundLocalError on "if pty_id in sessions", which would mask the original exception. The finally now guards with "if pty_id and pty_id in sessions".
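The failure mode fixed here is easy to reproduce in isolation. In the sketch below, `create_pty` and the `sessions` dict are hypothetical stand-ins; only the shape of the try/finally matters.

```python
sessions = {}


def create_pty():
    """Stand-in for a setup step that fails before returning an id."""
    raise RuntimeError("PTY allocation failed")


def run_flow_unsafe():
    try:
        pty_id = create_pty()      # raises, so pty_id is never bound
        sessions[pty_id] = {}
    finally:
        # Reading the unbound local raises UnboundLocalError, which
        # replaces the RuntimeError as the propagating exception.
        if pty_id in sessions:
            del sessions[pty_id]


def run_flow_safe():
    pty_id = None                  # bound before the try block
    try:
        pty_id = create_pty()
        sessions[pty_id] = {}
    finally:
        # The guard short-circuits on None, so cleanup is skipped and
        # the original RuntimeError propagates unchanged.
        if pty_id and pty_id in sessions:
            del sessions[pty_id]
```

With the unsafe version a test report shows `UnboundLocalError` from the cleanup code; with the safe version it shows the `RuntimeError` that actually caused the failure.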
…r_session for clarity
The watcher thread spawned by coda_run polls for result.json every 5s and, when it finds one, calls complete_task + _schedule_deferred_close itself. The E2E test does that orchestration manually so it can assert on intermediate state. With both drivers active, the watcher races the test body and produces SessionNotFoundError plus flaky assertion failures. Monkeypatch coda_mcp.mcp_server._watch_task to a no-op for this specific test so the manual orchestration is the sole driver.
Summary
Adds a `viewer_url` field to every CoDA MCP tool response (`coda_run`, `coda_inbox`, `coda_get_result`) so the calling user can open a browser to watch hermes execute live in the existing terminal UI. A 5-minute grace period keeps the PTY alive after task completion for tail viewing; after that, a static replay reads bytes from an on-disk `transcript.log` (10 MB soft cap, mode 0600).

Relationship to #64
This PR subsumes #64 if accepted. It contains everything in "feat: CoDA MCP server for Genie Code integration" (the 3 commits from #64) plus 19 additional commits implementing the live-trace feature on top. Reviewers can choose:

Architecture highlights
- `app.py::read_pty_output` under `session["lock"]`. File handle owned by the PTY reader, closed under swap-to-None pattern in `terminate_session`.
- `threading.Timer(GRACE_PERIOD_S=300, _app_close_session)` replaces immediate PTY close in `_watch_task` on both completion and timeout paths. Daemon thread so uvicorn shutdown isn't blocked.
- `MAX_CONCURRENT_SESSIONS` via a new `sess["grace"]` flag and a non-grace-counting `active = sum(...)` check.
- `CODA_APP_URL` env override → `AppUrlCaptureMiddleware` capturing `X-Forwarded-Host` (officially provided by Databricks Apps).
- `/api/session/attach` replay fallback: when PTY is gone, look up `transcript.log` via new `task_manager.find_task_dir_by_pty_session` (60s TTL cache) and serve bytes with `replay: true`.
- `/?session=<pty_id>`: boot-time URL parse routes to existing `_doAttach` (live) or new `_doReplay` (chunked rAF-yielding `term.write` for replay). Replay-mode panes are genuinely read-only (no `term.onData` wiring, no `join_session` emit).
- `instructions` updated to nudge the calling model to share the URL with the user.

Design + plan docs
- `docs/superpowers/specs/2026-05-27-coda-mcp-live-session-url-design.md`
- `docs/superpowers/plans/2026-05-27-coda-mcp-live-session-url.md`

Both architect-reviewed and approved. Three review iterations folded in (lock discipline, replay rendering separation from `_doAttach`, XSS hardening in expired page).

Test plan
- `527 passed, 11 skipped` across full non-e2e suite (PTY-gated tests skip without TTY allocator).
- `test_end_to_end_grace_and_replay` exercises real PTY + real file I/O + grace timer + transcript replay.
- `coda_run` → click `viewer_url` → confirm live stream + grace + replay → confirm `chmod 0600` on `~/.coda/sessions/*/tasks/*/transcript.log`.

Out of scope (deferred)