Add model backend selector and Claude Agent SDK runner#51
Conversation
Backends: uk_compiled (Rust) and uk_python (Python), selected per request or via POLICYENGINE_CHAT_BACKEND. Each controls its own prompt context, tool description, and execution globals. ChatPage exposes a toggle persisted in localStorage. /chat/backends lists what's available. Rebased onto main: keeps main's structural plan-mode enforcement and cached-reference cache breakpoints; backend prompt context is injected into the cached first block via a template placeholder. Claude Agent SDK runner is opt-in (POLICYENGINE_CHAT_AGENT_RUNNER=claude_sdk), mirrors the existing SSE event contract, exposes run_python via an in-process MCP tool. Also: modal_app installs from backend/requirements.txt, pins policyengine_uk==2.88.0, adds GA tracking. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Beta preview is ready.
|
The hostname-based slug parsing breaks when Vercel truncates long preview hostnames (e.g. feat/model-backend-selector becomes policyengine-uk-chat-git-feat-model-backen-4022d4-policy-engine.vercel.app, yielding a Modal URL the deploy never created). Source the slug from the git ref instead, matching the CI workflow's derivation. Expose the env vars via next.config so they reach the client at build time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vercel assigns multiple hostnames to the same preview deployment (the full one and a truncated-with-hash variant). Listing both in HOSTNAMES is brittle. Add a HOSTNAME_REGEX env var that overrides HOSTNAMES when set, and use a Vercel-preview-shaped regex in pr-beta-deploy.yml. Production deploys continue to use the explicit HOSTNAMES list. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The uk_python backend reads HF datasets via policyengine-core's hugging_face downloader, which expects HUGGING_FACE_TOKEN in the env. Without it, microsim runs fail with 401 on any private dataset. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an optional scenario_context field to /chat/message and a matching URL query-param read in ChatPage. Embedders (e.g. app-v2 opening the chat in an iframe drawer next to a report) can seed the session with the scenario the user is already viewing. The context lives in its own system block *after* the cache breakpoints, so it never invalidates the cached system prompt or reference doc — same pattern used for plan mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vahid-ahmadi
left a comment
There was a problem hiding this comment.
Thanks for this — the model-backend registry is clean and well-structured. A few things to resolve before merge.
🔴 Blockers
1. Six-file merge conflict, and the chat_message one is dangerous.
This PR's merge-base is 4519811; main is now 9bcfbff with #47, #49 and #48 all merged. Conflicts in main.py, requirements.txt, chatbot.py, test_api.py, ChatPage.tsx, modal_app.py.
The risky one is chat_message. #48 (rate limiting) added @limiter.limit decorators and renamed the endpoint params to chat_message(request: Request, chat_request: ChatRequest). That rename is load-bearing — slowapi requires the parameter named request to be the Starlette Request, or the endpoint 500s on every call. This PR rewrites the same function but keeps the old chat_message(request: ChatRequest, http_request: Request) and adds new request.model_backend / request.scenario_context / http_request.is_disconnected() references.
When rebasing, please keep #48's decorators and signature, and rewrite all of this PR's new request.* body references to chat_request.* (and http_request. to request.). A naive "take ours" here silently reintroduces a total /chat/message outage.
2. Tests not run — only py_compile is claimed. For a +1,000-line PR reshaping _build_system_blocks, _select_chat_model, _tool_defs_for_anthropic, execute_tool and adding /chat/backends, please run pytest backend/tests/ and confirm green — especially the new TestModelBackends / TestChatBackends / TestScenarioContext.
🟠 Significant
3. Unannounced Google Analytics — please split it out. layout.tsx adds a GA tag (G-2YHG89FY0N) not mentioned anywhere in the PR description. For a UK-facing product, loading GA with no cookie-consent gate is a PECR/GDPR concern. This should be its own PR with a deliberate consent decision, not bundled into a backend-selector change.
4. The SDK runner can't run on the Modal image. claude-agent-sdk drives the Claude Code CLI as a subprocess and needs Node.js + the CLI; the Modal image (debian_slim + libpq-dev/gcc) has neither. It's gated off by default and the error is caught, so nothing breaks — but please state plainly that the runner is non-functional as-deployed until the image gains Node, so nobody flips the env var expecting it to work.
5. SDK runner is not at parity — no REFERENCE_DOC, no prompt caching. Enabling it silently degrades answer quality (the model loses the API reference). Fine as a gated experiment; just call it out.
🟡 Minor
scenario_contextcomes from a?scenario_context=URL param and is injected into the system prompt — a crafted link can prompt-inject a victim's session. Limited blast radius (after the cache breakpoints, labelled FROM EMBEDDER), but please add a length cap so it can't bloat the prompt/cost.execute_tooldispatchescompute/generate_chart, butget_tool_definitionsonly advertisesrun_python— those two are unreachable by the model.- SDK runner records token-based
cost_gbpwhile also reportingtotal_cost_usd— two cost numbers that can diverge. - SDK
tool_resultordering (tool_result_queuevstool_order) can mis-attribute a result if tools complete out of order.
✅ Good
- Clean
ModelBackenddataclass registry — per-backend prompt context, tool description, execution globals, import allowlist. - Prompt-cache discipline preserved —
SYSTEM_PROMPT_TEMPLATEhas only the single{backend_prompt_context}placeholder so.format()is safe, andscenario_context/plan-mode go after the cache breakpoints (test_context_does_not_invalidate_cache_breakpointsguards it). - SDK runner properly gated + lazy-imported;
ImportErrorhandled. requirements.txtcorrectly completed (python-dateutil) for thepip_install_from_requirementsswitch.
Verdict: the core is mergeable after (1) a careful rebase, (2) a green test run, and (3) splitting out the GA change. The gated SDK runner can stay as the experiment it's framed as.
vahid-ahmadi
left a comment
There was a problem hiding this comment.
pls see my comment
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Supersedes #42 — same content, moved to an origin branch so the PR beta deploy workflow runs.
Summary
GET /chat/backendsand a frontend Engine selector with backend package version display.run_pythonexecution environments behind the backend registry.POLICYENGINE_CHAT_AGENT_RUNNER=claude_sdk.run_pythonto the SDK runner through an in-process MCP tool while preserving the existing frontend SSE event contract.backend/requirements.txtto avoid deployment dependency drift.Notes
Rebased onto current main (4519811). Main's structural plan-mode enforcement and cached-reference cache breakpoints are preserved — backend prompt context is injected into the cached first block via a template placeholder, so toggling plan mode never invalidates the cache.
The existing direct Anthropic Messages loop remains the default unless
POLICYENGINE_CHAT_AGENT_RUNNER=claude_sdkis set. This lets us ship the backend selector while keeping the SDK runner gated.Validation
python3.13 -m py_compileon all touched Python filesManual test
Default runner:
`cd backend && set -a; source ../.env; set +a; uvicorn main:app --host 127.0.0.1 --port 8002`
SDK runner:
`cd backend && set -a; source ../.env; set +a; POLICYENGINE_CHAT_AGENT_RUNNER=claude_sdk uvicorn main:app --host 127.0.0.1 --port 8001`