Add model backend selector and Claude Agent SDK runner by SakshiKekre · Pull Request #51 · PolicyEngine/policyengine-uk-chat

SakshiKekre · 2026-05-13T07:36:50Z

Supersedes #42 — same content, moved to an origin branch so the PR beta deploy workflow runs.

Summary

Adds a model backend abstraction with selectable UK compiled and UK Python backends.
Adds GET /chat/backends and a frontend Engine selector with backend package version display.
Keeps backend-specific prompts, tool descriptions, imports, and run_python execution environments behind the backend registry.
Adds an opt-in Claude Agent SDK runner behind POLICYENGINE_CHAT_AGENT_RUNNER=claude_sdk.
Exposes run_python to the SDK runner through an in-process MCP tool while preserving the existing frontend SSE event contract.
Makes Modal install backend dependencies from backend/requirements.txt to avoid deployment dependency drift.

Notes

Rebased onto current main (4519811). Main's structural plan-mode enforcement and cached-reference cache breakpoints are preserved — backend prompt context is injected into the cached first block via a template placeholder, so toggling plan mode never invalidates the cache.

The existing direct Anthropic Messages loop remains the default unless POLICYENGINE_CHAT_AGENT_RUNNER=claude_sdk is set. This lets us ship the backend selector while keeping the SDK runner gated.

Validation

python3.13 -m py_compile on all touched Python files
Net diff: +1023 / -146 across 10 files

Manual test

Default runner:
`cd backend && set -a; source ../.env; set +a; uvicorn main:app --host 127.0.0.1 --port 8002`

SDK runner:
`cd backend && set -a; source ../.env; set +a; POLICYENGINE_CHAT_AGENT_RUNNER=claude_sdk uvicorn main:app --host 127.0.0.1 --port 8001`

Backends: uk_compiled (Rust) and uk_python (Python), selected per request or via POLICYENGINE_CHAT_BACKEND. Each controls its own prompt context, tool description, and execution globals. ChatPage exposes a toggle persisted in localStorage. /chat/backends lists what's available. Rebased onto main: keeps main's structural plan-mode enforcement and cached-reference cache breakpoints; backend prompt context is injected into the cached first block via a template placeholder. Claude Agent SDK runner is opt-in (POLICYENGINE_CHAT_AGENT_RUNNER=claude_sdk), mirrors the existing SSE event contract, exposes run_python via an in-process MCP tool. Also: modal_app installs from backend/requirements.txt, pins policyengine_uk==2.88.0, adds GA tracking. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-05-13T07:36:55Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
policyengine-uk-chat	Ready	Preview, Comment	May 27, 2026 3:41pm

github-actions · 2026-05-13T07:40:11Z

Beta preview is ready.

Frontend: open preview
Backend: open backend

The hostname-based slug parsing breaks when Vercel truncates long preview hostnames (e.g. feat/model-backend-selector becomes policyengine-uk-chat-git-feat-model-backen-4022d4-policy-engine.vercel.app, yielding a Modal URL the deploy never created). Source the slug from the git ref instead, matching the CI workflow's derivation. Expose the env vars via next.config so they reach the client at build time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Vercel assigns multiple hostnames to the same preview deployment (the full one and a truncated-with-hash variant). Listing both in HOSTNAMES is brittle. Add a HOSTNAME_REGEX env var that overrides HOSTNAMES when set, and use a Vercel-preview-shaped regex in pr-beta-deploy.yml. Production deploys continue to use the explicit HOSTNAMES list. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The uk_python backend reads HF datasets via policyengine-core's hugging_face downloader, which expects HUGGING_FACE_TOKEN in the env. Without it, microsim runs fail with 401 on any private dataset. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds an optional scenario_context field to /chat/message and a matching URL query-param read in ChatPage. Embedders (e.g. app-v2 opening the chat in an iframe drawer next to a report) can seed the session with the scenario the user is already viewing. The context lives in its own system block *after* the cache breakpoints, so it never invalidates the cached system prompt or reference doc — same pattern used for plan mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vahid-ahmadi

Thanks for this — the model-backend registry is clean and well-structured. A few things to resolve before merge.

🔴 Blockers

1. Six-file merge conflict, and the chat_message one is dangerous.
This PR's merge-base is 4519811; main is now 9bcfbff with #47, #49 and #48 all merged. Conflicts in main.py, requirements.txt, chatbot.py, test_api.py, ChatPage.tsx, modal_app.py.

The risky one is chat_message. #48 (rate limiting) added @limiter.limit decorators and renamed the endpoint params to chat_message(request: Request, chat_request: ChatRequest). That rename is load-bearing — slowapi requires the parameter named request to be the Starlette Request, or the endpoint 500s on every call. This PR rewrites the same function but keeps the old chat_message(request: ChatRequest, http_request: Request) and adds new request.model_backend / request.scenario_context / http_request.is_disconnected() references.

When rebasing, please keep #48's decorators and signature, and rewrite all of this PR's new request.* body references to chat_request.* (and http_request. to request.). A naive "take ours" here silently reintroduces a total /chat/message outage.

2. Tests not run — only py_compile is claimed. For a +1,000-line PR reshaping _build_system_blocks, _select_chat_model, _tool_defs_for_anthropic, execute_tool and adding /chat/backends, please run pytest backend/tests/ and confirm green — especially the new TestModelBackends / TestChatBackends / TestScenarioContext.

🟠 Significant

3. Unannounced Google Analytics — please split it out. layout.tsx adds a GA tag (G-2YHG89FY0N) not mentioned anywhere in the PR description. For a UK-facing product, loading GA with no cookie-consent gate is a PECR/GDPR concern. This should be its own PR with a deliberate consent decision, not bundled into a backend-selector change.

4. The SDK runner can't run on the Modal image. claude-agent-sdk drives the Claude Code CLI as a subprocess and needs Node.js + the CLI; the Modal image (debian_slim + libpq-dev/gcc) has neither. It's gated off by default and the error is caught, so nothing breaks — but please state plainly that the runner is non-functional as-deployed until the image gains Node, so nobody flips the env var expecting it to work.

5. SDK runner is not at parity — no REFERENCE_DOC, no prompt caching. Enabling it silently degrades answer quality (the model loses the API reference). Fine as a gated experiment; just call it out.

🟡 Minor

scenario_context comes from a ?scenario_context= URL param and is injected into the system prompt — a crafted link can prompt-inject a victim's session. Limited blast radius (after the cache breakpoints, labelled FROM EMBEDDER), but please add a length cap so it can't bloat the prompt/cost.
execute_tool dispatches compute/generate_chart, but get_tool_definitions only advertises run_python — those two are unreachable by the model.
SDK runner records token-based cost_gbp while also reporting total_cost_usd — two cost numbers that can diverge.
SDK tool_result ordering (tool_result_queue vs tool_order) can mis-attribute a result if tools complete out of order.

✅ Good

Clean ModelBackend dataclass registry — per-backend prompt context, tool description, execution globals, import allowlist.
Prompt-cache discipline preserved — SYSTEM_PROMPT_TEMPLATE has only the single {backend_prompt_context} placeholder so .format() is safe, and scenario_context/plan-mode go after the cache breakpoints (test_context_does_not_invalidate_cache_breakpoints guards it).
SDK runner properly gated + lazy-imported; ImportError handled.
requirements.txt correctly completed (python-dateutil) for the pip_install_from_requirements switch.

Verdict: the core is mergeable after (1) a careful rebase, (2) a green test run, and (3) splitting out the GA change. The gated SDK runner can stay as the experiment it's framed as.

vahid-ahmadi

pls see my comment

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

SakshiKekre mentioned this pull request May 13, 2026

Add model backend selector and Claude Agent SDK runner #42

Closed

vercel Bot deployed to Preview May 13, 2026 07:37 View deployment

vercel Bot deployed to Preview May 13, 2026 07:52 View deployment

vercel Bot deployed to Preview May 13, 2026 08:03 View deployment

vercel Bot deployed to Preview May 13, 2026 08:27 View deployment

vercel Bot deployed to Preview May 13, 2026 08:56 View deployment

SakshiKekre mentioned this pull request May 13, 2026

Add UK chat integration (drawer on reports + standalone page) PolicyEngine/policyengine-app-v2#1036

Open

5 tasks

vahid-ahmadi requested changes May 22, 2026

View reviewed changes

SakshiKekre mentioned this pull request May 27, 2026

WIP: US Python backend (latency spike — do not merge) #54

Open

5 tasks

Ignore .datasets/ and *.tsbuildinfo

de36e84

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel Bot deployed to Preview May 27, 2026 15:41 View deployment

This was referenced May 27, 2026

Register the three dormant typed tools (UK) #55

Draft

Add eval harness scaffold: spec, scenarios, fixtures dir #52

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add model backend selector and Claude Agent SDK runner#51

Add model backend selector and Claude Agent SDK runner#51
SakshiKekre wants to merge 6 commits into
mainfrom
feat/model-backend-selector

SakshiKekre commented May 13, 2026

Uh oh!

vercel Bot commented May 13, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 13, 2026

Uh oh!

vahid-ahmadi left a comment

Uh oh!

vahid-ahmadi left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

SakshiKekre commented May 13, 2026

Summary

Notes

Validation

Manual test

Uh oh!

vercel Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 13, 2026

Uh oh!

vahid-ahmadi left a comment

Choose a reason for hiding this comment

🔴 Blockers

🟠 Significant

🟡 Minor

✅ Good

Uh oh!

vahid-ahmadi left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vercel Bot commented May 13, 2026 •

edited

Loading