Skip to content

Add model backend selector and Claude Agent SDK runner#51

Open
SakshiKekre wants to merge 6 commits into
mainfrom
feat/model-backend-selector
Open

Add model backend selector and Claude Agent SDK runner#51
SakshiKekre wants to merge 6 commits into
mainfrom
feat/model-backend-selector

Conversation

@SakshiKekre
Copy link
Copy Markdown
Collaborator

Supersedes #42 — same content, moved to an origin branch so the PR beta deploy workflow runs.

Summary

  • Adds a model backend abstraction with selectable UK compiled and UK Python backends.
  • Adds GET /chat/backends and a frontend Engine selector with backend package version display.
  • Keeps backend-specific prompts, tool descriptions, imports, and run_python execution environments behind the backend registry.
  • Adds an opt-in Claude Agent SDK runner behind POLICYENGINE_CHAT_AGENT_RUNNER=claude_sdk.
  • Exposes run_python to the SDK runner through an in-process MCP tool while preserving the existing frontend SSE event contract.
  • Makes Modal install backend dependencies from backend/requirements.txt to avoid deployment dependency drift.

Notes

Rebased onto current main (4519811). Main's structural plan-mode enforcement and cached-reference cache breakpoints are preserved — backend prompt context is injected into the cached first block via a template placeholder, so toggling plan mode never invalidates the cache.

The existing direct Anthropic Messages loop remains the default unless POLICYENGINE_CHAT_AGENT_RUNNER=claude_sdk is set. This lets us ship the backend selector while keeping the SDK runner gated.

Validation

  • python3.13 -m py_compile on all touched Python files
  • Net diff: +1023 / -146 across 10 files

Manual test

Default runner:
`cd backend && set -a; source ../.env; set +a; uvicorn main:app --host 127.0.0.1 --port 8002`

SDK runner:
`cd backend && set -a; source ../.env; set +a; POLICYENGINE_CHAT_AGENT_RUNNER=claude_sdk uvicorn main:app --host 127.0.0.1 --port 8001`

Backends: uk_compiled (Rust) and uk_python (Python), selected per
request or via POLICYENGINE_CHAT_BACKEND. Each controls its own prompt
context, tool description, and execution globals. ChatPage exposes a
toggle persisted in localStorage. /chat/backends lists what's available.

Rebased onto main: keeps main's structural plan-mode enforcement and
cached-reference cache breakpoints; backend prompt context is injected
into the cached first block via a template placeholder.

Claude Agent SDK runner is opt-in
(POLICYENGINE_CHAT_AGENT_RUNNER=claude_sdk), mirrors the existing SSE
event contract, exposes run_python via an in-process MCP tool.

Also: modal_app installs from backend/requirements.txt, pins
policyengine_uk==2.88.0, adds GA tracking.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented May 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
policyengine-uk-chat Ready Ready Preview, Comment May 27, 2026 3:41pm

Request Review

@github-actions
Copy link
Copy Markdown

Beta preview is ready.

The hostname-based slug parsing breaks when Vercel truncates long preview
hostnames (e.g. feat/model-backend-selector becomes
policyengine-uk-chat-git-feat-model-backen-4022d4-policy-engine.vercel.app,
yielding a Modal URL the deploy never created). Source the slug from the
git ref instead, matching the CI workflow's derivation. Expose the env
vars via next.config so they reach the client at build time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vercel assigns multiple hostnames to the same preview deployment (the
full one and a truncated-with-hash variant). Listing both in HOSTNAMES
is brittle. Add a HOSTNAME_REGEX env var that overrides HOSTNAMES when
set, and use a Vercel-preview-shaped regex in pr-beta-deploy.yml.
Production deploys continue to use the explicit HOSTNAMES list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The uk_python backend reads HF datasets via policyengine-core's
hugging_face downloader, which expects HUGGING_FACE_TOKEN in the env.
Without it, microsim runs fail with 401 on any private dataset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an optional scenario_context field to /chat/message and a matching
URL query-param read in ChatPage. Embedders (e.g. app-v2 opening the
chat in an iframe drawer next to a report) can seed the session with
the scenario the user is already viewing.

The context lives in its own system block *after* the cache breakpoints,
so it never invalidates the cached system prompt or reference doc — same
pattern used for plan mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@vahid-ahmadi vahid-ahmadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this — the model-backend registry is clean and well-structured. A few things to resolve before merge.

🔴 Blockers

1. Six-file merge conflict, and the chat_message one is dangerous.
This PR's merge-base is 4519811; main is now 9bcfbff with #47, #49 and #48 all merged. Conflicts in main.py, requirements.txt, chatbot.py, test_api.py, ChatPage.tsx, modal_app.py.

The risky one is chat_message. #48 (rate limiting) added @limiter.limit decorators and renamed the endpoint params to chat_message(request: Request, chat_request: ChatRequest). That rename is load-bearing — slowapi requires the parameter named request to be the Starlette Request, or the endpoint 500s on every call. This PR rewrites the same function but keeps the old chat_message(request: ChatRequest, http_request: Request) and adds new request.model_backend / request.scenario_context / http_request.is_disconnected() references.

When rebasing, please keep #48's decorators and signature, and rewrite all of this PR's new request.* body references to chat_request.* (and http_request. to request.). A naive "take ours" here silently reintroduces a total /chat/message outage.

2. Tests not run — only py_compile is claimed. For a +1,000-line PR reshaping _build_system_blocks, _select_chat_model, _tool_defs_for_anthropic, execute_tool and adding /chat/backends, please run pytest backend/tests/ and confirm green — especially the new TestModelBackends / TestChatBackends / TestScenarioContext.

🟠 Significant

3. Unannounced Google Analytics — please split it out. layout.tsx adds a GA tag (G-2YHG89FY0N) not mentioned anywhere in the PR description. For a UK-facing product, loading GA with no cookie-consent gate is a PECR/GDPR concern. This should be its own PR with a deliberate consent decision, not bundled into a backend-selector change.

4. The SDK runner can't run on the Modal image. claude-agent-sdk drives the Claude Code CLI as a subprocess and needs Node.js + the CLI; the Modal image (debian_slim + libpq-dev/gcc) has neither. It's gated off by default and the error is caught, so nothing breaks — but please state plainly that the runner is non-functional as-deployed until the image gains Node, so nobody flips the env var expecting it to work.

5. SDK runner is not at parity — no REFERENCE_DOC, no prompt caching. Enabling it silently degrades answer quality (the model loses the API reference). Fine as a gated experiment; just call it out.

🟡 Minor

  • scenario_context comes from a ?scenario_context= URL param and is injected into the system prompt — a crafted link can prompt-inject a victim's session. Limited blast radius (after the cache breakpoints, labelled FROM EMBEDDER), but please add a length cap so it can't bloat the prompt/cost.
  • execute_tool dispatches compute/generate_chart, but get_tool_definitions only advertises run_python — those two are unreachable by the model.
  • SDK runner records token-based cost_gbp while also reporting total_cost_usd — two cost numbers that can diverge.
  • SDK tool_result ordering (tool_result_queue vs tool_order) can mis-attribute a result if tools complete out of order.

✅ Good

  • Clean ModelBackend dataclass registry — per-backend prompt context, tool description, execution globals, import allowlist.
  • Prompt-cache discipline preserved — SYSTEM_PROMPT_TEMPLATE has only the single {backend_prompt_context} placeholder so .format() is safe, and scenario_context/plan-mode go after the cache breakpoints (test_context_does_not_invalidate_cache_breakpoints guards it).
  • SDK runner properly gated + lazy-imported; ImportError handled.
  • requirements.txt correctly completed (python-dateutil) for the pip_install_from_requirements switch.

Verdict: the core is mergeable after (1) a careful rebase, (2) a green test run, and (3) splitting out the GA change. The gated SDK runner can stay as the experiment it's framed as.

Copy link
Copy Markdown
Collaborator

@vahid-ahmadi vahid-ahmadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls see my comment

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants