feat(proxy): transparent Anthropic proxy — first slice of #207#208
Conversation
| /// | ||
| /// Path layout: `path` is everything after `/proxy/anthropic/` (no leading | ||
| /// slash). Query string is forwarded verbatim from the original URI. | ||
| pub async fn anthropic_proxy( |
There was a problem hiding this comment.
anthropic_proxy and its helpers mix auth (token resolution), DB access, decryption, and HTTP proxying in one component; separate concerns (auth resolution, credential loading/decrypting, and request forwarding) to reduce coupling.
Details
✨ AI Reasoning
The new proxy handler and its helpers combine several distinct responsibilities: authenticating and resolving session tokens, querying the database for user keys, decrypting secrets with the server master key, constructing and forwarding HTTP requests, shaping Anthropic-style error envelopes, and streaming responses back to clients. This bundles persistence, crypto, auth, and networking logic into a single component, increasing coupling and making testing and future changes harder. Isolating these concerns into smaller modules (auth resolution, credential service, and a thin forwarding handler) would improve maintainability.
🔧 How do I fix it?
Split classes that handle database, HTTP, and UI concerns into separate, focused classes.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
There was a problem hiding this comment.
Done in f358647 — extracted authenticate, forward_to_upstream, and build_downstream_response so the handler is a thin orchestration shell over the three concerns. All proxy_integration tests still pass.
| } catch { | ||
| // Clipboard API can fail in some browsers / contexts; ignore silently. |
There was a problem hiding this comment.
The catch block in copyProxyUrl silently swallows clipboard errors. Add logging or user-visible feedback instead of silently ignoring failures.
Show fix
| } catch { | |
| // Clipboard API can fail in some browsers / contexts; ignore silently. | |
| } catch (err) { | |
| // Clipboard API can fail in some browsers / contexts; log and continue. | |
| console.warn('Failed to copy proxy URL to clipboard:', err); |
Details
✨ AI Reasoning
A try/catch block was added that swallows errors silently in copyProxyUrl when the Clipboard API write fails. Silent catches make debugging and monitoring harder because they suppress failure signals; even with an explanatory comment, the error is not logged or surfaced. The handler contains no logging, fallback, or user-visible feedback.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
| const state = await new Promise<{ current: OrgInfo | null }>((resolve) => { | ||
| const unsub = orgStore.subscribe((s) => { | ||
| resolve(s); | ||
| unsub(); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
unsub() is called inside the subscribe callback before unsub is guaranteed initialized; immediate subscriber invocation can throw at runtime and break loadIfNeeded().
Show fix
| const state = await new Promise<{ current: OrgInfo | null }>((resolve) => { | |
| const unsub = orgStore.subscribe((s) => { | |
| resolve(s); | |
| unsub(); | |
| }); | |
| }); | |
| let unsub: (() => void) | undefined; | |
| const state = await new Promise<{ current: OrgInfo | null }>((resolve) => { | |
| unsub = orgStore.subscribe((s) => { | |
| resolve(s); | |
| }); | |
| }); | |
| unsub?.(); |
Details
✨ AI Reasoning
The code creates a Promise from orgStore.subscribe and calls the unsubscribe function inside the subscriber callback. In store implementations that invoke the subscriber immediately, the callback runs before the unsubscribe variable has been initialized. That makes the callback access an uninitialized binding, so the logic intended to resolve once and unsubscribe can fail at runtime instead of completing normally.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
Add user_anthropic_keys table (user_id PK, key_encrypted, key_nonce, created_at, updated_at) and UserAnthropicKeyRepo with upsert / get_ciphertext / configured_at / delete. Keys are encrypted at rest via the existing AES-256-GCM encryption.rs path (same master key as org signing keys / SSO client secrets); configured_at lets the status API report key presence without ever touching ciphertext. Refs #207, parent #181.
GET returns { configured, configured_at } — never the key itself.
PUT validates sk-ant- prefix, requires AppState.encryption_key to be
set, and upserts via UserAnthropicKeyRepo. DELETE is idempotent. All
three reject AuthUser with nil user_id (org-scoped api_keys) with a
clear 403 — the proxy is fundamentally per-user.
Refs #207, parent #181.
Catch-all at /proxy/anthropic/{*path} forwarding to api.anthropic.com.
Authenticates via x-api-key (TV session token resolved against
auth_sessions; org api_keys explicitly rejected with a clear 401);
loads + decrypts the user's stored Anthropic key; rewrites x-api-key
with the upstream value; streams the response body byte-for-byte via
reqwest bytes_stream() — no SSE parsing, no buffering. Allow-list
header forwarding (drops host/cookie/authorization/x-forwarded-*/
set-cookie); response headers forward a fixed allow-list plus any
anthropic-* prefix for forward compat. Proxy-originated errors use
the Anthropic envelope shape so unmodified clients route them through
their existing error paths. Mounted on its own Router so the public
GovernorLayer rate-limit does not throttle agent sessions.
Refs #207, parent #181.
11 tests covering the full proxy lifecycle: non-streaming JSON passthrough (with header-swap assertion proving the TV token is replaced by the upstream Anthropic key before reaching the wire), SSE byte-for-byte streaming, upstream 4xx/5xx verbatim passthrough, upstream-unreachable -> 502 api_error, missing/invalid/org-key/ no-key-configured auth failures with the Anthropic-shaped envelope, forbidden-header stripping verified via MockServer.received_requests(), and the deferred /me/anthropic-key HTTP lifecycle (empty -> PUT-bad -> PUT-good -> GET-configured -> DELETE -> GET-empty). Adds AppState.anthropic_upstream_base so tests can redirect to wiremock without env-var mutation, plus seed_auth_session and fixture_encryption_key helpers in tests/common. Refs #207, parent #181.
Two tests against api.anthropic.com that verify byte-fidelity in the
real-network case wiremock cannot model (TLS, HTTP/2, real SSE event
vocabulary). Both are #[ignore]-d so they do not run in CI. Run with:
ANTHROPIC_API_KEY=sk-ant-... \
cargo test -p tracevault-server --test proxy_real_anthropic \
-- --ignored --nocapture
Non-streaming asserts type=message + non-empty content[0].text +
stop_reason. Streaming asserts the body contains message_start and
content_block_delta SSE event markers.
Refs #207, parent #181.
dotenvy::dotenv() walks up from the test CWD to the workspace root and loads any .env present, so users can put ANTHROPIC_API_KEY in .env instead of exporting it per-shell. Note: DATABASE_URL still has to be set externally because sqlx::test constructs the pool before the test body runs. Refs #207, parent #181.
resolve_claude_target() in init.rs branches on io::stdin().is_terminal(): non-TTY -> Shared default, TTY -> interactive prompt. Tests passing claude_settings=None were taking the interactive branch when run from a real terminal, which raced on stdin under parallel execution and caused init_installs_claude_hooks / init_merges_into_existing_settings / init_no_gitignore_skips_gitignore_update to fail intermittently depending on test scheduling. Pass Some(ClaudeSettingsTarget::Shared) explicitly from every test that previously passed None. This pins behavior independent of the runner's stdin shape and matches what the tests actually want to verify. Follow-up: init_in_directory itself should take an interactive: bool arg so production callers can't accidentally fall into the same trap. File a separate issue. Refs #207, parent #181.
/me/proxy/ was rendering without any chrome because the root layout has no sidebar — only orgs/[slug]/+layout.svelte wires up AppLayout. Adds web/src/routes/me/+layout.svelte that loads the user's org list, picks the first org as current if none is set, and renders AppLayout around the children. The sidebar's links derive their slug from the org store, so /me/* now has a usable nav back to org-scoped pages. Refs #207, parent #181.
* me/+layout.svelte: svelte/store calls the subscriber synchronously on .subscribe(), so the previous in-callback unsub() was running while `unsub` was still in its TDZ. Stash the unsubscribe in an outer binding and call it after the Promise resolves. * me/proxy/+page.svelte: log clipboard-write failures via console.warn instead of swallowing silently — easier to debug, still no page-level error since the user can copy manually. Refs #207, parent #181.
Per PR review: extract `authenticate`, `forward_to_upstream`, and `build_downstream_response` so the handler is a thin orchestration shell and each concern (auth+credential loading, upstream request construction, downstream response streaming) lives in its own private function. Behavior is identical — all 11 proxy_integration tests still pass. Refs #207, parent #181.
f358647 to
40cb68a
Compare
* Reject path segments equal to `..` at the proxy entry with an Anthropic-shaped 400. Today reqwest/url normalize `..` so no cross-host escape is possible, but a future regional Anthropic base URL with a path prefix could be escaped — guard at the entry point. * Replace round-tripped `HeaderName::from_bytes` / `HeaderValue::from_bytes` in copy_response_headers with a direct `name.clone() / value.clone()` — the upstream-validated headers don't need re-validation, and the silent-drop branch on validation failure was a confusing prod debug story waiting to happen. * Drop the redundant `to_ascii_lowercase()` in the anthropic-* prefix check (http::HeaderName names are lowercase by construction). * New integration test: `proxy_rejects_path_traversal_segments`. Self-review minor #1 + minor #3 + nit #1 on PR #208.
Two related production-hardening fixes for the Anthropic proxy router: 1. `DefaultBodyLimit::max(32 * 1024 * 1024)` on the proxy router. Axum's default `Bytes` cap of 2 MB silently rejects legitimate Anthropic requests (vision inputs, long conversations, large `system` prompts) with a 413 that does NOT match the Anthropic error envelope, breaking the transparent-proxy contract. 32 MB matches the Anthropic request envelope while still bounding worst-case in-flight memory. 2. Dedicated `proxy_http_client` on AppState with `connect_timeout: 10s`. The shared `reqwest::Client::new()` used by pricing sync has no timeout configured; using it for the proxy meant a stalled TCP/TLS handshake on api.anthropic.com would park the proxy task indefinitely, holding a DB connection and a router slot — exhausting pools under partial Anthropic outage. Intentionally no overall `timeout()` because the proxy carries open-ended SSE streams. Self-review major #1 + major #2 on PR #208.
After rebasing onto main (v0.16.0), Cargo.lock fell behind the workspace's dependency graph. `cargo check` regenerated it; commit the regeneration so CI's `cargo check --locked` does not fail on a stale lockfile.
| // regardless of how the base URL is configured later. | ||
| if path.split(['/', '\\']).any(|seg| seg == "..") { | ||
| tracing::warn!( | ||
| error_type = "authentication_error", |
There was a problem hiding this comment.
Path-traversal rejection logs error_type = "authentication_error" but returns ProxyErrorKind::ApiError; this contradictory classification can mislabel incidents and break error-type-based monitoring.
| error_type = "authentication_error", | |
| error_type = "api_error", |
Details
✨ AI Reasoning
This branch is handling invalid path traversal input, not authentication. The emitted classification says one error type while the returned envelope uses another. That mismatch makes telemetry and downstream debugging inconsistent for the exact same failure path.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
…tion_error The log line said `error_type = "authentication_error"` while the response envelope used `ProxyErrorKind::ApiError`. Path traversal is not an authentication failure; align the log to match the envelope so error-type-based telemetry isn't split across two labels for the same failure. Addresses Aikido review comment on PR #208.
* Reject path segments equal to `..` at the proxy entry with an Anthropic-shaped 400. Today reqwest/url normalize `..` so no cross-host escape is possible, but a future regional Anthropic base URL with a path prefix could be escaped — guard at the entry point. * Replace round-tripped `HeaderName::from_bytes` / `HeaderValue::from_bytes` in copy_response_headers with a direct `name.clone() / value.clone()` — the upstream-validated headers don't need re-validation, and the silent-drop branch on validation failure was a confusing prod debug story waiting to happen. * Drop the redundant `to_ascii_lowercase()` in the anthropic-* prefix check (http::HeaderName names are lowercase by construction). * New integration test: `proxy_rejects_path_traversal_segments`. Self-review minor #1 + minor #3 + nit #1 on PR #208.
Two related production-hardening fixes for the Anthropic proxy router: 1. `DefaultBodyLimit::max(32 * 1024 * 1024)` on the proxy router. Axum's default `Bytes` cap of 2 MB silently rejects legitimate Anthropic requests (vision inputs, long conversations, large `system` prompts) with a 413 that does NOT match the Anthropic error envelope, breaking the transparent-proxy contract. 32 MB matches the Anthropic request envelope while still bounding worst-case in-flight memory. 2. Dedicated `proxy_http_client` on AppState with `connect_timeout: 10s`. The shared `reqwest::Client::new()` used by pricing sync has no timeout configured; using it for the proxy meant a stalled TCP/TLS handshake on api.anthropic.com would park the proxy task indefinitely, holding a DB connection and a router slot — exhausting pools under partial Anthropic outage. Intentionally no overall `timeout()` because the proxy carries open-ended SSE streams. Self-review major #1 + major #2 on PR #208.
First slice of #207 (which is a partial of #181): transparent Anthropic LLM proxy.
What this PR ships
024_user_anthropic_keys.sql): per-user encrypted Anthropic key, AES-256-GCM via the existingencryption.rspath (same master key as org signing keys / SSO client secrets)./api/v1/me/anthropic-key): GET returns{ configured, configured_at }— never the key itself. PUT validates thesk-ant-prefix, upserts. DELETE is idempotent. All three reject orgapi_keyswith a clear 403./proxy/anthropic/{*path}): catch-all forwarding tohttps://api.anthropic.com/{path}. Authenticates via TV session token inx-api-key, swaps in the user's stored Anthropic key, streams the response byte-for-byte viareqwest::Response::bytes_stream()— no SSE parsing, no buffering. Mounted on its own Router so the publicGovernorLayerrate-limit does not throttle agent sessions.accept/accept-encoding/anthropic-version/anthropic-beta/anthropic-dangerous-direct-browser-access/content-type/user-agent; dropshost/cookie/authorization/x-forwarded-*. Response forwards a fixed list plus anyanthropic-*for forward compat. Set-Cookie is explicitly tested to never leak./me/proxy/): SvelteKit page with status + save/replace/remove + copy-block.tracevault proxy infocommand prints proxy URL + env-var setup snippet.#[ignore]-d real-Anthropic tests for local pre-merge verification.Out of scope — explicitly deferred
auth_sessions; expiry causes silent 401s — documented limitation)Verification status
cargo fmt --all -- --checkcargo clippy --workspace --tests --all-targets -- -D warningscargo test --workspaceok, 2 ignored as designed)pnpm check(svelte-check)#[ignore]testsHow to run the deferred local checks
Parent issue: #207 (which is a partial of #181)
Target branch:
feature/llm-proxy(umbrella branch tracking #181)