Sub-issue of #181 (parent), follow-up to merged PR #208.
Problem
The Anthropic proxy currently has no concurrency limits anywhere. Three concrete failure modes this enables:
Leaked TV session token weaponized as DoS. A leaked token can be used to fire N simultaneous proxy requests. Blast radius hits the user's Anthropic bill, TraceVault's outbound bandwidth, the DB pool (every request borrows a slot for the auth_sessions lookup), and Tokio task slots.
Runaway agent. An agent harness with a no-backoff retry loop can amplify a single failure into hundreds of in-flight requests holding resources.
Multi-tenant unfairness. When a second org's user starts using the proxy, one user's burst can starve another via outbound bandwidth / DB pool contention.
Originally surfaced as "major #3" in the self-review of #208, where it was deferred for design clarity.
Design
The cap belongs to the upstream credential, not the requesting user or session. Anthropic itself rate-limits on the account associated with the API key — TraceVault's cap should mirror that ownership model. This generalizes cleanly to later org-level keys and multi-provider support without re-architecting.
Schema (migrations/025_*.sql)
Default 8: comfortable for typical multi-agent setups (Claude Code + GSD2), above the Anthropic free tier's limit (5 concurrent, but the free tier is uncommon for active users), and well under any paid tier. The upper bound of 256 prevents user-typed nonsense values.
API
GET /api/v1/me/anthropic-key adds max_concurrent: i32 to the response.
PUT /api/v1/me/anthropic-key accepts an optional max_concurrent. Omitted = use DB default on first insert, keep existing value on update.
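The PUT semantics above can be sketched as a small pure function. This is a minimal sketch under stated assumptions, not the handler itself: `resolve_max_concurrent` and `OutOfRange` are hypothetical names, and the default of 8 is mirrored here as a constant even though the issue places it in the DB column default. The accepted range (1 to 256) follows the out-of-range test listed later in this issue.

```rust
/// Values taken from the issue text: default 8, upper bound 256.
/// In the proposed design the default lives in the DB column; it is
/// mirrored as a constant here only to keep the sketch self-contained.
pub const DEFAULT_MAX_CONCURRENT: i32 = 8;
pub const MAX_CONCURRENT_UPPER_BOUND: i32 = 256;

/// Hypothetical error for a rejected value (the handler would map it to HTTP 400).
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfRange(pub i32);

/// Decide which cap to store for a PUT.
///
/// * `requested`: the optional `max_concurrent` field from the request body.
/// * `existing`: the stored value, or `None` when this is the first insert.
pub fn resolve_max_concurrent(
    requested: Option<i32>,
    existing: Option<i32>,
) -> Result<i32, OutOfRange> {
    match requested {
        // An explicit value must fall inside 1..=256.
        Some(v) if (1..=MAX_CONCURRENT_UPPER_BOUND).contains(&v) => Ok(v),
        Some(v) => Err(OutOfRange(v)),
        // Omitted: keep the existing value on update, use the default on first insert.
        None => Ok(existing.unwrap_or(DEFAULT_MAX_CONCURRENT)),
    }
}
```

For example, `resolve_max_concurrent(None, Some(32))` keeps 32, while `resolve_max_concurrent(Some(0), None)` is rejected.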
Web (/me/proxy/)
Numeric input next to the key field, saved together. Renders the current value alongside "Configured" status.
Proxy handler
Two semaphores acquired in order, both held for the request lifetime via RAII permits:
Global semaphore — Arc<Semaphore> on AppState. Size from PROXY_MAX_GLOBAL_CONCURRENT env var. Unset = unlimited (default). Operator turns it on after capacity testing. Recommended starting value in docs: 256.
Per-credential semaphore — DashMap<Uuid, Arc<Semaphore>> on AppState, keyed by user_anthropic_keys.user_id. Lazily created on first request for a credential, sized to the credential's max_concurrent.
Acquisition order matters: global first, per-credential second. This way a misbehaving credential bumps into its own cap before contributing to global pressure, and the cleanup order on permit drop is the reverse (per-credential released first), which is exception-safe regardless.
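The acquisition order and RAII release described above can be illustrated without any async runtime. The sketch below swaps in a small atomic counter (`Limiter`) as a std-only stand-in for `tokio::sync::Semaphore`; `Limiter`, `Permit`, `RequestPermits`, `Reject`, and `admit` are all hypothetical names, and the real handler would presumably use the semaphore's non-blocking acquire instead.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Counting limiter; a std-only stand-in for a semaphore with try-acquire.
pub struct Limiter {
    cap: usize,
    in_flight: AtomicUsize,
}

/// RAII permit: dropping it frees one slot on its limiter.
pub struct Permit {
    limiter: Arc<Limiter>,
}

impl Limiter {
    pub fn new(cap: usize) -> Arc<Self> {
        Arc::new(Limiter { cap, in_flight: AtomicUsize::new(0) })
    }

    /// Non-blocking acquire: `None` when the cap is already reached.
    pub fn try_acquire(this: &Arc<Self>) -> Option<Permit> {
        let cap = this.cap;
        this.in_flight
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                if n < cap { Some(n + 1) } else { None }
            })
            .ok()
            .map(|_| Permit { limiter: Arc::clone(this) })
    }

    pub fn in_flight(&self) -> usize {
        self.in_flight.load(Ordering::Acquire)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.limiter.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}

/// Why a request was turned away.
#[derive(Debug, PartialEq, Eq)]
pub enum Reject {
    GlobalCap,
    CredentialCap { cap: usize },
}

/// Both permits, held for the request lifetime.
/// Struct fields drop in declaration order: per-credential first, then global.
pub struct RequestPermits {
    _credential: Permit,
    _global: Option<Permit>,
}

/// Global first, per-credential second. `global` is `None` when the
/// operator has not set a global cap (unlimited).
pub fn admit(
    global: Option<&Arc<Limiter>>,
    credential: &Arc<Limiter>,
) -> Result<RequestPermits, Reject> {
    let global_permit = match global {
        Some(g) => Some(Limiter::try_acquire(g).ok_or(Reject::GlobalCap)?),
        None => None,
    };
    // If this fails, `global_permit` is dropped on the early return,
    // so a rejected request does not keep its global slot.
    let credential_permit = Limiter::try_acquire(credential)
        .ok_or(Reject::CredentialCap { cap: credential.cap })?;
    Ok(RequestPermits { _credential: credential_permit, _global: global_permit })
}
```

A handler would call `admit` once per request and keep the returned `RequestPermits` alive until the upstream response has been fully relayed; dropping it releases both slots.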
When either acquire fails: return Anthropic-shaped 429 with error.type = "overloaded_error". Per-credential failure message: "Too many concurrent requests against this credential (cap: N). Retry shortly.". Global failure message: "Server is at capacity. Retry shortly.".
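A minimal sketch of the rejection body, assuming the usual Anthropic error envelope of a top-level `"type": "error"` wrapping an `error` object (the issue itself only specifies `error.type` and the two messages). `RejectReason`, `reject_message`, and `reject_response` are hypothetical names, and the JSON is built by hand only to keep the example dependency-free; a real handler would more likely serialize a struct.

```rust
/// Which cap caused the rejection.
pub enum RejectReason {
    CredentialCap { cap: u32 },
    GlobalCap,
}

/// HTTP status used for both rejection kinds.
pub const STATUS_TOO_MANY_REQUESTS: u16 = 429;

/// The two messages quoted in the issue text.
pub fn reject_message(reason: &RejectReason) -> String {
    match reason {
        RejectReason::CredentialCap { cap } => format!(
            "Too many concurrent requests against this credential (cap: {}). Retry shortly.",
            cap
        ),
        RejectReason::GlobalCap => "Server is at capacity. Retry shortly.".to_string(),
    }
}

/// Status code plus JSON body. Both messages are plain ASCII without
/// quotes or backslashes, so no JSON escaping is needed in this sketch.
pub fn reject_response(reason: &RejectReason) -> (u16, String) {
    let body = format!(
        r#"{{"type":"error","error":{{"type":"overloaded_error","message":"{}"}}}}"#,
        reject_message(reason)
    );
    (STATUS_TOO_MANY_REQUESTS, body)
}
```

Keeping the body Anthropic-shaped means client SDKs and agent harnesses that already parse upstream errors can handle proxy rejections through the same code path.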
Update semantics
Lazy. The semaphore in the DashMap is created at first use with the cap value read from DB at that moment. A subsequent PUT updates the DB column but does not invalidate the in-memory semaphore. New cap applies after server restart or after the entry is otherwise dropped. This avoids the atomic-swap edge cases of mid-flight cap changes. Documented in code, called out in the PR description when the slice ships.
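The lazy behavior can be sketched as a get-or-create registry. This is an illustration under assumptions rather than the proposed code: `Mutex<HashMap>` stands in for `DashMap`, `u128` stands in for `Uuid`, `CredentialLimiter` carries only the cap instead of a real semaphore, and `forget` is a hypothetical way to model "the entry is otherwise dropped".

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Stand-in for the per-credential semaphore; only its size matters here.
pub struct CredentialLimiter {
    pub cap: usize,
}

/// Registry keyed by user id (`u128` stands in for `Uuid`).
#[derive(Default)]
pub struct LimiterRegistry {
    entries: Mutex<HashMap<u128, Arc<CredentialLimiter>>>,
}

impl LimiterRegistry {
    /// Return the existing limiter, or create one sized by `load_cap`.
    /// `load_cap` models the DB read and runs only when no entry exists,
    /// so a later PUT that changes the column is not seen by a live entry.
    pub fn get_or_create<F>(&self, user_id: u128, load_cap: F) -> Arc<CredentialLimiter>
    where
        F: FnOnce() -> usize,
    {
        let mut entries = self.entries.lock().unwrap();
        let entry = entries
            .entry(user_id)
            .or_insert_with(|| Arc::new(CredentialLimiter { cap: load_cap() }));
        Arc::clone(entry)
    }

    /// Drop the entry; the next request re-reads the cap.
    pub fn forget(&self, user_id: u128) {
        self.entries.lock().unwrap().remove(&user_id);
    }
}
```

Requests that already hold an `Arc` to the old limiter keep using it until they finish, which is what sidesteps the mid-flight swap problem mentioned above.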
Eviction
The DashMap grows monotonically with active credentials. At expected scale (≤ ~10k users) this is a few MB at most. No eviction in v1; a one-line code comment notes the threshold at which to revisit.
Telemetry
Every reject logged via tracing::warn! with structured fields:
This is the dataset needed later to distinguish "cap too low" from "someone is attacking."
Tests
proxy_rejects_when_per_credential_cap_exceeded: N concurrent succeed, N+1 returns 429 / overloaded_error.
proxy_frees_permit_when_request_completes: N+1 succeeds after one of the N finishes.
proxy_rejects_when_global_cap_exceeded: many credentials, hit the global cap, verify rejection with reason = "global_cap".
proxy_default_cap_applied_to_existing_rows: migration sets 8 for pre-existing rows.
me_anthropic_key_get_returns_max_concurrent: API roundtrip.
me_anthropic_key_put_accepts_max_concurrent: API roundtrip.
me_anthropic_key_put_rejects_out_of_range: max_concurrent <= 0 or > 256 returns 400.
Out of scope (deferred)
proxy_tokens table, which is itself deferred).
Estimated effort
1 day, mostly tests.