feat(proxy): per-credential and global concurrency caps #210

@hashedone

Sub-issue of #181 (parent), follow-up to merged PR #208.

Problem

The Anthropic proxy currently has no concurrency limits anywhere. Three concrete failure modes this enables:

  1. Leaked TV session token weaponized as DoS. A leaked token can be used to fire N simultaneous proxy requests. Blast radius hits the user's Anthropic bill, TraceVault's outbound bandwidth, the DB pool (every request borrows a slot for the auth_sessions lookup), and Tokio task slots.
  2. Runaway agent. An agent harness with a no-backoff retry loop can amplify a single failure into hundreds of in-flight requests holding resources.
  3. Multi-tenant unfairness. When a second org's user starts using the proxy, one user's burst can starve another via outbound bandwidth / DB pool contention.

Originally surfaced as "major #3" in the self-review of #208, where it was deferred for design clarity.

Design

The cap belongs to the upstream credential, not the requesting user or session. Anthropic itself rate-limits on the account associated with the API key — TraceVault's cap should mirror that ownership model. This generalizes cleanly to later org-level keys and multi-provider support without re-architecting.

Schema (migrations/025_*.sql)

ALTER TABLE user_anthropic_keys
    ADD COLUMN max_concurrent INTEGER NOT NULL DEFAULT 8
    CHECK (max_concurrent > 0 AND max_concurrent <= 256);

Default 8: comfortable for typical multi-agent setups (Claude Code + GSD2) and well under any paid tier. It is above the Anthropic free-tier limit (5 concurrent), but free tier is uncommon for active users. The upper bound of 256 prevents user-typed nonsense values.

API

  • GET /api/v1/me/anthropic-key adds max_concurrent: i32 to the response.
  • PUT /api/v1/me/anthropic-key accepts an optional max_concurrent. Omitted = use DB default on first insert, keep existing value on update.
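The PUT semantics above can be sketched as a small pure function. This is a minimal illustration, not the handler itself: `resolve_cap`, `CapError`, and the constants are hypothetical names, and returning the default from Rust (rather than letting the column DEFAULT apply on insert) is an assumption made to keep the example self-contained.

```rust
/// Bounds mirrored from the CHECK constraint in the migration.
const MIN_CAP: i32 = 1;
const MAX_CAP: i32 = 256;
/// Mirrors the column DEFAULT (assumption: in the real handler the
/// database would supply this on first insert).
const DEFAULT_CAP: i32 = 8;

#[derive(Debug, PartialEq)]
pub enum CapError {
    /// The handler would map this to HTTP 400.
    OutOfRange(i32),
}

/// Decide which cap to store for a PUT.
/// `requested` is the optional field from the request body;
/// `existing` is the stored value, or `None` on first insert.
pub fn resolve_cap(requested: Option<i32>, existing: Option<i32>) -> Result<i32, CapError> {
    match requested {
        // An explicit in-range value always wins.
        Some(v) if (MIN_CAP..=MAX_CAP).contains(&v) => Ok(v),
        // An explicit out-of-range value is rejected before it reaches the DB.
        Some(v) => Err(CapError::OutOfRange(v)),
        // Omitted: keep the existing value, or fall back to the default.
        None => Ok(existing.unwrap_or(DEFAULT_CAP)),
    }
}
```

Validating in the handler as well as in the CHECK constraint lets the API return a clean 400 instead of surfacing a constraint violation from the database.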

Web (/me/proxy/)

Numeric input next to the key field, saved together. Renders the current value alongside "Configured" status.

Proxy handler

Two semaphores acquired in order, both held for the request lifetime via RAII permits:

  1. Global semaphore: Arc<Semaphore> on AppState. Size from PROXY_MAX_GLOBAL_CONCURRENT env var. Unset = unlimited (default). Operator turns it on after capacity testing. Recommended starting value in docs: 256.
  2. Per-credential semaphore: DashMap<Uuid, Arc<Semaphore>> on AppState, keyed by user_anthropic_keys.user_id. Lazily created on first request for a credential, sized to the credential's max_concurrent.
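Reading the global cap from the environment might look like the sketch below. `parse_global_cap` is a hypothetical helper; treating an empty string as unset and rejecting `0` or non-numeric values at startup are assumptions this issue does not specify.

```rust
/// Parse the raw value of PROXY_MAX_GLOBAL_CONCURRENT.
/// `None` (unset) means unlimited, matching the default described above.
pub fn parse_global_cap(raw: Option<&str>) -> Result<Option<usize>, String> {
    match raw.map(str::trim) {
        // Unset (or empty, by assumption) disables the global cap.
        None | Some("") => Ok(None),
        Some(s) => match s.parse::<usize>() {
            // A zero-sized semaphore would reject every request.
            Ok(0) => Err("PROXY_MAX_GLOBAL_CONCURRENT must be greater than 0".to_string()),
            Ok(n) => Ok(Some(n)),
            Err(e) => Err(format!("invalid PROXY_MAX_GLOBAL_CONCURRENT {s:?}: {e}")),
        },
    }
}
```

At startup this could be called as `parse_global_cap(std::env::var("PROXY_MAX_GLOBAL_CONCURRENT").ok().as_deref())`, failing fast on a misconfigured value rather than silently running unlimited.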

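Acquisition order matters: global first, per-credential second. If the per-credential acquire fails, the global permit already taken is dropped immediately, so a misbehaving credential holds global permits only for requests within its own cap (apart from the brief window of a rejected attempt). Permits are dropped in reverse order (per-credential released first), and because both are RAII guards the release happens on every exit path, including early returns and panics.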
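The two-level acquire can be sketched without dependencies as follows. This is an illustration only: `Limiter` is a std-only, non-blocking stand-in for `tokio::sync::Semaphore`, a `Mutex<HashMap<u64, _>>` stands in for `DashMap<Uuid, _>`, and `Caps`, `RequestPermits`, and `Reject` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Std-only stand-in for a semaphore: a counter that never waits.
pub struct Limiter {
    cap: usize,
    in_flight: AtomicUsize,
}

/// RAII permit: dropping it frees the slot.
pub struct Permit(Arc<Limiter>);

impl Limiter {
    pub fn new(cap: usize) -> Arc<Self> {
        Arc::new(Self { cap, in_flight: AtomicUsize::new(0) })
    }

    /// Take a slot if one is free; return `None` immediately otherwise.
    pub fn try_acquire(this: &Arc<Self>) -> Option<Permit> {
        this.in_flight
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                if n < this.cap { Some(n + 1) } else { None }
            })
            .ok()
            .map(|_| Permit(Arc::clone(this)))
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.0.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}

#[derive(Debug, PartialEq)]
pub enum Reject {
    GlobalCap,
    PerCredentialCap(usize),
}

pub struct Caps {
    global: Option<Arc<Limiter>>, // None = unlimited
    per_credential: Mutex<HashMap<u64, Arc<Limiter>>>,
}

/// Held for the request lifetime. Struct fields drop in declaration
/// order, so the per-credential permit is released first.
pub struct RequestPermits {
    _credential: Permit,
    _global: Option<Permit>,
}

impl Caps {
    pub fn new(global_cap: Option<usize>) -> Self {
        Self { global: global_cap.map(Limiter::new), per_credential: Mutex::new(HashMap::new()) }
    }

    pub fn global_in_flight(&self) -> usize {
        self.global.as_ref().map_or(0, |l| l.in_flight.load(Ordering::Acquire))
    }

    pub fn acquire(&self, user_id: u64, cap_from_db: usize) -> Result<RequestPermits, Reject> {
        // 1. Global permit first (skipped when no global cap is configured).
        let global = match &self.global {
            Some(lim) => Some(Limiter::try_acquire(lim).ok_or(Reject::GlobalCap)?),
            None => None,
        };
        // 2. Per-credential limiter, created lazily on first use.
        let lim = {
            let mut map = self.per_credential.lock().unwrap();
            Arc::clone(map.entry(user_id).or_insert_with(|| Limiter::new(cap_from_db)))
        };
        // On failure the early return drops `global`, freeing its slot.
        let credential = Limiter::try_acquire(&lim).ok_or(Reject::PerCredentialCap(lim.cap))?;
        Ok(RequestPermits { _credential: credential, _global: global })
    }
}
```

In the real handler the `RequestPermits` value would be bound to a local that lives until the upstream response has been fully streamed, so both slots stay occupied for the whole request.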

When either acquire fails: return Anthropic-shaped 429 with error.type = "overloaded_error". Per-credential failure message: "Too many concurrent requests against this credential (cap: N). Retry shortly.". Global failure message: "Server is at capacity. Retry shortly.".
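A sketch of how the two rejections could be rendered. The `Reject` enum and its methods are hypothetical. The outer `{"type":"error","error":{...}}` envelope follows Anthropic's documented error shape as I understand it; this issue itself only fixes `error.type` and the message text. The JSON is built by hand to stay dependency-free, which is safe here only because the messages contain no characters that need escaping; real code would more likely use a JSON library.

```rust
/// HTTP status returned for both kinds of rejection.
pub const REJECT_STATUS: u16 = 429;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Reject {
    /// Carries the cap the credential's semaphore was built with.
    PerCredentialCap(i32),
    GlobalCap,
}

impl Reject {
    /// Value for the structured `reason` log field.
    pub fn reason(self) -> &'static str {
        match self {
            Reject::PerCredentialCap(_) => "per_credential_cap",
            Reject::GlobalCap => "global_cap",
        }
    }

    /// Human-readable message, as specified above.
    pub fn message(self) -> String {
        match self {
            Reject::PerCredentialCap(n) => format!(
                "Too many concurrent requests against this credential (cap: {n}). Retry shortly."
            ),
            Reject::GlobalCap => "Server is at capacity. Retry shortly.".to_string(),
        }
    }

    /// Anthropic-shaped JSON error body.
    pub fn body(self) -> String {
        format!(
            r#"{{"type":"error","error":{{"type":"overloaded_error","message":"{}"}}}}"#,
            self.message()
        )
    }
}
```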

Update semantics

Lazy. The semaphore in the DashMap is created at first use with the cap value read from DB at that moment. A subsequent PUT updates the DB column but does not invalidate the in-memory semaphore. New cap applies after server restart or after the entry is otherwise dropped. This avoids the atomic-swap edge cases of mid-flight cap changes. Documented in code, called out in the PR description when the slice ships.
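The consequence of lazy creation can be shown with a tiny model. `CapCache` is a hypothetical stand-in that records only the cap each in-memory semaphore was built with; `clear` models a server restart.

```rust
use std::collections::HashMap;

/// Models the per-credential map: user id -> cap the semaphore was built with.
pub struct CapCache {
    built_with: HashMap<u64, i32>,
}

impl CapCache {
    pub fn new() -> Self {
        Self { built_with: HashMap::new() }
    }

    /// Returns the cap actually in effect for this credential.
    /// `db_cap` is consulted only the first time the credential is seen.
    pub fn effective_cap(&mut self, user_id: u64, db_cap: i32) -> i32 {
        *self.built_with.entry(user_id).or_insert(db_cap)
    }

    /// Models a restart: every entry is dropped and will be rebuilt lazily.
    pub fn clear(&mut self) {
        self.built_with.clear();
    }
}
```

For example, a credential first seen with cap 8 keeps reporting 8 after a PUT raises the column to 32, and only picks up 32 once the entry has been dropped.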

Eviction

The DashMap grows monotonically with active credentials. At expected scale (≤ ~10k users) this is a few MB at most. No eviction in v1; a one-line code comment notes the threshold at which to revisit.

Telemetry

Every reject logged via tracing::warn! with structured fields:

error_type = "overloaded_error"
reason = "per_credential_cap" | "global_cap"
user_id = <uuid>
cap_value = <i32>
path = <string>

This is the dataset needed later to distinguish "cap too low" from "someone is attacking."

Tests

  • proxy_rejects_when_per_credential_cap_exceeded — N concurrent succeed, N+1 returns 429 / overloaded_error.
  • proxy_frees_permit_when_request_completes — N+1 succeeds after one of the N finishes.
  • proxy_rejects_when_global_cap_exceeded — many credentials, hit the global cap, verify rejection with reason = "global_cap".
  • proxy_default_cap_applied_to_existing_rows — migration sets 8 for pre-existing rows.
  • me_anthropic_key_get_returns_max_concurrent — API roundtrip.
  • me_anthropic_key_put_accepts_max_concurrent — API roundtrip.
  • me_anthropic_key_put_rejects_out_of_range: max_concurrent <= 0 or > 256 returns 400.

Out of scope (deferred)

  • Org-level Anthropic keys (cap mechanism will generalize; comes with the org-keys slice).
  • Routing between personal and org keys (comes before org-keys per feat: LLM proxy service — universal integration and org-level key management #181 plan).
  • Per-token sub-caps (waits for proxy_tokens table, which is itself deferred).
  • Per-IP or per-source-tool caps (different axis; not needed yet).
  • Adaptive caps based on observed upstream 429s.
  • Cap eviction / TTL (revisit if active credentials exceed ~10k).

Estimated effort

1 day, mostly tests.
