feat(proxy): per-credential and global concurrency caps #210

Sub-issue of #181 (parent), follow-up to merged PR #208.

## Problem

The Anthropic proxy currently has no concurrency limits anywhere. This enables three concrete failure modes:

1. **Leaked TraceVault session token weaponized as DoS.** A leaked token can be used to fire N simultaneous proxy requests. The blast radius covers the user's Anthropic bill, TraceVault's outbound bandwidth, the DB pool (every request borrows a slot for the `auth_sessions` lookup), and Tokio task slots.
2. **Runaway agent.** An agent harness with a no-backoff retry loop can amplify a single failure into hundreds of in-flight requests holding resources.
3. **Multi-tenant unfairness.** When a second org's user starts using the proxy, one user's burst can starve another via outbound bandwidth / DB pool contention.

Originally surfaced as "major #3" in the self-review of #208, where it was deferred for design clarity.

## Design

The cap belongs to the **upstream credential**, not the requesting user or session. Anthropic itself rate-limits on the account associated with the API key — TraceVault's cap should mirror that ownership model. This generalizes cleanly to later org-level keys and multi-provider support without re-architecting.

### Schema (`migrations/025_*.sql`)

```sql
ALTER TABLE user_anthropic_keys
    ADD COLUMN max_concurrent INTEGER NOT NULL DEFAULT 8
    CHECK (max_concurrent > 0 AND max_concurrent <= 256);
```

Default 8: comfortable for typical multi-agent setups (Claude Code + GSD2) and well under any paid tier. It is above the Anthropic free tier's limit (5 concurrent), but the free tier is uncommon among active users. The upper bound of 256 rejects nonsense values typed by users.

### API

- `GET /api/v1/me/anthropic-key` adds `max_concurrent: i32` to the response.
- `PUT /api/v1/me/anthropic-key` accepts an optional `max_concurrent`. Omitted = use DB default on first insert, keep existing value on update.
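A minimal sketch of the PUT resolution rule, assuming the handler validates before writing so that out-of-range values become a 400 rather than a CHECK-constraint error. `resolve_max_concurrent` and its parameters are hypothetical; `existing` is `None` when no row exists yet. The constant 8 mirrors the migration's `DEFAULT`; a real handler could instead omit the column on insert and let the DB default apply.

```rust
/// Mirrors `DEFAULT 8` in the migration; used when no row exists yet.
const DEFAULT_MAX_CONCURRENT: i32 = 8;
/// Mirrors the upper bound of the CHECK constraint.
const MAX_CONCURRENT_LIMIT: i32 = 256;

/// Decide which `max_concurrent` value a PUT should store (hypothetical helper).
/// `requested`: the optional field from the request body.
/// `existing`: the stored value, or `None` on first insert.
/// An `Err` maps to HTTP 400.
fn resolve_max_concurrent(requested: Option<i32>, existing: Option<i32>) -> Result<i32, String> {
    match requested {
        // Reject out-of-range values explicitly instead of relying on the CHECK constraint.
        Some(v) if v <= 0 || v > MAX_CONCURRENT_LIMIT => Err(format!(
            "max_concurrent must be between 1 and {MAX_CONCURRENT_LIMIT}, got {v}"
        )),
        Some(v) => Ok(v),
        // Omitted: keep the stored value on update, use the default on first insert.
        None => Ok(existing.unwrap_or(DEFAULT_MAX_CONCURRENT)),
    }
}
```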

### Web (`/me/proxy/`)

Numeric input next to the key field, saved together. Renders the current value alongside "Configured" status.

### Proxy handler

Two semaphores, acquired in order and both held for the request lifetime via RAII permits:

1. **Per-credential semaphore**: `DashMap<Uuid, Arc<Semaphore>>` on `AppState`, keyed by `user_anthropic_keys.user_id`. Lazily created on first request for a credential, sized to the credential's `max_concurrent`.
2. **Global semaphore**: `Arc<Semaphore>` on `AppState`. Size from `PROXY_MAX_GLOBAL_CONCURRENT` env var. **Unset = unlimited** (default). Operator turns it on after capacity testing. Recommended starting value in docs: 256.

Acquisition order matters: per-credential first, global second. This way a misbehaving credential bumps into its own cap before contributing to global pressure: its excess requests are rejected without ever holding a global permit, and they are logged as `per_credential_cap` rather than `global_cap`. If the global acquire then fails, the per-credential permit is dropped immediately. When the two permits are bound in separate `let` statements, Rust drops them in reverse order (global released first); release order does not affect correctness, and RAII release is panic-safe regardless.
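A minimal sketch of the two-step acquisition, written against a std-only stand-in so it runs without dependencies. It acquires the per-credential permit first, so a credential at its own cap is rejected without ever holding a global permit. `Sem`, `Permit`, `Reject`, and `acquire_for_request` are hypothetical names; in the real handler the stand-in would presumably be `tokio::sync::Semaphore`, whose `try_acquire_owned` returns an `OwnedSemaphorePermit` that releases on drop. Passing `global: None` models an unset `PROXY_MAX_GLOBAL_CONCURRENT`.

```rust
use std::sync::{Arc, Mutex};

/// Std-only stand-in for `tokio::sync::Semaphore` (hypothetical, for illustration).
struct Sem {
    cap: usize,
    available: Mutex<usize>,
}

/// RAII permit: gives its slot back when dropped, like `OwnedSemaphorePermit`.
struct Permit(Arc<Sem>);

impl Drop for Permit {
    fn drop(&mut self) {
        *self.0.available.lock().unwrap() += 1;
    }
}

impl Sem {
    fn new(cap: usize) -> Arc<Sem> {
        Arc::new(Sem { cap, available: Mutex::new(cap) })
    }

    /// Non-blocking acquire; `None` means "at cap", which the proxy turns into a 429.
    fn try_acquire(this: &Arc<Sem>) -> Option<Permit> {
        let mut available = this.available.lock().unwrap();
        if *available == 0 {
            return None;
        }
        *available -= 1;
        Some(Permit(Arc::clone(this)))
    }
}

#[derive(Debug, PartialEq)]
enum Reject {
    PerCredentialCap(usize),
    GlobalCap,
}

/// Per-credential first, then global. The returned permits are held for the
/// request lifetime and release their slots when dropped.
fn acquire_for_request(
    credential: &Arc<Sem>,
    global: Option<&Arc<Sem>>,
) -> Result<(Permit, Option<Permit>), Reject> {
    // A credential at its own cap is rejected without touching the global pool.
    let cred_permit =
        Sem::try_acquire(credential).ok_or(Reject::PerCredentialCap(credential.cap))?;
    let global_permit = match global {
        None => None, // unlimited
        // On failure, returning drops `cred_permit`, releasing the credential slot.
        Some(g) => Some(Sem::try_acquire(g).ok_or(Reject::GlobalCap)?),
    };
    Ok((cred_permit, global_permit))
}
```

Because both acquires are non-blocking, the order only decides which pool a doomed request touches; with per-credential first, a runaway credential never occupies global capacity.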

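When either acquire fails: return Anthropic-shaped 429 with `error.type = "overloaded_error"`. Per-credential failure message: `"Too many concurrent requests against this credential (cap: N). Retry shortly."`. Global failure message: `"Server is at capacity. Retry shortly."`. Note that Anthropic's own API pairs `overloaded_error` with HTTP 529 and returns 429 with `rate_limit_error`, so the status/type pairing should be chosen with client retry behavior in mind.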
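A sketch of the reject body, assuming Anthropic's standard error envelope (`{"type": "error", "error": {"type": ..., "message": ...}}`). `CapReject` and `reject_body` are hypothetical names. The JSON is hand-formatted to stay dependency-free, which is only safe because neither message contains characters that need escaping; a real handler would more likely build it with `serde_json`.

```rust
/// Which cap rejected the request (hypothetical type).
enum CapReject {
    PerCredential { cap: i32 },
    Global,
}

/// JSON body sent with the rejection, in Anthropic's error envelope.
/// Hand-formatted: safe only because neither message needs JSON escaping.
fn reject_body(reject: &CapReject) -> String {
    let message = match reject {
        CapReject::PerCredential { cap } => format!(
            "Too many concurrent requests against this credential (cap: {cap}). Retry shortly."
        ),
        CapReject::Global => "Server is at capacity. Retry shortly.".to_string(),
    };
    // `{{` and `}}` are literal braces in format strings.
    format!(r#"{{"type":"error","error":{{"type":"overloaded_error","message":"{message}"}}}}"#)
}
```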

### Update semantics

Lazy. The semaphore in the DashMap is created at first use, sized from the cap value read from the DB at that moment. A subsequent PUT updates the DB column but does **not** invalidate the in-memory semaphore. The new cap applies only once the entry is recreated, which, with no eviction in v1, means after a server restart. This avoids the atomic-swap edge cases of changing a cap while requests are in flight. Documented in code, and called out in the PR description when the slice ships.
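The lazy, non-invalidating behavior comes down to get-or-insert. A std-only sketch, with `Mutex<HashMap<u128, _>>` standing in for `DashMap<Uuid, Arc<Semaphore>>` (DashMap's `entry(..).or_insert_with(..)` would likely play the same role). `Registry`, `CredentialSlot`, and `slot_for` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Stand-in for `Arc<Semaphore>`; only the size it was created with matters here.
struct CredentialSlot {
    cap: usize,
}

/// Std-only model of the per-credential map; `u128` stands in for `Uuid`.
struct Registry {
    slots: Mutex<HashMap<u128, Arc<CredentialSlot>>>,
}

impl Registry {
    /// Get-or-insert: `cap_from_db` is used only when the entry is first created.
    /// An existing entry is returned as-is, so a later PUT does not resize it.
    fn slot_for(&self, credential: u128, cap_from_db: usize) -> Arc<CredentialSlot> {
        let mut slots = self.slots.lock().unwrap();
        let slot = slots
            .entry(credential)
            .or_insert_with(|| Arc::new(CredentialSlot { cap: cap_from_db }));
        Arc::clone(slot)
    }
}
```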

### Eviction

The DashMap grows monotonically with active credentials. At expected scale (≤ ~10k users) this is a few MB at most. No eviction in v1; a one-line code comment notes the threshold at which to revisit.
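A rough check of the "few MB" figure, assuming a generous budget of 256 bytes per entry to cover the 16-byte `Uuid` key, the `Arc` pointer, the semaphore allocation, and map overhead (an assumed upper bound, not a measurement):

```math
10{,}000 \times 256\ \text{B} = 2{,}560{,}000\ \text{B} \approx 2.4\ \text{MiB}
```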

### Telemetry

Every reject logged via `tracing::warn!` with structured fields:

```
error_type = "overloaded_error"
reason = "per_credential_cap" | "global_cap"
user_id = <uuid>
cap_value = <i32>
path = <string>
```
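A sketch of one reject event carrying the fields above. The std-only `RejectEvent` just renders the field set so it runs without dependencies; the comment shows roughly how the same fields would be passed to `tracing::warn!`, using tracing's `%` (Display) sigil. `RejectReason` and `RejectEvent` are hypothetical names, and `user_id` is a `&str` here in place of a `Uuid`.

```rust
use std::fmt;

/// Values of the `reason` field (hypothetical type).
#[derive(Clone, Copy)]
enum RejectReason {
    PerCredentialCap,
    GlobalCap,
}

impl RejectReason {
    fn as_str(self) -> &'static str {
        match self {
            RejectReason::PerCredentialCap => "per_credential_cap",
            RejectReason::GlobalCap => "global_cap",
        }
    }
}

/// One reject event; `user_id` is a &str only to avoid a uuid dependency.
struct RejectEvent<'a> {
    reason: RejectReason,
    user_id: &'a str,
    cap_value: i32,
    path: &'a str,
}

// In the handler this would likely be a single tracing call, roughly:
//   tracing::warn!(error_type = "overloaded_error", reason = ev.reason.as_str(),
//       user_id = %ev.user_id, cap_value = ev.cap_value, path = %ev.path,
//       "proxy request rejected");
// The Display impl below renders the same fields as key=value pairs.
impl fmt::Display for RejectEvent<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "error_type=overloaded_error reason={} user_id={} cap_value={} path={}",
            self.reason.as_str(),
            self.user_id,
            self.cap_value,
            self.path
        )
    }
}
```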

This is the dataset needed later to distinguish "cap too low" from "someone is attacking."

## Tests

- `proxy_rejects_when_per_credential_cap_exceeded` — N concurrent succeed, N+1 returns 429 / `overloaded_error`.
- `proxy_frees_permit_when_request_completes` — N+1 succeeds after one of the N finishes.
- `proxy_rejects_when_global_cap_exceeded` — many credentials, hit the global cap, verify rejection with `reason = "global_cap"`.
- `proxy_default_cap_applied_to_existing_rows` — migration sets 8 for pre-existing rows.
- `me_anthropic_key_get_returns_max_concurrent` — API roundtrip.
- `me_anthropic_key_put_accepts_max_concurrent` — API roundtrip.
- `me_anthropic_key_put_rejects_out_of_range` — `max_concurrent <= 0` or `> 256` returns 400.

## Out of scope (deferred)

- Org-level Anthropic keys (cap mechanism will generalize; comes with the org-keys slice).
- Routing between personal and org keys (comes before org-keys per #181 plan).
- Per-token sub-caps (waits for `proxy_tokens` table, which is itself deferred).
- Per-IP or per-source-tool caps (different axis; not needed yet).
- Adaptive caps based on observed upstream 429s.
- Cap eviction / TTL (revisit if active credentials exceed ~10k).

## Estimated effort

1 day, mostly tests.

