Merged
2 changes: 1 addition & 1 deletion internal/handlers/openapi.go
@@ -201,7 +201,7 @@ const openAPISpec = `{
"post": {
"summary": "Provision a Postgres database",
"description": "Returns a real postgres:// connection string with pgvector pre-installed. Anonymous tier: 10MB, 2 connections, 24h TTL.\n\nSupports Stripe/AWS-style idempotency via the optional Idempotency-Key request header — see the parameter description below.",
-"parameters": [{ "name": "Idempotency-Key", "in": "header", "required": false, "schema": { "type": "string", "maxLength": 255 }, "description": "Opaque client-supplied key (1-255 ASCII printable chars) that makes this POST safe to retry. The first response is cached for 24h; subsequent calls carrying the same key return the cached response verbatim with X-Idempotent-Replay: true. Reusing a key with a different body returns 409. Replays still consume rate-limit budget (anti-abuse) but do NOT consume quota budget (the original call already did)." }],
+"parameters": [{ "name": "Idempotency-Key", "in": "header", "required": false, "schema": { "type": "string", "maxLength": 255 }, "description": "Opaque client-supplied key (1-255 ASCII printable chars) that makes this POST safe to retry. The first response is cached for 24h; subsequent calls carrying the same key return the cached response verbatim with X-Idempotent-Replay: true. Reusing a key with a different body returns 409. Replays do NOT consume rate-limit budget — the per-fingerprint daily counter is refunded on every cache hit so an agent retrying transient 5xx with the same key gets the documented replay (FINDING API-1, 2026-05-29). The FIRST call still pays the rate-limit cost; replays are refunded. The per-fingerprint provision-dedup cap (5 fresh resources/day, anti-abuse) is unchanged." }],
"requestBody": { "content": { "application/json": { "schema": { "$ref": "#/components/schemas/ProvisionRequest" } } } },
"responses": {
"201": { "description": "Database provisioned", "content": { "application/json": { "schema": { "$ref": "#/components/schemas/DBProvisionResponse" } } } },
21 changes: 21 additions & 0 deletions internal/metrics/metrics.go
@@ -57,6 +57,27 @@ var (
Help: "Requests blocked by fingerprint rate limiting",
})

+// IdempotencyReplayRefunded counts the rate-limit counter refunds the
+// Idempotency middleware issues on a cache HIT — one increment per
+// replayed response that successfully DECR'd the per-fingerprint
+// daily counter (CLAUDE.md FINDING API-1, fix Option C).
+//
+// Labelled by route so on-call can see which endpoints absorb the
+// most retry-storm traffic. A steady non-zero rate is healthy (agents
+// are retrying transient 5xx and we're honoring the published Stripe-
+// shape replay contract). A sudden spike on one route correlates with
+// upstream brownouts; flip to NR and check the corresponding 5xx rate
+// for the same route.
+//
+// Companion alert (infra repo): "idempotency replay refund spike (1h)"
+// fires when rate(instant_idempotency_replay_refunded_total[1h]) > 5×
+// the 7d baseline — points the operator at a brownout in the underlying
+// provisioner before agents start abandoning.
+IdempotencyReplayRefunded = promauto.NewCounterVec(prometheus.CounterOpts{
+Name: "instant_idempotency_replay_refunded_total",
+Help: "Rate-limit counter refunds issued by Idempotency middleware on cache hit",
+}, []string{"route"})
+
// RecycleGateBlocked counts anonymous provision attempts blocked by the
// free-tier recycle gate (Option B from FREE-TIER-RECYCLE-2026-05-12).
// Labelled by resource_type so we can see which services see the most
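The counter comment above describes two things: one increment per successful refund, labelled by route, and an alert that fires when the 1h rate exceeds five times the 7d baseline. The sketch below models both with the standard library only. It is a stand-in for illustration, not the Prometheus client the PR uses; `routeCounter`, `refundSpike`, and the route string are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// routeCounter is a stdlib stand-in for a counter vector with one "route" label.
type routeCounter struct {
	mu     sync.Mutex
	counts map[string]uint64
}

func newRouteCounter() *routeCounter {
	return &routeCounter{counts: make(map[string]uint64)}
}

// Inc records one successful refund for the given route.
func (c *routeCounter) Inc(route string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[route]++
}

// Get returns the number of refunds recorded for route.
func (c *routeCounter) Get(route string) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[route]
}

// refundSpike mirrors the alert condition: 1h rate above 5x the 7d baseline.
func refundSpike(rate1h, baseline7d float64) bool {
	return rate1h > 5*baseline7d
}

func main() {
	refunds := newRouteCounter()
	refunds.Inc("/db/new") // hypothetical route label
	refunds.Inc("/db/new")
	fmt.Println(refunds.Get("/db/new")) // 2
	fmt.Println(refundSpike(12.0, 2.0)) // true: 12 > 10
	fmt.Println(refundSpike(10.0, 2.0)) // false: 10 is not above 10
}
```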
40 changes: 30 additions & 10 deletions internal/middleware/idempotency.go
@@ -46,16 +46,22 @@ import (
// Middleware ordering (see internal/router/router.go for the per-route
// wiring): RateLimit runs at app.Use scope (global, before OptionalAuth),
// so by the time this middleware runs the per-fingerprint daily counter
-// has already incremented. THIS IS DELIBERATE: a malicious agent must NOT
-// be able to bypass rate limiting via Idempotency-Key reuse, so replays
-// still consume rate budget. The original-call cost is borne by the
-// counter on the FIRST request; replays add an extra increment, which is
-// the conservative choice — the customer paid for the first call (in
-// quota terms) but a key-reuse attacker doesn't get free attempts.
-// Quota-walls inside handlers (CheckAndIncrementToken) similarly continue
-// to fire on replay paths, but the replay short-circuits BEFORE the
-// handler so the quota counter is unaffected — the cached response simply
-// goes out the wire. Net effect: rate-limit budget = abuse-protected;
+// has already incremented. To honor the published Stripe-shape replay
+// contract — same Idempotency-Key replays the cached response without
+// burning a fresh rate-limit slot — every cache HIT path below calls
+// RefundRateLimitCounter (single Redis DECR) BEFORE sending the cached
+// response. The FIRST call still pays the cost (the original INCR), so
+// an attacker reusing one key 100× gets amortised-cheaper attempts, NOT
+// free attempts (FINDING API-1, CEO Option C, 2026-05-29). The handler-
+// internal per-fingerprint provision-dedup cap (5/day, CLAUDE.md rule 6)
+// is NOT touched by the refund — that abuse signal lives in handler
+// code and is independent of the request-rate-limit counter.
+//
+// Quota-walls inside handlers (CheckAndIncrementToken) continue to fire
+// on replay paths, but the replay short-circuits BEFORE the handler so
+// the quota counter is unaffected — the cached response simply goes out
+// the wire. Net effect: rate-limit budget = refunded on replay (Stripe
+// contract); fingerprint provision-dedup = abuse-protected (unchanged);
// quota budget = customer-friendly (no double-charge for retries).
//
// Cache key shape: idem:<scope>:<endpoint>:<sha256(key)> where <scope> is
@@ -287,12 +293,20 @@ func idempotencyExplicit(c *fiber.Ctx, rdb *redis.Client, endpoint, scope, rawKe
"error", jerr, "endpoint", endpoint)
} else {
if entry.BodyHash != reqBodyHash {
+// 409 is a genuine error response, not a replay — DO NOT
+// refund the rate-limit counter here. The agent did the
+// wrong thing (reused a key for a different body) and
+// should still pay the cost of that mistake.
return c.Status(fiber.StatusConflict).JSON(fiber.Map{
"ok": false,
"error": "idempotency_key_conflict",
"message": "Idempotency-Key already used with a different body",
})
}
+// Cache HIT — refund the rate-limit slot RateLimit burned on
+// the way in (FINDING API-1, Option C). Fail-open: a refund
+// error logs WARN but never blocks the cached response.
+RefundRateLimitCounter(c, rdb)
c.Set(idempotencyReplayHeader, "true")
if entry.ContentType != "" {
c.Set(fiber.HeaderContentType, entry.ContentType)
@@ -397,6 +411,12 @@ func idempotencyFingerprint(c *fiber.Ctx, rdb *redis.Client, endpoint, scope str
"error", jerr, "endpoint", endpoint)
// Corrupt — fall through to handler and overwrite below.
} else {
+// Cache HIT on the body-fingerprint fallback path. Same
+// refund semantics as the explicit-key branch: the
+// rate-limit slot RateLimit burned on the way in is
+// returned because we're serving a cached response, not
+// re-running the handler (FINDING API-1, Option C).
+RefundRateLimitCounter(c, rdb)
c.Set(idempotencySourceHeader, idempotencySourceFingerprint)
c.Set(idempotencyReplayHeader, "true")
if entry.ContentType != "" {
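The accounting described in the idempotency.go comments (INCR on the way in, DECR on a cache hit, no refund on a 409 conflict) can be checked with a small in-memory model. This is a sketch under stated assumptions rather than the middleware itself: a plain integer and a map stand in for Redis, the `server`, `cachedEntry`, and `handle` names are hypothetical, and hex encoding of the SHA-256 digest in the cache key is assumed because the diff only shows the shape `idem:<scope>:<endpoint>:<sha256(key)>`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cachedEntry is a simplified cached response: body hash plus status code.
type cachedEntry struct {
	bodyHash string
	status   int
}

// server models the two pieces of state involved in a replay.
type server struct {
	rateCount   int                    // stand-in for the per-fingerprint daily counter
	cache       map[string]cachedEntry // stand-in for the Redis idempotency cache
	handlerRuns int                    // how many times the real handler executed
}

func hashHex(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// cacheKey follows idem:<scope>:<endpoint>:<sha256(key)>; hex encoding is assumed.
func cacheKey(scope, endpoint, rawKey string) string {
	return "idem:" + scope + ":" + endpoint + ":" + hashHex(rawKey)
}

// handle returns the status code and whether the response was a replay.
func (s *server) handle(scope, endpoint, rawKey, body string) (int, bool) {
	s.rateCount++ // rate limiting runs first, so every request increments
	k := cacheKey(scope, endpoint, rawKey)
	if e, ok := s.cache[k]; ok {
		if e.bodyHash != hashHex(body) {
			return 409, false // conflict: a genuine error, no refund
		}
		s.rateCount-- // cache hit: refund the slot taken on the way in
		return e.status, true
	}
	s.handlerRuns++ // cache miss: run the handler and store the result
	s.cache[k] = cachedEntry{bodyHash: hashHex(body), status: 201}
	return 201, false
}

func main() {
	s := &server{cache: map[string]cachedEntry{}}
	s.handle("anon", "db.new", "key-1", `{"name":"a"}`) // first call: pays one slot
	s.handle("anon", "db.new", "key-1", `{"name":"a"}`) // replay: slot refunded
	s.handle("anon", "db.new", "key-1", `{"name":"b"}`) // conflict: pays one slot
	fmt.Println(s.rateCount, s.handlerRuns)             // prints "2 1"
}
```

In this model three requests cost two rate-limit slots and one handler run, which matches the stated intent: the first call and the conflicting call pay, the replay does not.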