Skip to content

fix(storage,recycle): canonical presign host + recycle gate consistency (API-3/4/7/8)#171

Merged
mastermanas805 merged 4 commits into
masterfrom
fix/storage-recycle-edge-gates
May 29, 2026
Merged

fix(storage,recycle): canonical presign host + recycle gate consistency (API-3/4/7/8)#171
mastermanas805 merged 4 commits into
masterfrom
fix/storage-recycle-edge-gates

Conversation

@mastermanas805

Copy link
Copy Markdown
Member

Closes four QA gaps from the 2026-05-29 round (/tmp/qa-session/shared/INBOX.md).

API-3 (P1) — /storage/:token/presign leaks DO Spaces host

The signed URL pointed at https://nyc3.digitaloceanspaces.com/... instead of the canonical https://s3.instanode.dev/... advertised by /storage/new. Every customer-facing artifact thus leaked both the DO vendor host AND the master access-key-id prefix (X-Amz-Credential=DO00CXYXRKNHR6XM77RE/...).

Fix: rewrite the scheme+host of the signed URL to ObjectStorePublicURL (production: https://s3.instanode.dev). s3.instanode.dev is a CNAME/CDN over the same DO Spaces backend (CLAUDE.md 'Canonical since 2026-05-20'), so the SigV4 signature stays valid at the canonical host. Path + entire query string preserved verbatim. Local-dev / MinIO with empty PublicURL passes through unchanged.

API-4 / CLI-MCP-15R2 (P1) — recycle gate inconsistent across services

The recycle gate fired on /db/new, /cache/new, /nosql/new, /webhook/new but NOT on /queue/new, /storage/new for the same fingerprint. Root cause (confirmed by CLI-MCP-15R2): models.GetAllActiveResourcesByFingerprint matched expired-but-status='active' rows the TTL reaper hadn't reaped yet — the gate's 'still mid-session' check saw stale rows and skipped.

Fix: add AND (expires_at IS NULL OR expires_at > NOW()) to the query so gate behaviour is reaper-independent. Regression-pinned via TestGetAllActiveResourcesByFingerprint_FiltersExpiredRows.

API-7 (P3) — daily-cap 429 pre-empts recycle-gate 402

For an over-cap fingerprint whose resources had aged out, the daily-cap 429 fired first and the agent lost the recycle-gate 402 + agent_action upsell signal.

Fix: in every anonymous-path handler (db, cache, nosql, queue, storage, webhook, vector), run recycleGate BEFORE checkProvisionLimit so the 402 wins when both gates would fire. Gate still fails open on Redis/DB errors so the magic-first-touch wedge is never collateral damage.

API-8 (P3) — invalid_operation agent_action drops HEAD

Error message lists GET, PUT, HEAD as accepted; agent_action only mentioned GET or PUT. Pure copy drift. Fixed.

Multi-surface checklist (memory rule 22)

Surface Touched? Why
api/plans.yaml No No tier values changed
common/plans/plans.go No Same
api/internal/handlers/openapi.go No Error envelopes unchanged
content/llms.txt No Customer contract for presign already advertises s3.instanode.dev — the leak was a bug, not a contract change
METRICS-CATALOG.md No No new metrics; existing RecycleGateBlocked + FingerprintAbuseBlocked already cover the reorder paths

Test coverage

  • TestRewritePresignHost_CanonicalHostSubstitution (5 subtests): rewrite, empty-publicURL pass-through, malformed publicURL, nil signed URL, scheme-fallback
  • TestPresign_CanonicalHostSubstitution: HTTP-level proof — signed URL is on s3.instanode.dev, NOT digitaloceanspaces.com; signature query preserved
  • TestPresign_NoCanonicalHost_PassesThrough: local-dev MinIO behaviour unchanged
  • TestGetAllActiveResourcesByFingerprint_FiltersExpiredRows: SQL fragment regression pin
  • Existing TestAnonRecycleGate_*, TestAnonCrossServiceCapFallback_*, TestAnonDedup_DecryptFailure_* continue to pass — the reorder doesn't change behaviour when active rows exist (gate returns false → falls through to limitExceeded path as before)

Four-pass deploy ritual (memory rule 23)

  • Local build + vet + new tests green (go build ./... clean, all targeted tests pass)
  • CI green (this PR)
  • Auto-deploy via deploy.yml on merge
  • Live verify: curl https://api.instanode.dev/healthz | jq .commit_id == merge SHA

QA evidence

  • /tmp/qa-session/API/storage_presign2.json — pre-fix presign showing nyc3.digitaloceanspaces.com
  • /tmp/qa-session/API/db_huge.json — pre-fix 429 envelope on an over-cap fingerprint
  • /tmp/qa-session/API/storage_presign.json — pre-fix invalid_operation agent_action drift

🤖 Generated with Claude Code

Manas Srivastava and others added 4 commits May 29, 2026 14:10
…cy (API-3/4/7/8)

Closes four QA gaps (2026-05-29 INBOX) in one pass:

API-3 (P1) — POST /storage/:token/presign returned signed URLs on
nyc3.digitaloceanspaces.com, leaking the DO Spaces vendor host AND the master
access-key-id prefix in every customer-facing artifact. Fix: rewrite the
scheme+host of the signed URL to ObjectStorePublicURL (production:
https://s3.instanode.dev — the canonical CNAME / CDN over the same DO Spaces
backend, per CLAUDE.md 'Canonical since 2026-05-20'). Path + signature query
are preserved verbatim so the SigV4 signature remains valid at the canonical
host. Local-dev / MinIO with empty PublicURL passes through unchanged.

API-4 / CLI-MCP-15R2 (P1) — recycle gate fired on db/cache/nosql/webhook but
NOT on queue/storage for the same fingerprint, because
models.GetAllActiveResourcesByFingerprint matched expired-but-status='active'
rows the TTL reaper hadn't reaped yet. The gate's 'still mid-session' check
saw a stale row and skipped. Fix: add 'AND (expires_at IS NULL OR
expires_at > NOW())' to the query so the gate behaviour is reaper-independent.

API-7 (P3) — daily-cap 429 pre-empted recycle-gate 402 + agent_action, losing
the upsell signal for an over-cap fingerprint whose resources had aged out.
Fix: in every anonymous-path handler (db/cache/nosql/queue/storage/webhook/
vector), run recycleGate BEFORE checkProvisionLimit so the 402 wins when both
gates would fire. Gate still fails open on Redis/DB errors so the
magic-first-touch wedge is never collateral damage.

API-8 (P3) — invalid_operation error message lists 'GET, PUT, HEAD' but
agent_action only mentioned 'GET or PUT'. Fix: agent_action now matches the
message exactly. Drift was a docs/error-copy regression; no behaviour change.

== Memory rule 22 — multi-surface checklist ==
- api/plans.yaml                — no change (tier values unchanged)
- common/plans/plans.go         — no change
- api/internal/handlers/openapi.go — no schema change (error envelopes unchanged)
- content/llms.txt              — no change (customer contract: response URL is
  still under s3.instanode.dev, which it ALREADY advertised; the leak was a bug
  not a contract change)
- METRICS-CATALOG.md            — no new metrics added; existing
  RecycleGateBlocked + FingerprintAbuseBlocked counters cover the reorder paths

== Memory rule 23 — four-pass gate ==
- Local gate: build + vet + new tests green
- Existing recycle-gate, cross-service-cap, dedup-decrypt regression tests
  unchanged (active rows still skip the gate, cap path still 429s when over
  the cap with a same-type or any-type live row)

== Test coverage ==
- TestRewritePresignHost_CanonicalHostSubstitution (5 subtests): rewrite,
  empty-publicURL pass-through, malformed publicURL, nil signed URL,
  scheme-fallback
- TestPresign_CanonicalHostSubstitution: HTTP-level proof — signed URL is
  on s3.instanode.dev, NOT digitaloceanspaces.com
- TestPresign_NoCanonicalHost_PassesThrough: local-dev MinIO unchanged
- TestGetAllActiveResourcesByFingerprint_FiltersExpiredRows: SQL fragment
  regression pin (test fails if the expires_at filter is ever removed)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses the patch-coverage gate failure on the API-7 reorder (PR-1 round 1).
Previously, only TestAnonRecycleGate_{Cache,NoSQL,Queue} covered the recycle-
gate-fired branch — and that test exercised the OLD (post-limitExceeded) gate
position. After the reorder, the new `if h.recycleGate { return nil }` lines
in db.go, storage.go, vector.go, webhook.go were uncovered.

This adds:

- TestAnonRecycleGate_DB (in anon_paths_provarms_test.go): reuses the
  gRPC fixture's /db/new + /cache/new mounts. Probes via /cache/new (always
  available), soft-deletes the row, plants the recycle marker, then asserts
  /db/new 402s.

- TestRecycleGate_EarlyFire_{Storage,Webhook,Vector} (recycle_gate_early_test.go):
  mounts each handler against a minimal fiber app + cache probe handler in
  the same app. Soft-delete + marker pattern → 402 free_tier_recycle_requires_claim.

All four tests deterministically hit the new line: `if h.recycleGate(c, fp, X) { return nil }`.
They use the same /cache/new probe technique already proven by
TestAnonProvision_RecycleGate_Returns402 to avoid depending on a live
postgres-customers/MongoDB/NATS/Spaces backend (the gate fires BEFORE any
provider dispatch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Diff-cover round 2 flagged storage_presign.go lines 352-353 (the
`if scheme == "" { scheme = signed.Scheme }` branch) as untested. The prior
subtest passed "http://..." which set Scheme to "http", missing the
fallback. New subtest uses a protocol-relative "//cdn.instanode.dev" so
url.Parse returns Scheme=="" and the branch fires. signed URL was https,
so the rewritten output must inherit https — pinned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mastermanas805 mastermanas805 merged commit f2fb140 into master May 29, 2026
14 checks passed
@mastermanas805 mastermanas805 deleted the fix/storage-recycle-edge-gates branch May 29, 2026 09:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant