fix(stack): harden MergeStackEnvVars — in-tx deleting guard, lock_timeout, honest keys_set (panel #7)#238
Merged
mastermanas805 merged 1 commit intoJun 4, 2026
Conversation
…eout, honest keys_set (panel #7) Design-review follow-up to #235. Three hardenings to the atomic env merge: 1. In-tx teardown guard (closes a TOCTOU). The "stack is mid-teardown" check was a pre-read in the handler (stack.Status == deleting before MergeStackEnvVars) — racy: the teardown worker can flip status between GetStackBySlug and the merge, committing an env change onto a deleting stack. The SELECT now fetches status under the FOR UPDATE lock and returns models.ErrStackDeleting (→ 409 stack_deleting); the handler pre-read is removed so the in-tx check is authoritative. Same posture as the resource-DELETE / SetTTL hardenings this session. 2. SET LOCAL lock_timeout = '3s' as the first tx statement, so a PATCH that races a long-held row lock fails fast to a retryable 503 instead of hanging the request goroutine indefinitely. 3. keys_set audit count now counts only actual upserts (v != ""). The old `len(body.Env) - deletes` over-counted a no-op delete (empty value for an absent key: not a delete, not a set), making the rule-12 audit lie. Tests: TestMergeStackEnvVars_Branches rewritten for the new SQL (SET LOCAL exec + status,env_vars SELECT) and adds the lock_timeout-error and status='deleting'→ErrStackDeleting cases — MergeStackEnvVars stays 100%. The existing real-DB deleting handler tests now exercise the in-tx path deterministically. The two stack-env fault tests' failAfter bumped +1 for the added in-tx SET LOCAL exec (which the fault harness counts). Verified vs real Postgres+Redis; all changed lines covered. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What (design-review follow-up to #235)
Three hardenings to the atomic env merge:
stack.Status == deletingbefore the merge) — racy: the teardown worker can flip status betweenGetStackBySlugand the merge, committing an env change onto a deleting stack. TheSELECTnow fetchesstatusunder the FOR UPDATE lock and returnsmodels.ErrStackDeleting(→ 409stack_deleting); the handler pre-read is removed so the in-tx check is authoritative. Same posture as the resource-DELETE / SetTTL hardenings.SET LOCAL lock_timeout = '3s'as the first tx statement — a PATCH racing a long-held row lock fails fast to a retryable 503 instead of hanging the request goroutine.keys_setcounts only actual upserts (v != ""). The oldlen(body.Env) - deletesover-counted a no-op delete (empty value for an absent key), making the rule-12 audit lie.Coverage
🤖 Generated with Claude Code