fix(deploy): skip CDK diff for new stacks in TUI deploy#1384
Conversation
CDK's diff() creates a temporary changeset against a non-existent stack, which briefly materializes a ghost stack in REVIEW_IN_PROGRESS state. For projects without file assets (e.g. dataset-only), the subsequent deploy() call races against the ghost stack deletion and fails with "No template found for Stack". Skip diff entirely for new stacks since there's nothing to compare against — everything would show as an addition anyway.
|
Claude Security Review: no high-confidence findings. (run) |
Package TarballHow to installgh release download pr-1384-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.15.0.tgz |
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
Thanks for tracking down this race condition — the analysis in the description is convincing.
I have one concern about error handling that should be addressed before merging. Details inline.
|
|
||
| const target = context?.awsTargets[0]; | ||
| const currentStackName = stackNames?.[0]; | ||
| const stackStatus = target && currentStackName ? await checkStackStatus(target.region, currentStackName) : null; |
There was a problem hiding this comment.
checkStackStatus only swallows ValidationError (stack-doesn't-exist) — every other error is rethrown (transient network failure, throttling, expired/invalid creds, IAM denial, etc.). Because this await sits outside any try/catch, a throw here propagates straight out of run() and skips:
isDiffRunningRef.current = falsesetIsDiffLoading(false)logger.endStep(...)setPreDeployDiffStep(... status: 'success')- the rest of the deploy flow (publish assets / deploy / dispose)
So a transient DescribeStacks failure leaves the UI permanently stuck with the "Computing diff changes..." spinner, and the deploy step itself never errors out or progresses. That's a regression from the previous code, which had the diff cleanup in a finally so any diff error was non-fatal.
A couple of options:
- Wrap just the
checkStackStatuscall in a try/catch and treat a failure as "unknown" — i.e. fall through to the existing diff path (current behavior). Simplest and preserves the bug-fix intent (worst case, you get the old race back, which you already tolerated before this PR). - Restructure so the entire diff phase (status check + diff) is inside a try/finally that always cleans up
isDiffRunningRef,isDiffLoading, andpreDeployDiffStep, mirroring the original code's resilience.
Option 2 is probably the safer long-term fix since it also re-establishes the invariant that the diff step never blocks the deploy flow.
Coverage Report
|
|
Claude Security Review: no high-confidence findings. (run) |
Wrap the entire diff phase (checkStackStatus + diff) in a try/finally so that transient failures (network, throttling, expired creds) in the stack status check never leave the UI stuck on the diff spinner. If the status check fails, fall through to the existing diff path (worst case: the old race condition, which was already tolerated before this fix).
0228c53 to
93bc372
Compare
|
Claude Security Review: no high-confidence findings. (run) |
Summary
diff()creates a temporary changeset against a non-existent stack, which briefly materializes a ghost stack. For projects with agents, the subsequent asset upload provides enough delay for the ghost to disappear. For asset-free projects,deploy()immediately races against the ghost stack deletion and fails with "No template found for Stack"diff()entirely when the stack doesn't exist yet — there's nothing meaningful to compare againstTest plan
tsc --noEmit)