feat(cdk): integ-tests Phase 1 — core lifecycle E2E (#317) by ayushtr-aws · Pull Request #348 · aws-samples/sample-autonomous-cloud-coding-agents

ayushtr-aws · 2026-06-15T18:17:19Z

Summary

Phase 1 of the deploy-then-verify integration-test effort (#236). Builds on Phase 0 (#295) to drive a real agent through the full task lifecycle on a live AgentStack, asserting the four terminal paths from the Cedar HITL E2E matrix (docs/design/CEDAR_HITL_GATES.md §15.3):

submit → run → COMPLETED (repo-less default/agent-v1)
submit → run → FAILED (coding/new-task-v1, clone fails)
submit → run → AWAITING_APPROVAL → approve → terminal (Cedar write_env_files soft-deny gate)
submit → run → AWAITING_APPROVAL → deny → terminal

Closes #317.

Validation

Run end-to-end against a live deployed AgentStack (deploy → 4 real agent runs → destroy, ~54 min): all 4 assertions pass. The approve scenario completed the full happy path — the agent cloned the repo, tripped the Cedar gate on a config.env write, was approved, resumed, pushed a branch, and opened a real PR on the sandbox repo. The deny scenario blocked the write and recorded DENIED.

What's here

cdk/test/integ/integ.task-lifecycle.ts — the Phase 1 test: real AgentStack, Cognito auth chain, the 4 scenarios, waitForAssertions polling, forced teardown.
.github/workflows/integ.yml — pass COMMIT_HASH, best-effort teardown of the per-run stack, raised timeout, sandbox branch cleanup.
cdk/mise.toml — //cdk:integ exports COMMIT_HASH and runs --verbose.
.dockerignore — exclude integ-runner output from the agent build context (fixes a recursive-copy ENAMETOOLONG at synth).
docs/guides/ROADMAP.md (+ Starlight mirror) — mark Phase 1 landed.

Key design decisions

Per-run unique stack name int-<commit-hash> (from COMMIT_HASH) — the AgentCore Runtime injects service-managed agentic_ai ENIs that AWS releases asynchronously, so cdk destroy reliably fails the VPC/subnet/SG deletes. A fixed name would let a stranded stack block the next run; unique names make stranded stacks harmless (swept out-of-band as ENIs release). destroy.expectError tolerates the known teardown failure so the run reflects the assertions, not the teardown race.
stackUpdateWorkflow: false — the Runtime's slow/immutable create makes integ-runner's default two-phase deploy race itself (409); a single clean deploy is correct for runtime-behavior validation.
Polled assertions use flat exact scalars only — nested Match.* matchers don't survive serialization into the waitForAssertions Step Functions waiter (they leak {name,partial,pattern} into the expected pattern and never match).
Token/onboarding seeded before any submit — the orchestrator caches the GitHub token 5 min by ARN, so seeding must precede the first repo-bound preflight; repos are onboarded via a RepoTable putItem (the documented operator flow, no agent.ts change).
Environment-agnostic — no hardcoded account/region/ARN; config is name-based and creds supply the target account at deploy time.

Follow-ups (noted on #317)

Field-presence assertions (task_id/user_id/timestamps/approval metadata) need a separate non-polled getItem + assertAtPath (incompatible with the polling waiter).
Literal Cedar §14 A–E policy scenarios (incl. hard-deny) — the gate scenarios here exercise the soft-deny approve/deny lifecycle generically.
TaskEventsTable event-shape assertions.

Run policy, integ environment gate, concurrency, and integ-smoke semantics are unchanged from Phase 0.

codecov-commenter · 2026-06-15T18:40:39Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@b8a9ba2). Learn more about missing BASE report.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #348   +/-   ##
=======================================
  Coverage        ?   88.21%           
=======================================
  Files           ?      201           
  Lines           ?    48945           
  Branches        ?     4454           
=======================================
  Hits            ?    43179           
  Misses          ?     5766           
  Partials        ?        0

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Add integ.task-lifecycle.ts: deploys the full AgentStack (orchestrator + AgentCore runtime + agent container) as backgroundagent-integ-lifecycle and drives a real agent through the four Cedar HITL terminal paths — COMPLETED (repo-less default/agent-v1), FAILED (coding/new-task-v1 against a nonexistent repo), and AWAITING_APPROVAL -> approve/deny (config.env write tripping the write_env_files soft-deny gate). Cost is bounded via low max_turns + max_budget_usd; teardown is forced on success and failure. Gate scenarios 3 & 4 read their sandbox repo + pre-seeded PAT secret from two top-of-file constants (placeholder until provisioned); the PAT is copied into the stack-created GitHubTokenSecret via a get/putSecretValue assertion, matching the documented operator flow (QUICK_START §4) with no agent.ts change. integ.yml: raise the integ job timeout 45->90m for the heavier full-stack deploy, extend the teardown safety-net to sweep both integ stacks, and add an always() sandbox bgagent/* branch-cleanup step gated on INTEG_SANDBOX_REPO / INTEG_PAT_SECRET_ID repo vars (no-op until set). Update the ROADMAP "Deployed runtime E2E verification" row (+ Starlight mirror).

…s-samples#317) Point SANDBOX_REPO at ayushtr-aws/abca-integ-sandbox; PRESEEDED_PAT_SECRET stays bgagent/integ/github-pat (now created in the E2E account 465528542731). CI repo vars INTEG_SANDBOX_REPO / INTEG_PAT_SECRET_ID set to match so the always() branch-cleanup step activates.

…ws-samples#317) The agent AgentRuntimeArtifact is built with the repo root as its asset/build context (agent.ts), and integ-runner writes its synth + snapshot output UNDER that root (cdk/test/integ/cdk-integ.out.<test>.ts[.snapshot]/). Staging the root then copied its own output dir into itself recursively until the path overflowed (ENAMETOOLONG), failing synth before any deploy. The .dockerignore already excluded cdk/cdk.out/ for the same recursive-include reason but missed the integ-runner dirs. Add them (mirrors .gitignore 70-71). Verified with `integ-runner --dry-run`: synth + asset staging now succeed (SUCCESS, 72s) where they previously errored.

…les#317) Two test-harness fixes from the first live run (no agent.ts changes): 1. integ-runner --verbose (cdk/mise.toml): the default one-line-per-test output hides which assertion failed and its actual-vs-expected payload. The lifecycle test polls DynamoDB for terminal task status; without verbose, a task stuck at SUBMITTED instead of COMPLETED is undiagnosable. Verbose surfaces the assertion diffs. 2. Retry-with-backoff teardown (integ.yml): the AgentCore Runtime runs in VPC mode and injects service-managed "agentic_ai" ENIs into the private subnets. Deleting the Runtime reclaims those ENIs ASYNCHRONOUSLY, so a single delete-stack races the reaper and fails DELETE_FAILED on the subnets + RuntimeSG. The first live run stranded the stack exactly this way; a manual re-delete minutes later succeeded. The teardown now loops delete→wait→ re-check with linear backoff (60s..6m) and only fails the step if all attempts are exhausted.

…atus (aws-samples#317) aws-samples#317 asks the terminal-state assertions to cover task_id, user_id, status, timestamps, and approval metadata. The first cut asserted only `status`. Add a `presentString` matcher (Match.objectLike on { S: stringLikeRegexp('.+') }) and extend every poll: - COMPLETED: + task_id, user_id, created_at, updated_at present - FAILED: + the above, plus error_message present (records WHY it failed) - AWAITING_APPROVAL (both gates): + task_id, user_id, awaiting_approval_request_id - APPROVED row: + task_id, request_id, user_id, decided_at (proves the owning caller recorded the decision, not just a status flip) - DENIED row: + the above, plus deny_reason present Runtime-generated values (user_id = Cognito sub UUID, ISO timestamps) are matched for presence, not pinned. Validated with integ-runner --force --dry-run (synth + assertion serialization pass).

…ws-samples#317) First live run diagnosed via the assertion-provider Lambda logs (which survive teardown). All three failures were test-side, not platform bugs: 1. Scenario 1 → 400 VALIDATION_ERROR "Task description was blocked by content policy". The terse imperative prompt ("Reply with exactly the single word: done. Do not use any tools.") tripped the Bedrock guardrail. Replaced with a benign natural-language request. 2. Scenarios 2/3/4 → 422 REPO_NOT_ONBOARDED. The submit path enforces a RepoTable onboarding gate BEFORE clone/preflight, which the plan never accounted for. RepoTable lives in-stack (RemovalPolicy.DESTROY), so it is fresh every run — onboard via putItem assertions (minimal active row) rather than adding a Blueprint construct to the production stack: - onboardFailRepo: onboards the (GitHub-nonexistent) repo so scenario 2 now passes admission and fails at CLONE (the intended terminal-FAILED path), not at the gate. - onboardSandbox: onboards SANDBOX_REPO so the gate scenarios reach the agent. Reads the RepoTableName CfnOutput; sequences both onboarding steps before their submits in the .next() chain. Validated with integ-runner --force --dry-run.

…ws-samples#317) integ-runner defaults stackUpdateWorkflow=true, which deploys the snapshot then re-deploys the current version to verify in-place updates. The AgentCore Runtime takes minutes to reach READY and is partly immutable, so the second phase races the first (Runtime still CREATING) → 409 "agent is currently being modified" → integ-runner aborts mid-deploy and teardown strands a CREATING Runtime (DELETE_FAILED). Observed live: 334/335 resources deployed, only the Runtime was mid-create when the run aborted ~95s in. This test validates runtime behavior, not stack-update safety, so a single clean deploy is correct. Set stackUpdateWorkflow: false on the IntegTest.

…ws-samples#317) The AgentCore Runtime injects service-managed `agentic_ai` ENIs that AWS releases asynchronously, so `cdk destroy` reliably fails the subnet/SG/VPC deletes (DependencyViolation) and strands the stack. With the old fixed name (backgroundagent-integ-lifecycle) a stranded stack BLOCKED the next run (name conflict) and falsely failed the run on teardown alone. Two changes: - Unique per-commit stack name `int-<short-sha>`, sourced from the COMMIT_HASH env var (read directly via process.env — integ-runner synths in a subprocess that inherits the env but not our shell's CDK context). CI sets COMMIT_HASH from the resolved head SHA; mise //cdk:integ falls back to the local git SHA. A stranded stack now never blocks a later run; an out-of-band ephemeral sweeper reclaims int-* stacks once their ENIs detach. - destroy.expectError:true (scoped to the dependency-violation message) so the expected ENI teardown failure no longer marks the whole run FAILED — the result reflects the ASSERTIONS, not the teardown race. integ.yml: pass COMMIT_HASH to the run; rework the safety-net teardown to target the per-run int-<sha> stack, best-effort (no retry-until-deleted, no job failure) since unique names mean stranded stacks are harmless and swept out of band. Verified: integ-runner --force --dry-run synthesizes `int-abcdef12` from COMMIT_HASH (was `int-local` before the env-read fix).

…aws-samples#317) Live run proved the platform works: scenario-1 task reached COMPLETED in 3.4s with all fields correct. But the assertion polled it 25× and timed out anyway. Root cause: nested Match.* matchers (Match.objectLike / Match.stringLikeRegexp, the field-presence strengthening added earlier) do NOT survive serialization into the waitForAssertions Step Functions waiter. The provider serializes the Match object's internals — {name, partial, pattern} — into the EXPECTED pattern, and the waiter then treats those as literal required keys that never exist on the DynamoDB row. Observed in the waiter log: "Missing key 'name' / 'partial' / 'pattern'", failing every poll despite status=COMPLETED being present. Fix: every polled assertion (waitForAssertions) now uses ONLY a flat, exact scalar — `Item.status.S` (or the APPROVED/DENIED decision string) — which serializes cleanly. Removed the unused Match import. Field-presence assertions (task_id/user_id/timestamps/approval metadata, aws-samples#317) need a separate non-polled getItem + assertAtPath — re-noted as a follow-up on aws-samples#317.

…y 401 (aws-samples#317) Live run: gate tasks failed preflight with 401 GITHUB_UNREACHABLE despite a valid PAT, so they never reached AWAITING_APPROVAL and the gate assertions timed out. Root cause: resolveGitHubToken (context-hydration.ts) caches the secret value for 5 min keyed by ARN. Scenario 2 (coding/new-task-v1) runs GitHub preflight, and in the old order it ran BEFORE the seedGet/seedPut step — so it read and cached the stack's INITIAL EMPTY GitHubTokenSecret. Every later gate task reused that cached empty token → 401 → FAILED before the Cedar gate could fire. (Scenario 2 itself still "passed" because it targets a nonexistent repo and fails regardless, hiding the poisoning. Scenario 1 is repo-less so its preflight short-circuits and never touches the token.) Fix: move seedGet→seedPut to immediately after auth, before ANY submit, so the secret is populated before the first token read and no empty value is ever cached. Stays true to the design (QUICK_START §4: operator populates the secret before submitting tasks) — no agent.ts change, no import-by-ARN deviation. Onboarding steps moved ahead of submits too.

…mples#317) The header + gate-config comments implied the test is pinned to a specific E2E account. Reword to reflect reality: it deploys to whatever account/region the caller's creds resolve to, config is name-based (not ARN-pinned), and note the PAT needs Contents+PR write for the approve scenario's git push.

…s-samples#317) osv.dev recently widened GHSA-h67p-54hq-rp68 to cover js-yaml <4.2.0 (was <4.x), so the transitive js-yaml@3.14.2 pulled in by @istanbuljs/load-nyc-config (Jest coverage tooling) now fails the required security-pr.yml dependency scan — main itself is currently red on this. Add a js-yaml "^4.2.0" resolution (same pattern the repo already uses for esbuild/vite/etc.). 4.2.0 only consolidates the existing 3.14.2 + 4.1.1 trees onto one version; load-nyc-config calls js-yaml.load(), unchanged across 3→4. Dev/test-only dependency — no source imports js-yaml, nothing shipped changes. Verified: osv-scanner → No issues found; cdk compile clean; full jest suite 2039/2039 pass.

isadeks · 2026-06-17T18:39:42Z

Review: `feat(cdk): integ-tests Phase 1 — core lifecycle E2E (#317)`

Carefully-built test with genuinely strong design reasoning — the comments capture real lessons (Match matchers don't survive the waitForAssertions serialization, token-seed-before-submit ordering, per-run unique stack names to dodge the AgentCore ENI teardown race, expectError on the known-failing destroy). The CI auth model is also solid: workflow_run trusted context, safe-to-test fork-label gate + integ environment approval (defence in depth), least-privilege role, permissions: {} with per-job grants, pinned action SHAs, persist-credentials: false.

I verified the load-bearing facts and they hold: all 7 CfnOutput ids resolve against agent.ts; per-user concurrency defaults to 10 so the 4 submits don't trip admission; StrandedTaskReconciler runs at the 2h approval-stranded default so the gate won't auto-TIMED_OUT mid-run.

Two blockers worth fixing before merge, then nits.

🚫 1. Billable leak — teardown relies on a sweeper that doesn't exist (`integ.yml`)

This PR removes the old blocking guard:

- aws cloudformation delete-stack --stack-name backgroundagent-integ || true
- # No `|| true` on the wait: a DELETE_FAILED must surface loudly so we
- # never silently leak billable resources in the shared account.
- aws cloudformation wait stack-delete-complete --stack-name backgroundagent-integ

…replacing it with fire-and-forget delete-stack || true, on the rationale that "stranded int-* stacks are reclaimed by the ephemeral sweeper once their ENIs detach."

I searched the repo at the PR head — there is no such sweeper. The only int-*/"sweep"/"ephemeral" references are the comments in integ.yml itself; the only scheduled workflows are monthly-repo-metrics, security, and upgrade-main, none of which delete CloudFormation stacks. The mechanism the entire teardown argument depends on is unimplemented.

Combined with the (correct) decision that the AgentCore ENI race makes cdk destroy reliably fail, the net effect: every run whose ENIs haven't detached when the one-shot delete fires leaves a stranded stack nobody collects — and the unique-per-commit naming (great for not blocking the next run) also means these accumulate silently instead of surfacing. Each stranded int-<sha> carries a VPC + 1 NAT gateway + 8 interface endpoints (agent-vpc.ts) + the AgentCore runtime — real hourly cost in a shared account, indefinitely. The old code failed loudly on a stuck delete for exactly this reason.

Fix: either ship the sweeper in this PR (scheduled workflow that lists int-* stacks, retries delete-stack, alarms on ones older than N hours), or have the always() teardown retry-with-backoff and fail loudly / alarm if it still can't delete. At minimum, don't claim a sweeper reclaims these until one exists.

🚫 2. Token-TTL race → intermittent false failures (`integ.task-lifecycle.ts:392`)

A single Cognito ID token is minted once at auth and reused for every HTTP call across a strictly-serial .next() chain. Polling budget before the late POSTs:

before approve: pollComplete (12m) + pollFail (12m) + pollGateApprove (8m) ≈ 32 min
before deny: + pollApproveDecision (8m) + pollGateDeny (8m) ≈ 48 min

…all from token issuance, plus real agent cold-start/runtime. The PR's own note says the live run took ~54 min. But the AppClient sets no idTokenValidity (task-api.ts:247), so it inherits Cognito's 60-min default, and the API uses a CognitoUserPoolsAuthorizer (task-api.ts:455). Once a slow agent run pushes the approve/deny POSTs past 60 min → 401 → decision never recorded → pollApproveDecision/pollDenyDecision time out as a false failure keyed to agent latency.

Fix (cheap): set idTokenValidity: Duration.hours(2) on the integ deploy, or re-mint the token right before the gate POSTs.

🔧 Nits

Items.0 approval-row assumption (:547, :600) — queryApprove/queryDeny read Items.0.request_id.S with no Limit/ScanIndexForward/status filter, assuming exactly one approval row per task. If new-task-v1 ever trips the gate more than once, Items[0] is nondeterministic and the POST may target the wrong request_id. A status = PENDING filter makes it deterministic.
Hardcoded personal sandbox (:324) — SANDBOX_REPO = 'ayushtr-aws/abca-integ-sandbox' and PRESEEDED_PAT_SECRET are source literals pointing at one contributor's repo/secret, so scenarios 3 & 4 (the actual Cedar approve/deny gates — the headline of this PR) silently degrade to clone-failures for any other account. The comment admits it; better to read the INTEG_SANDBOX_REPO CI var the cleanup step already uses.
Sandbox PAT scope — the cleanup step exports a PAT as GH_TOKEN in an always() shell step. Not a code bug (it's not logged/interpolated, and runs in the trusted context), but verify the PAT is fine-grained and scoped to only the sandbox repo, since a broad/classic token here would be a large blast radius.

✅ Cleared

Branch-deletion loop isn't injectable (repo-var + API-sourced branch names used only as gh api path args); PAT consumed via env, not logged; gh preinstalled on ubuntu-latest; fork test code correctly gated behind label + environment approval.

Reviewed at xhigh effort. CfnOutput ids, authorizer type, concurrency cap, reconciler timeout, the absence of a stack-sweeper, the NAT + 8-interface-endpoint VPC cost, and the removed wait stack-delete-complete guard all verified against the repo source at the PR head.

…-driven sandbox (aws-samples#317) Review fixes for PR aws-samples#348: - Token-TTL race (blocker): the single idToken minted at auth was reused for the approve/deny POSTs ~48 min later, past Cognito's 60-min default validity → risk of 401 → decision never recorded → false timeout. Re-mint a fresh token right before each gate POST (reAuthApprove/reAuthDeny in the .next() chain). Test-side only — no idTokenValidity change to the production app client. - Approval-row determinism (nit): queryApprove/queryDeny now filter status=PENDING so Items[0] is the live row even if a task trips the gate more than once or carries already-decided rows. - Env-driven sandbox config (nit): SANDBOX_REPO / PRESEEDED_PAT_SECRET now read INTEG_SANDBOX_REPO / INTEG_PAT_SECRET_ID (the CI vars the cleanup step already uses), falling back to the literals, so the gate scenarios bind to the running account's sandbox instead of a hardcoded contributor repo. integ.yml passes the vars to the run step. Verified: compile, eslint, zizmor, and integ-runner --force --dry-run all pass.

ayushtr-aws · 2026-06-18T15:14:28Z

Thanks for the thorough review, @isadeks — really appreciate the depth. Pushed 34e7343 addressing blocker #2 and the nits:

🚫 # 2 Token-TTL race — fixed. Re-mint a fresh Cognito token immediately before each gate POST (reAuthApprove/reAuthDeny, sequenced right before approve/deny in the .next() chain), so the token is minted minutes — not ~50 min — before use. Kept it test-side rather than setting idTokenValidity on the app client, to preserve the 'deploy AgentStack unchanged' property.

🔧 Items.0 determinism — fixed. queryApprove/queryDeny now carry a status = PENDING FilterExpression, so Items[0] is the live pending row even if a task trips the gate more than once.

🔧 Hardcoded sandbox — fixed. SANDBOX_REPO/PRESEEDED_PAT_SECRET now read INTEG_SANDBOX_REPO/INTEG_PAT_SECRET_ID (the same CI vars the cleanup step uses), falling back to the literals; integ.yml passes them to the run step.

🔧 PAT scope — confirmed: the sandbox PAT is a fine-grained token scoped to only the sandbox repo (Contents+PR write), not a classic/broad token.

Verified locally: compile, eslint, zizmor, and integ-runner --force --dry-run all pass on the rebased base.

…on leaks (aws-samples#317) Addresses PR aws-samples#348 blocker #1: the teardown comments claimed an 'ephemeral sweeper' reclaims stranded int-* stacks, but none existed — so failed-teardown stacks (the expected AgentCore agentic_ai ENI race) accumulated silently, each leaking a VPC + NAT gateway + interface endpoints + runtime in the shared account. New scheduled workflow .github/workflows/integ-sweeper.yml (every 2h): - lists all int-* CloudFormation stacks and best-effort delete-stacks each (idempotent; the ENIs have had time to detach by the time it runs, unlike the in-run teardown which races them); - for any int-* stack still undeletable past ALARM_AGE_HOURS (6h — comfortably past the observed ~1-2h ENI-release window, so normal lag never false-alarms), FAILS THE JOB and opens a tracking issue (mirrors security.yml's pattern), so a genuine leak surfaces loudly instead of hiding. Runs under environment: integ with OIDC (id-token: write) to assume the same AWS_ROLE_TO_ASSUME integ.yml uses — keeps the secret in a dedicated environment (zizmor secrets-outside-env clean). Issue body passed via env, not inline interpolation (zizmor template-injection clean). Also corrects the integ.yml + integ.task-lifecycle.ts comments to point at the now-real sweeper instead of overclaiming. Hybrid teardown story: in-run delete stays best-effort/quiet (expectError), the sweeper does reliable async cleanup + the loud alarm. Verified: YAML parse, bash -n, zizmor (new file 0 findings), compile, eslint.

ayushtr-aws · 2026-06-18T15:34:24Z

🚫 # 1 Billable leak / missing sweeper — fixed in a5a188f, the sweeper has been implemented.

Added .github/workflows/integ-sweeper.yml (scheduled every 2h):

Lists all int-* stacks and best-effort delete-stacks each. Because it runs later than the integ job, the AgentCore agentic_ai ENIs have had time to detach, so these deletes actually succeed — unlike the in-run teardown that races them.
For any int-* stack still undeletable past 6h (comfortably beyond the observed ~1–2h ENI-release window, so normal teardown lag never false-alarms), it fails the job and opens a tracking issue — mirrors security.yml's pattern, so a genuine leak surfaces loudly instead of accumulating silently.

Net teardown story is now a coherent hybrid: the in-run delete stays best-effort/quiet (expectError, so the run reflects assertions not the ENI race), and the sweeper owns reliable async cleanup plus the loud alarm. I also corrected the integ.yml + integ.task-lifecycle.ts comments to point at the now-real workflow instead of overclaiming.

Security notes: the sweeper runs under environment: integ with OIDC to assume the same AWS_ROLE_TO_ASSUME (keeps the secret in a dedicated environment — zizmor secrets-outside-env clean), and the issue body is passed via env not inline interpolation (zizmor template-injection clean). One operational note: the integ environment's protection rules need to allow the scheduled run to assume the role (no manual approval is possible on a cron trigger) — flagging in case the env requires reviewers.

All five review points (both blockers + three nits) are now addressed across 34e7343 and a5a188f. Thanks again @isadeks.

isadeks · 2026-06-18T15:56:08Z

Re-review — both blockers resolved ✅ (recommend approve once residuals are noted)

Re-reviewed at head a5a188f. The new commits map cleanly onto every finding from the first-pass review, and I did an additional design/security/architecture sweep. Summary: both blockers are fixed and verified; what's left is nits + one environment-config decision.

Blockers — resolved

Finding	Fix	Verified
🚫 Billable leak / no sweeper	new `.github/workflows/integ-sweeper.yml`	✅
🚫 Token-TTL race	fresh `reAuthApprove`/`reAuthDeny` tokens before the gate POSTs	✅
🔧 `Items.0` approval-row assumption	`status = PENDING` FilterExpression	✅

Sweeper runs every 2h, deletes all int-* stacks, and fails loudly + opens a tracking issue for any stack older than ALARM_AGE_HOURS=6 that still won't delete — past the observed ~1–2h ENI-release window, so normal lag doesn't false-alarm but a genuine leak surfaces. integ.yml's teardown comment now points at this real workflow. Env-passed LEAKED (no template injection), pinned SHA, concurrency guard.
Token race: the chain wires …queryApprove → reAuthApprove → approve and …queryDeny → reAuthDeny → deny, and the POSTs send approveToken/denyToken (lines 413/469). A slow run past Cognito's 60-min default no longer 401s.

Verified sound (design / security)

Agent egress isolation — Runtime SG is 443-only with allowAllOutbound: false + DNS firewall, so a misbehaving real agent in the shared account is egress-bounded, on top of max_turns/max_budget_usd.
Sweeper permissions — top-level permissions: contents: none narrows the default token; the job re-grants exactly id-token: write + contents: read + issues: write. Least-privilege.
Field-presence assertions — traced the history: a commit added user_id/created_at checks via Match.objectLike, a later one removed them back to status-only. That's correct, not a regression — nested Match.* doesn't survive the waitForAssertions serialization, so field-presence needs a non-polled getItem+assertAtPath (deferred to a feat(cdk): integ-tests Phase 1 — core lifecycle E2E #317 follow-up).

Non-blocking items

integ environment vs. the new sweeper (config decision — partly in scope). integ.yml's comments describe the integ environment approval as a defense-in-depth gate ("an admin reviews fork test code before it runs with the privileged role"). But the integ environment has no required reviewers (verified via the environments API), so that gate doesn't currently exist — fork-code safety rests entirely on the safe-to-test label check in resolve. This PR sharpens it two ways: it escalates what runs behind integ from a trimmed Task API stack to the full AgentStack + a real agent with a write-PAT, and the new sweeper depends on integ staying unprotected (a cron trigger can't satisfy a manual approval, so adding required reviewers would silently break the sweeper). The settings change itself is a maintainer call, but in-PR it's worth: (a) correcting the comments that overstate the approval gate, and (b) giving the sweeper its own environment (or explicitly documenting that integ must remain approval-free).
Sweeper issue-spam (integ-sweeper.yml:154). gh issue create runs unconditionally on leaked != '' with no gh issue list --search/dedup guard — a stack stuck 12h files 6 duplicate issues. Check for an existing open issue (stable title/label) and comment/skip instead.
Sweeper int- prefix breadth. The sweep delete-stacks any stack named int-*. Safe today (no other stack matches), but it's an unguarded delete loop on a 4-char prefix in a shared account — a more specific shape (int-<8hexsha>) or a tag filter would harden it.
Personal public-repo fallback (integ.task-lifecycle.ts:154). Hardcoding is fixed (now reads INTEG_SANDBOX_REPO/INTEG_PAT_SECRET_ID), but the fallback still targets a contributor's public personal repo (ayushtr-aws/abca-integ-sandbox, confirmed public). For upstream aws-samples, prefer leaving SANDBOX_REPO unset → scenarios 3 & 4 skip with a clear "set INTEG_SANDBOX_REPO" message, rather than silently defaulting into one person's account. (PAT is asserted fine-grained/repo-scoped in the comment — worth confirming on the provisioning side.)

Net

Both original blockers properly resolved. Remaining: one environment-config decision (#1, partly a maintainer action) and three nits (#2–#4). Good to merge once #1 is decided and the nits are folded in or filed as follow-ups.

isadeks

Approving — both blockers from the earlier review are resolved and verified at a5a188f:

Billable leak → the new integ-sweeper.yml reclaims stranded int-* stacks and fails loudly + files an issue past the 6h threshold.
Token-TTL race → fresh reAuthApprove/reAuthDeny tokens are minted right before the gate POSTs.
Items.0 → now filtered to status = PENDING.

Also confirmed sound: agent egress isolation (443-only SG + DNS firewall), least-privilege sweeper permissions, and the status-only assertions (field-presence correctly deferred since nested Match.* doesn't survive the polling waiter).

Remaining items are non-blocking follow-ups, not merge gates:

The integ environment having no required reviewers is a repo-settings decision (maintainer action, out of this diff) — but worth a follow-up since the sweeper now depends on it staying approval-free and the integ.yml comments overstate the approval gate.
Sweeper issue-spam dedup, the int- delete-prefix breadth, and the personal public-repo fallback default — all nice-to-harden, none blocking.

Nice work on the teardown design and the token-race fix.

… correct env-gate comments (aws-samples#317) Re-review residuals from PR aws-samples#348 (isadeks): - Sweeper issue-spam (nit #2): dedup on a stable `integ-leak` label — comment on the existing open issue instead of filing a duplicate each 2h cycle. Label is created idempotently before use. - Sweeper prefix breadth (nit aws-samples#3): only sweep `int-<8 hex>` (the CI stack-name shape) via an authoritative regex guard, not a bare `int-*` glob, so an unrelated stack sharing the 4-char prefix is never deleted in the shared account. - Personal-repo fallback (nit aws-samples#4): drop the hardcoded ayushtr-aws/abca-integ-sandbox / secret literals. Gate scenarios 3 & 4 now run ONLY when INTEG_SANDBOX_REPO + INTEG_PAT_SECRET_ID are both set; otherwise they cleanly skip (warn) instead of routing a write-PAT clone into a contributor's personal repo. The gate assertion calls are constructed only inside the enabled branch so nothing is registered when skipped. - integ env comments (in-PR part of #1): correct comments that overstated the `integ` environment approval as the fork-code gate. The enforced gate is the `safe-to-test` label; the environment adds an approval ONLY if required reviewers are configured (currently none). Note that the cron-triggered sweeper depends on the env staying approval-free. Whether to add required reviewers is a maintainer/admin action outside this PR. Validated: compile, eslint, zizmor (both workflows clean), and integ-runner --force --dry-run pass with gates both disabled and enabled.

ayushtr-aws · 2026-06-18T18:12:45Z

Thanks again, @isadeks — pushed 51bdde6 addressing the three nits and the in-PR part of the env-config item.

🔧 #2 Sweeper issue-spam — fixed. The leak-issue step now dedups on a stable integ-leak label: it looks for an existing open issue with that label and comments on it instead of filing a fresh duplicate each 2h cycle; it only gh issue creates when none exists. The label is created idempotently before use. So a stack stuck for 12h now produces one issue + periodic comments, not six issues.

🔧 #3 Sweeper int- prefix breadth — fixed. The sweep no longer acts on a bare int-* glob. It now applies an authoritative regex guard ^int-[0-9a-f]{8}$ — exactly the CI stack-name shape (COMMIT_HASH.slice(0,8) in integ.task-lifecycle.ts) — and logs+skips anything that doesn't match. An unrelated stack that merely starts with those 4 chars (or the int-local dev fallback) is never deleted. The JMESPath prefilter stays as a cheap page-narrower; the regex is the real gate.

🔧 #4 Personal public-repo fallback — fixed. Dropped the hardcoded ayushtr-aws/abca-integ-sandbox / secret literals entirely (no fallback). Gate scenarios 3 & 4 now run only when both INTEG_SANDBOX_REPO and INTEG_PAT_SECRET_ID are set; otherwise they skip with a clear warning (set both to exercise the gates) rather than silently routing a write-PAT clone into one contributor's account. Implementation note: integ-tests registers an assertion at construction, not at .next(), so I moved all the gate assertion calls (incl. the reAuth* token mints and the PAT seed) inside the if (gatesEnabled) block and assemble the .next() chain conditionally — when disabled, nothing is registered and the run reduces cleanly to scenarios 1 & 2. Verified both ways via integ-runner --force --dry-run (gates-disabled and gates-enabled both synth + pass).

🚫/🔧 #1 integ environment vs. the sweeper —

In-PR (done): corrected the comments in integ.yml that overstated the integ environment approval as the fork-code gate. They now state plainly that the enforced fork-code gate is the safe-to-test label, and the integ environment only adds a manual approval if required reviewers are configured (currently none). I also added an explicit note — at the trigger, the sweeper, and the env declaration — that the cron-triggered integ-sweeper depends on integ staying approval-free (a schedule can't satisfy a manual approval), so if reviewers are ever wanted for PR runs the sweeper should get its own environment.
Maintainer call (out of PR): whether to actually add required reviewers to the integ environment is a repo-Settings change only an admin can make, and given the sweeper dependency above it's a deliberate trade-off rather than a code fix. Flagging for an admin decision; happy to give the sweeper a dedicated environment in a follow-up if you'd prefer reviewers on integ.

One note re: the rebase onto the latest base — #109 just landed scripts/cleanup-ephemeral-stacks.sh, which also does ENI-aware ephemeral-stack cleanup. They don't overlap with the sweeper: that script matches by stack description ("ABCA Development Stack"*) for dev stacks, whereas the integ stacks carry a different description and are matched by name (int-<hash>). So the two target disjoint sets and the sweeper is still needed. (A possible future consolidation, but out of scope here.)

Validated on the rebased head: mise //cdk:compile, eslint, zizmor (both workflows clean), and integ-runner --force --dry-run all pass.

ayushtr-aws requested a review from a team as a code owner June 15, 2026 18:17

ayushtr-aws force-pushed the feat/317-core-lifecycle-e2e branch from 5168c27 to aa69eca Compare June 15, 2026 18:24

ayushtr-aws added 12 commits June 15, 2026 14:41

ayushtr-aws force-pushed the feat/317-core-lifecycle-e2e branch from aa69eca to 3f82e39 Compare June 15, 2026 19:00

ayushtr-aws and others added 2 commits June 15, 2026 15:13

Merge branch 'aws-samples:main' into feat/317-core-lifecycle-e2e

11edc69

Merge branch 'main' into feat/317-core-lifecycle-e2e

a91b08c

isadeks requested a review from a team as a code owner June 17, 2026 18:25

isadeks previously approved these changes Jun 18, 2026

View reviewed changes

isadeks and others added 3 commits June 18, 2026 12:21

Merge branch 'main' into feat/317-core-lifecycle-e2e

7032e80

Merge branch 'main' into feat/317-core-lifecycle-e2e

a66a8ab

ayushtr-aws dismissed isadeks’s stale review via 51bdde6 June 18, 2026 18:12

Merge branch 'main' into feat/317-core-lifecycle-e2e

5fb4632

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(cdk): integ-tests Phase 1 — core lifecycle E2E (#317)#348

feat(cdk): integ-tests Phase 1 — core lifecycle E2E (#317)#348
ayushtr-aws wants to merge 20 commits into
aws-samples:mainfrom
ayushtr-aws:feat/317-core-lifecycle-e2e

ayushtr-aws commented Jun 15, 2026

Uh oh!

codecov-commenter commented Jun 15, 2026 •

edited

Loading

Uh oh!

isadeks commented Jun 17, 2026

Uh oh!

ayushtr-aws commented Jun 18, 2026 •

edited

Loading

Uh oh!

ayushtr-aws commented Jun 18, 2026 •

edited

Loading

Uh oh!

isadeks commented Jun 18, 2026

Uh oh!

isadeks left a comment

Uh oh!

ayushtr-aws commented Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ayushtr-aws commented Jun 15, 2026

Summary

Validation

What's here

Key design decisions

Follow-ups (noted on #317)

Uh oh!

codecov-commenter commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

isadeks commented Jun 17, 2026

Review: feat(cdk): integ-tests Phase 1 — core lifecycle E2E (#317)

🚫 1. Billable leak — teardown relies on a sweeper that doesn't exist (integ.yml)

🚫 2. Token-TTL race → intermittent false failures (integ.task-lifecycle.ts:392)

🔧 Nits

✅ Cleared

Uh oh!

ayushtr-aws commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ayushtr-aws commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

isadeks commented Jun 18, 2026

Re-review — both blockers resolved ✅ (recommend approve once residuals are noted)

Blockers — resolved

Verified sound (design / security)

Non-blocking items

Net

Uh oh!

isadeks left a comment

Choose a reason for hiding this comment

Uh oh!

ayushtr-aws commented Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov-commenter commented Jun 15, 2026 •

edited

Loading

Review: `feat(cdk): integ-tests Phase 1 — core lifecycle E2E (#317)`

🚫 1. Billable leak — teardown relies on a sweeper that doesn't exist (`integ.yml`)

🚫 2. Token-TTL race → intermittent false failures (`integ.task-lifecycle.ts:392`)

ayushtr-aws commented Jun 18, 2026 •

edited

Loading

ayushtr-aws commented Jun 18, 2026 •

edited

Loading