Skip to content

feat(cdk): integ-tests Phase 1 — core lifecycle E2E (#317)#348

Open
ayushtr-aws wants to merge 20 commits into
aws-samples:mainfrom
ayushtr-aws:feat/317-core-lifecycle-e2e
Open

feat(cdk): integ-tests Phase 1 — core lifecycle E2E (#317)#348
ayushtr-aws wants to merge 20 commits into
aws-samples:mainfrom
ayushtr-aws:feat/317-core-lifecycle-e2e

Conversation

@ayushtr-aws

Copy link
Copy Markdown
Contributor

Summary

Phase 1 of the deploy-then-verify integration-test effort (#236). Builds on Phase 0 (#295) to drive a real agent through the full task lifecycle on a live AgentStack, asserting the four terminal paths from the Cedar HITL E2E matrix (docs/design/CEDAR_HITL_GATES.md §15.3):

  1. submit → run → COMPLETED (repo-less default/agent-v1)
  2. submit → run → FAILED (coding/new-task-v1, clone fails)
  3. submit → run → AWAITING_APPROVAL → approve → terminal (Cedar write_env_files soft-deny gate)
  4. submit → run → AWAITING_APPROVAL → deny → terminal

Closes #317.

Validation

Run end-to-end against a live deployed AgentStack (deploy → 4 real agent runs → destroy, ~54 min): all 4 assertions pass. The approve scenario completed the full happy path — the agent cloned the repo, tripped the Cedar gate on a config.env write, was approved, resumed, pushed a branch, and opened a real PR on the sandbox repo. The deny scenario blocked the write and recorded DENIED.

What's here

  • cdk/test/integ/integ.task-lifecycle.ts — the Phase 1 test: real AgentStack, Cognito auth chain, the 4 scenarios, waitForAssertions polling, forced teardown.
  • .github/workflows/integ.yml — pass COMMIT_HASH, best-effort teardown of the per-run stack, raised timeout, sandbox branch cleanup.
  • cdk/mise.toml//cdk:integ exports COMMIT_HASH and runs --verbose.
  • .dockerignore — exclude integ-runner output from the agent build context (fixes a recursive-copy ENAMETOOLONG at synth).
  • docs/guides/ROADMAP.md (+ Starlight mirror) — mark Phase 1 landed.

Key design decisions

  • Per-run unique stack name int-<commit-hash> (from COMMIT_HASH) — the AgentCore Runtime injects service-managed agentic_ai ENIs that AWS releases asynchronously, so cdk destroy reliably fails the VPC/subnet/SG deletes. A fixed name would let a stranded stack block the next run; unique names make stranded stacks harmless (swept out-of-band as ENIs release). destroy.expectError tolerates the known teardown failure so the run reflects the assertions, not the teardown race.
  • stackUpdateWorkflow: false — the Runtime's slow/immutable create makes integ-runner's default two-phase deploy race itself (409); a single clean deploy is correct for runtime-behavior validation.
  • Polled assertions use flat exact scalars only — nested Match.* matchers don't survive serialization into the waitForAssertions Step Functions waiter (they leak {name,partial,pattern} into the expected pattern and never match).
  • Token/onboarding seeded before any submit — the orchestrator caches the GitHub token 5 min by ARN, so seeding must precede the first repo-bound preflight; repos are onboarded via a RepoTable putItem (the documented operator flow, no agent.ts change).
  • Environment-agnostic — no hardcoded account/region/ARN; config is name-based and creds supply the target account at deploy time.

Follow-ups (noted on #317)

  • Field-presence assertions (task_id/user_id/timestamps/approval metadata) need a separate non-polled getItem + assertAtPath (incompatible with the polling waiter).
  • Literal Cedar §14 A–E policy scenarios (incl. hard-deny) — the gate scenarios here exercise the soft-deny approve/deny lifecycle generically.
  • TaskEventsTable event-shape assertions.

Run policy, integ environment gate, concurrency, and integ-smoke semantics are unchanged from Phase 0.

@ayushtr-aws ayushtr-aws requested a review from a team as a code owner June 15, 2026 18:17
@ayushtr-aws ayushtr-aws force-pushed the feat/317-core-lifecycle-e2e branch from 5168c27 to aa69eca Compare June 15, 2026 18:24
@codecov-commenter

codecov-commenter commented Jun 15, 2026

Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@b8a9ba2). Learn more about missing BASE report.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #348   +/-   ##
=======================================
  Coverage        ?   88.21%           
=======================================
  Files           ?      201           
  Lines           ?    48945           
  Branches        ?     4454           
=======================================
  Hits            ?    43179           
  Misses          ?     5766           
  Partials        ?        0           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Add integ.task-lifecycle.ts: deploys the full AgentStack (orchestrator +
AgentCore runtime + agent container) as backgroundagent-integ-lifecycle and
drives a real agent through the four Cedar HITL terminal paths — COMPLETED
(repo-less default/agent-v1), FAILED (coding/new-task-v1 against a nonexistent
repo), and AWAITING_APPROVAL -> approve/deny (config.env write tripping the
write_env_files soft-deny gate). Cost is bounded via low max_turns +
max_budget_usd; teardown is forced on success and failure.

Gate scenarios 3 & 4 read their sandbox repo + pre-seeded PAT secret from two
top-of-file constants (placeholder until provisioned); the PAT is copied into
the stack-created GitHubTokenSecret via a get/putSecretValue assertion, matching
the documented operator flow (QUICK_START §4) with no agent.ts change.

integ.yml: raise the integ job timeout 45->90m for the heavier full-stack
deploy, extend the teardown safety-net to sweep both integ stacks, and add an
always() sandbox bgagent/* branch-cleanup step gated on INTEG_SANDBOX_REPO /
INTEG_PAT_SECRET_ID repo vars (no-op until set).

Update the ROADMAP "Deployed runtime E2E verification" row (+ Starlight mirror).
…s-samples#317)

Point SANDBOX_REPO at ayushtr-aws/abca-integ-sandbox; PRESEEDED_PAT_SECRET stays
bgagent/integ/github-pat (now created in the E2E account 465528542731). CI repo
vars INTEG_SANDBOX_REPO / INTEG_PAT_SECRET_ID set to match so the always()
branch-cleanup step activates.
…ws-samples#317)

The agent AgentRuntimeArtifact is built with the repo root as its asset/build
context (agent.ts), and integ-runner writes its synth + snapshot output UNDER
that root (cdk/test/integ/cdk-integ.out.<test>.ts[.snapshot]/). Staging the root
then copied its own output dir into itself recursively until the path overflowed
(ENAMETOOLONG), failing synth before any deploy.

The .dockerignore already excluded cdk/cdk.out/ for the same recursive-include
reason but missed the integ-runner dirs. Add them (mirrors .gitignore 70-71).
Verified with `integ-runner --dry-run`: synth + asset staging now succeed
(SUCCESS, 72s) where they previously errored.
…les#317)

Two test-harness fixes from the first live run (no agent.ts changes):

1. integ-runner --verbose (cdk/mise.toml): the default one-line-per-test output
   hides which assertion failed and its actual-vs-expected payload. The
   lifecycle test polls DynamoDB for terminal task status; without verbose, a
   task stuck at SUBMITTED instead of COMPLETED is undiagnosable. Verbose
   surfaces the assertion diffs.

2. Retry-with-backoff teardown (integ.yml): the AgentCore Runtime runs in VPC
   mode and injects service-managed "agentic_ai" ENIs into the private subnets.
   Deleting the Runtime reclaims those ENIs ASYNCHRONOUSLY, so a single
   delete-stack races the reaper and fails DELETE_FAILED on the subnets +
   RuntimeSG. The first live run stranded the stack exactly this way; a manual
   re-delete minutes later succeeded. The teardown now loops delete→wait→
   re-check with linear backoff (60s..6m) and only fails the step if all
   attempts are exhausted.
…atus (aws-samples#317)

aws-samples#317 asks the terminal-state assertions to cover task_id, user_id, status,
timestamps, and approval metadata. The first cut asserted only `status`. Add a
`presentString` matcher (Match.objectLike on { S: stringLikeRegexp('.+') }) and
extend every poll:

- COMPLETED: + task_id, user_id, created_at, updated_at present
- FAILED:    + the above, plus error_message present (records WHY it failed)
- AWAITING_APPROVAL (both gates): + task_id, user_id, awaiting_approval_request_id
- APPROVED row: + task_id, request_id, user_id, decided_at (proves the owning
  caller recorded the decision, not just a status flip)
- DENIED row:   + the above, plus deny_reason present

Runtime-generated values (user_id = Cognito sub UUID, ISO timestamps) are matched
for presence, not pinned. Validated with integ-runner --force --dry-run (synth +
assertion serialization pass).
…ws-samples#317)

First live run diagnosed via the assertion-provider Lambda logs (which survive
teardown). All three failures were test-side, not platform bugs:

1. Scenario 1 → 400 VALIDATION_ERROR "Task description was blocked by content
   policy". The terse imperative prompt ("Reply with exactly the single word:
   done. Do not use any tools.") tripped the Bedrock guardrail. Replaced with a
   benign natural-language request.

2. Scenarios 2/3/4 → 422 REPO_NOT_ONBOARDED. The submit path enforces a
   RepoTable onboarding gate BEFORE clone/preflight, which the plan never
   accounted for. RepoTable lives in-stack (RemovalPolicy.DESTROY), so it is
   fresh every run — onboard via putItem assertions (minimal active row) rather
   than adding a Blueprint construct to the production stack:
   - onboardFailRepo: onboards the (GitHub-nonexistent) repo so scenario 2 now
     passes admission and fails at CLONE (the intended terminal-FAILED path),
     not at the gate.
   - onboardSandbox: onboards SANDBOX_REPO so the gate scenarios reach the agent.

Reads the RepoTableName CfnOutput; sequences both onboarding steps before their
submits in the .next() chain. Validated with integ-runner --force --dry-run.
…ws-samples#317)

integ-runner defaults stackUpdateWorkflow=true, which deploys the snapshot then
re-deploys the current version to verify in-place updates. The AgentCore Runtime
takes minutes to reach READY and is partly immutable, so the second phase races
the first (Runtime still CREATING) → 409 "agent is currently being modified" →
integ-runner aborts mid-deploy and teardown strands a CREATING Runtime
(DELETE_FAILED). Observed live: 334/335 resources deployed, only the Runtime was
mid-create when the run aborted ~95s in.

This test validates runtime behavior, not stack-update safety, so a single clean
deploy is correct. Set stackUpdateWorkflow: false on the IntegTest.
…ws-samples#317)

The AgentCore Runtime injects service-managed `agentic_ai` ENIs that AWS releases
asynchronously, so `cdk destroy` reliably fails the subnet/SG/VPC deletes
(DependencyViolation) and strands the stack. With the old fixed name
(backgroundagent-integ-lifecycle) a stranded stack BLOCKED the next run (name
conflict) and falsely failed the run on teardown alone.

Two changes:
- Unique per-commit stack name `int-<short-sha>`, sourced from the COMMIT_HASH
  env var (read directly via process.env — integ-runner synths in a subprocess
  that inherits the env but not our shell's CDK context). CI sets COMMIT_HASH
  from the resolved head SHA; mise //cdk:integ falls back to the local git SHA.
  A stranded stack now never blocks a later run; an out-of-band ephemeral sweeper
  reclaims int-* stacks once their ENIs detach.
- destroy.expectError:true (scoped to the dependency-violation message) so the
  expected ENI teardown failure no longer marks the whole run FAILED — the result
  reflects the ASSERTIONS, not the teardown race.

integ.yml: pass COMMIT_HASH to the run; rework the safety-net teardown to target
the per-run int-<sha> stack, best-effort (no retry-until-deleted, no job failure)
since unique names mean stranded stacks are harmless and swept out of band.

Verified: integ-runner --force --dry-run synthesizes `int-abcdef12` from
COMMIT_HASH (was `int-local` before the env-read fix).
…aws-samples#317)

Live run proved the platform works: scenario-1 task reached COMPLETED in 3.4s
with all fields correct. But the assertion polled it 25× and timed out anyway.

Root cause: nested Match.* matchers (Match.objectLike / Match.stringLikeRegexp,
the field-presence strengthening added earlier) do NOT survive serialization into
the waitForAssertions Step Functions waiter. The provider serializes the Match
object's internals — {name, partial, pattern} — into the EXPECTED pattern, and
the waiter then treats those as literal required keys that never exist on the
DynamoDB row. Observed in the waiter log: "Missing key 'name' / 'partial' /
'pattern'", failing every poll despite status=COMPLETED being present.

Fix: every polled assertion (waitForAssertions) now uses ONLY a flat, exact
scalar — `Item.status.S` (or the APPROVED/DENIED decision string) — which
serializes cleanly. Removed the unused Match import. Field-presence assertions
(task_id/user_id/timestamps/approval metadata, aws-samples#317) need a separate non-polled
getItem + assertAtPath — re-noted as a follow-up on aws-samples#317.
…y 401 (aws-samples#317)

Live run: gate tasks failed preflight with 401 GITHUB_UNREACHABLE despite a valid
PAT, so they never reached AWAITING_APPROVAL and the gate assertions timed out.

Root cause: resolveGitHubToken (context-hydration.ts) caches the secret value for
5 min keyed by ARN. Scenario 2 (coding/new-task-v1) runs GitHub preflight, and in
the old order it ran BEFORE the seedGet/seedPut step — so it read and cached the
stack's INITIAL EMPTY GitHubTokenSecret. Every later gate task reused that cached
empty token → 401 → FAILED before the Cedar gate could fire. (Scenario 2 itself
still "passed" because it targets a nonexistent repo and fails regardless, hiding
the poisoning. Scenario 1 is repo-less so its preflight short-circuits and never
touches the token.)

Fix: move seedGet→seedPut to immediately after auth, before ANY submit, so the
secret is populated before the first token read and no empty value is ever
cached. Stays true to the design (QUICK_START §4: operator populates the secret
before submitting tasks) — no agent.ts change, no import-by-ARN deviation.
Onboarding steps moved ahead of submits too.
…mples#317)

The header + gate-config comments implied the test is pinned to a specific E2E
account. Reword to reflect reality: it deploys to whatever account/region the
caller's creds resolve to, config is name-based (not ARN-pinned), and note the
PAT needs Contents+PR write for the approve scenario's git push.
…s-samples#317)

osv.dev recently widened GHSA-h67p-54hq-rp68 to cover js-yaml <4.2.0 (was <4.x),
so the transitive js-yaml@3.14.2 pulled in by @istanbuljs/load-nyc-config (Jest
coverage tooling) now fails the required security-pr.yml dependency scan — main
itself is currently red on this.

Add a js-yaml "^4.2.0" resolution (same pattern the repo already uses for
esbuild/vite/etc.). 4.2.0 only consolidates the existing 3.14.2 + 4.1.1 trees
onto one version; load-nyc-config calls js-yaml.load(), unchanged across 3→4.
Dev/test-only dependency — no source imports js-yaml, nothing shipped changes.

Verified: osv-scanner → No issues found; cdk compile clean; full jest suite
2039/2039 pass.
@ayushtr-aws ayushtr-aws force-pushed the feat/317-core-lifecycle-e2e branch from aa69eca to 3f82e39 Compare June 15, 2026 19:00
@isadeks isadeks requested a review from a team as a code owner June 17, 2026 18:25
@isadeks

isadeks commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Review: feat(cdk): integ-tests Phase 1 — core lifecycle E2E (#317)

Carefully-built test with genuinely strong design reasoning — the comments capture real lessons (Match matchers don't survive the waitForAssertions serialization, token-seed-before-submit ordering, per-run unique stack names to dodge the AgentCore ENI teardown race, expectError on the known-failing destroy). The CI auth model is also solid: workflow_run trusted context, safe-to-test fork-label gate + integ environment approval (defence in depth), least-privilege role, permissions: {} with per-job grants, pinned action SHAs, persist-credentials: false.

I verified the load-bearing facts and they hold: all 7 CfnOutput ids resolve against agent.ts; per-user concurrency defaults to 10 so the 4 submits don't trip admission; StrandedTaskReconciler runs at the 2h approval-stranded default so the gate won't auto-TIMED_OUT mid-run.

Two blockers worth fixing before merge, then nits.


🚫 1. Billable leak — teardown relies on a sweeper that doesn't exist (integ.yml)

This PR removes the old blocking guard:

- aws cloudformation delete-stack --stack-name backgroundagent-integ || true
- # No `|| true` on the wait: a DELETE_FAILED must surface loudly so we
- # never silently leak billable resources in the shared account.
- aws cloudformation wait stack-delete-complete --stack-name backgroundagent-integ

…replacing it with fire-and-forget delete-stack || true, on the rationale that "stranded int-* stacks are reclaimed by the ephemeral sweeper once their ENIs detach."

I searched the repo at the PR head — there is no such sweeper. The only int-*/"sweep"/"ephemeral" references are the comments in integ.yml itself; the only scheduled workflows are monthly-repo-metrics, security, and upgrade-main, none of which delete CloudFormation stacks. The mechanism the entire teardown argument depends on is unimplemented.

Combined with the (correct) decision that the AgentCore ENI race makes cdk destroy reliably fail, the net effect: every run whose ENIs haven't detached when the one-shot delete fires leaves a stranded stack nobody collects — and the unique-per-commit naming (great for not blocking the next run) also means these accumulate silently instead of surfacing. Each stranded int-<sha> carries a VPC + 1 NAT gateway + 8 interface endpoints (agent-vpc.ts) + the AgentCore runtime — real hourly cost in a shared account, indefinitely. The old code failed loudly on a stuck delete for exactly this reason.

Fix: either ship the sweeper in this PR (scheduled workflow that lists int-* stacks, retries delete-stack, alarms on ones older than N hours), or have the always() teardown retry-with-backoff and fail loudly / alarm if it still can't delete. At minimum, don't claim a sweeper reclaims these until one exists.

🚫 2. Token-TTL race → intermittent false failures (integ.task-lifecycle.ts:392)

A single Cognito ID token is minted once at auth and reused for every HTTP call across a strictly-serial .next() chain. Polling budget before the late POSTs:

  • before approve: pollComplete (12m) + pollFail (12m) + pollGateApprove (8m) ≈ 32 min
  • before deny: + pollApproveDecision (8m) + pollGateDeny (8m) ≈ 48 min

…all from token issuance, plus real agent cold-start/runtime. The PR's own note says the live run took ~54 min. But the AppClient sets no idTokenValidity (task-api.ts:247), so it inherits Cognito's 60-min default, and the API uses a CognitoUserPoolsAuthorizer (task-api.ts:455). Once a slow agent run pushes the approve/deny POSTs past 60 min → 401 → decision never recorded → pollApproveDecision/pollDenyDecision time out as a false failure keyed to agent latency.

Fix (cheap): set idTokenValidity: Duration.hours(2) on the integ deploy, or re-mint the token right before the gate POSTs.


🔧 Nits

  • Items.0 approval-row assumption (:547, :600)queryApprove/queryDeny read Items.0.request_id.S with no Limit/ScanIndexForward/status filter, assuming exactly one approval row per task. If new-task-v1 ever trips the gate more than once, Items[0] is nondeterministic and the POST may target the wrong request_id. A status = PENDING filter makes it deterministic.
  • Hardcoded personal sandbox (:324)SANDBOX_REPO = 'ayushtr-aws/abca-integ-sandbox' and PRESEEDED_PAT_SECRET are source literals pointing at one contributor's repo/secret, so scenarios 3 & 4 (the actual Cedar approve/deny gates — the headline of this PR) silently degrade to clone-failures for any other account. The comment admits it; better to read the INTEG_SANDBOX_REPO CI var the cleanup step already uses.
  • Sandbox PAT scope — the cleanup step exports a PAT as GH_TOKEN in an always() shell step. Not a code bug (it's not logged/interpolated, and runs in the trusted context), but verify the PAT is fine-grained and scoped to only the sandbox repo, since a broad/classic token here would be a large blast radius.

✅ Cleared

Branch-deletion loop isn't injectable (repo-var + API-sourced branch names used only as gh api path args); PAT consumed via env, not logged; gh preinstalled on ubuntu-latest; fork test code correctly gated behind label + environment approval.


Reviewed at xhigh effort. CfnOutput ids, authorizer type, concurrency cap, reconciler timeout, the absence of a stack-sweeper, the NAT + 8-interface-endpoint VPC cost, and the removed wait stack-delete-complete guard all verified against the repo source at the PR head.

…-driven sandbox (aws-samples#317)

Review fixes for PR aws-samples#348:

- Token-TTL race (blocker): the single idToken minted at auth was reused for the
  approve/deny POSTs ~48 min later, past Cognito's 60-min default validity → risk
  of 401 → decision never recorded → false timeout. Re-mint a fresh token right
  before each gate POST (reAuthApprove/reAuthDeny in the .next() chain). Test-side
  only — no idTokenValidity change to the production app client.

- Approval-row determinism (nit): queryApprove/queryDeny now filter
  status=PENDING so Items[0] is the live row even if a task trips the gate more
  than once or carries already-decided rows.

- Env-driven sandbox config (nit): SANDBOX_REPO / PRESEEDED_PAT_SECRET now read
  INTEG_SANDBOX_REPO / INTEG_PAT_SECRET_ID (the CI vars the cleanup step already
  uses), falling back to the literals, so the gate scenarios bind to the running
  account's sandbox instead of a hardcoded contributor repo. integ.yml passes the
  vars to the run step.

Verified: compile, eslint, zizmor, and integ-runner --force --dry-run all pass.
@ayushtr-aws

ayushtr-aws commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review, @isadeks — really appreciate the depth. Pushed 34e7343 addressing blocker #2 and the nits:

🚫 # 2 Token-TTL race — fixed. Re-mint a fresh Cognito token immediately before each gate POST (reAuthApprove/reAuthDeny, sequenced right before approve/deny in the .next() chain), so the token is minted minutes — not ~50 min — before use. Kept it test-side rather than setting idTokenValidity on the app client, to preserve the 'deploy AgentStack unchanged' property.

🔧 Items.0 determinism — fixed. queryApprove/queryDeny now carry a status = PENDING FilterExpression, so Items[0] is the live pending row even if a task trips the gate more than once.

🔧 Hardcoded sandbox — fixed. SANDBOX_REPO/PRESEEDED_PAT_SECRET now read INTEG_SANDBOX_REPO/INTEG_PAT_SECRET_ID (the same CI vars the cleanup step uses), falling back to the literals; integ.yml passes them to the run step.

🔧 PAT scope — confirmed: the sandbox PAT is a fine-grained token scoped to only the sandbox repo (Contents+PR write), not a classic/broad token.

Verified locally: compile, eslint, zizmor, and integ-runner --force --dry-run all pass on the rebased base.

…on leaks (aws-samples#317)

Addresses PR aws-samples#348 blocker #1: the teardown comments claimed an 'ephemeral
sweeper' reclaims stranded int-* stacks, but none existed — so failed-teardown
stacks (the expected AgentCore agentic_ai ENI race) accumulated silently, each
leaking a VPC + NAT gateway + interface endpoints + runtime in the shared account.

New scheduled workflow .github/workflows/integ-sweeper.yml (every 2h):
- lists all int-* CloudFormation stacks and best-effort delete-stacks each
  (idempotent; the ENIs have had time to detach by the time it runs, unlike the
  in-run teardown which races them);
- for any int-* stack still undeletable past ALARM_AGE_HOURS (6h — comfortably
  past the observed ~1-2h ENI-release window, so normal lag never false-alarms),
  FAILS THE JOB and opens a tracking issue (mirrors security.yml's pattern), so a
  genuine leak surfaces loudly instead of hiding.

Runs under environment: integ with OIDC (id-token: write) to assume the same
AWS_ROLE_TO_ASSUME integ.yml uses — keeps the secret in a dedicated environment
(zizmor secrets-outside-env clean). Issue body passed via env, not inline
interpolation (zizmor template-injection clean).

Also corrects the integ.yml + integ.task-lifecycle.ts comments to point at the
now-real sweeper instead of overclaiming. Hybrid teardown story: in-run delete
stays best-effort/quiet (expectError), the sweeper does reliable async cleanup +
the loud alarm.

Verified: YAML parse, bash -n, zizmor (new file 0 findings), compile, eslint.
@ayushtr-aws

ayushtr-aws commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

🚫 # 1 Billable leak / missing sweeper — fixed in a5a188f, the sweeper has been implemented.

Added .github/workflows/integ-sweeper.yml (scheduled every 2h):

  • Lists all int-* stacks and best-effort delete-stacks each. Because it runs later than the integ job, the AgentCore agentic_ai ENIs have had time to detach, so these deletes actually succeed — unlike the in-run teardown that races them.
  • For any int-* stack still undeletable past 6h (comfortably beyond the observed ~1–2h ENI-release window, so normal teardown lag never false-alarms), it fails the job and opens a tracking issue — mirrors security.yml's pattern, so a genuine leak surfaces loudly instead of accumulating silently.

Net teardown story is now a coherent hybrid: the in-run delete stays best-effort/quiet (expectError, so the run reflects assertions not the ENI race), and the sweeper owns reliable async cleanup plus the loud alarm. I also corrected the integ.yml + integ.task-lifecycle.ts comments to point at the now-real workflow instead of overclaiming.

Security notes: the sweeper runs under environment: integ with OIDC to assume the same AWS_ROLE_TO_ASSUME (keeps the secret in a dedicated environment — zizmor secrets-outside-env clean), and the issue body is passed via env not inline interpolation (zizmor template-injection clean). One operational note: the integ environment's protection rules need to allow the scheduled run to assume the role (no manual approval is possible on a cron trigger) — flagging in case the env requires reviewers.

All five review points (both blockers + three nits) are now addressed across 34e7343 and a5a188f. Thanks again @isadeks.

@isadeks

isadeks commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Re-review — both blockers resolved ✅ (recommend approve once residuals are noted)

Re-reviewed at head a5a188f. The new commits map cleanly onto every finding from the first-pass review, and I did an additional design/security/architecture sweep. Summary: both blockers are fixed and verified; what's left is nits + one environment-config decision.

Blockers — resolved

Finding Fix Verified
🚫 Billable leak / no sweeper new .github/workflows/integ-sweeper.yml
🚫 Token-TTL race fresh reAuthApprove/reAuthDeny tokens before the gate POSTs
🔧 Items.0 approval-row assumption status = PENDING FilterExpression
  • Sweeper runs every 2h, deletes all int-* stacks, and fails loudly + opens a tracking issue for any stack older than ALARM_AGE_HOURS=6 that still won't delete — past the observed ~1–2h ENI-release window, so normal lag doesn't false-alarm but a genuine leak surfaces. integ.yml's teardown comment now points at this real workflow. Env-passed LEAKED (no template injection), pinned SHA, concurrency guard.
  • Token race: the chain wires …queryApprove → reAuthApprove → approve and …queryDeny → reAuthDeny → deny, and the POSTs send approveToken/denyToken (lines 413/469). A slow run past Cognito's 60-min default no longer 401s.

Verified sound (design / security)

  • Agent egress isolation — Runtime SG is 443-only with allowAllOutbound: false + DNS firewall, so a misbehaving real agent in the shared account is egress-bounded, on top of max_turns/max_budget_usd.
  • Sweeper permissions — top-level permissions: contents: none narrows the default token; the job re-grants exactly id-token: write + contents: read + issues: write. Least-privilege.
  • Field-presence assertions — traced the history: a commit added user_id/created_at checks via Match.objectLike, a later one removed them back to status-only. That's correct, not a regression — nested Match.* doesn't survive the waitForAssertions serialization, so field-presence needs a non-polled getItem+assertAtPath (deferred to a feat(cdk): integ-tests Phase 1 — core lifecycle E2E #317 follow-up).

Non-blocking items

  1. integ environment vs. the new sweeper (config decision — partly in scope). integ.yml's comments describe the integ environment approval as a defense-in-depth gate ("an admin reviews fork test code before it runs with the privileged role"). But the integ environment has no required reviewers (verified via the environments API), so that gate doesn't currently exist — fork-code safety rests entirely on the safe-to-test label check in resolve. This PR sharpens it two ways: it escalates what runs behind integ from a trimmed Task API stack to the full AgentStack + a real agent with a write-PAT, and the new sweeper depends on integ staying unprotected (a cron trigger can't satisfy a manual approval, so adding required reviewers would silently break the sweeper). The settings change itself is a maintainer call, but in-PR it's worth: (a) correcting the comments that overstate the approval gate, and (b) giving the sweeper its own environment (or explicitly documenting that integ must remain approval-free).

  2. Sweeper issue-spam (integ-sweeper.yml:154). gh issue create runs unconditionally on leaked != '' with no gh issue list --search/dedup guard — a stack stuck 12h files 6 duplicate issues. Check for an existing open issue (stable title/label) and comment/skip instead.

  3. Sweeper int- prefix breadth. The sweep delete-stacks any stack named int-*. Safe today (no other stack matches), but it's an unguarded delete loop on a 4-char prefix in a shared account — a more specific shape (int-<8hexsha>) or a tag filter would harden it.

  4. Personal public-repo fallback (integ.task-lifecycle.ts:154). Hardcoding is fixed (now reads INTEG_SANDBOX_REPO/INTEG_PAT_SECRET_ID), but the fallback still targets a contributor's public personal repo (ayushtr-aws/abca-integ-sandbox, confirmed public). For upstream aws-samples, prefer leaving SANDBOX_REPO unset → scenarios 3 & 4 skip with a clear "set INTEG_SANDBOX_REPO" message, rather than silently defaulting into one person's account. (PAT is asserted fine-grained/repo-scoped in the comment — worth confirming on the provisioning side.)

Net

Both original blockers properly resolved. Remaining: one environment-config decision (#1, partly a maintainer action) and three nits (#2#4). Good to merge once #1 is decided and the nits are folded in or filed as follow-ups.

isadeks
isadeks previously approved these changes Jun 18, 2026

@isadeks isadeks left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving — both blockers from the earlier review are resolved and verified at a5a188f:

  • Billable leak → the new integ-sweeper.yml reclaims stranded int-* stacks and fails loudly + files an issue past the 6h threshold.
  • Token-TTL race → fresh reAuthApprove/reAuthDeny tokens are minted right before the gate POSTs.
  • Items.0 → now filtered to status = PENDING.

Also confirmed sound: agent egress isolation (443-only SG + DNS firewall), least-privilege sweeper permissions, and the status-only assertions (field-presence correctly deferred since nested Match.* doesn't survive the polling waiter).

Remaining items are non-blocking follow-ups, not merge gates:

  • The integ environment having no required reviewers is a repo-settings decision (maintainer action, out of this diff) — but worth a follow-up since the sweeper now depends on it staying approval-free and the integ.yml comments overstate the approval gate.
  • Sweeper issue-spam dedup, the int- delete-prefix breadth, and the personal public-repo fallback default — all nice-to-harden, none blocking.

Nice work on the teardown design and the token-race fix.

isadeks and others added 3 commits June 18, 2026 12:21
… correct env-gate comments (aws-samples#317)

Re-review residuals from PR aws-samples#348 (isadeks):

- Sweeper issue-spam (nit #2): dedup on a stable `integ-leak` label —
  comment on the existing open issue instead of filing a duplicate each
  2h cycle. Label is created idempotently before use.
- Sweeper prefix breadth (nit aws-samples#3): only sweep `int-<8 hex>` (the CI
  stack-name shape) via an authoritative regex guard, not a bare
  `int-*` glob, so an unrelated stack sharing the 4-char prefix is never
  deleted in the shared account.
- Personal-repo fallback (nit aws-samples#4): drop the hardcoded
  ayushtr-aws/abca-integ-sandbox / secret literals. Gate scenarios 3 & 4
  now run ONLY when INTEG_SANDBOX_REPO + INTEG_PAT_SECRET_ID are both
  set; otherwise they cleanly skip (warn) instead of routing a write-PAT
  clone into a contributor's personal repo. The gate assertion calls are
  constructed only inside the enabled branch so nothing is registered
  when skipped.
- integ env comments (in-PR part of #1): correct comments that
  overstated the `integ` environment approval as the fork-code gate. The
  enforced gate is the `safe-to-test` label; the environment adds an
  approval ONLY if required reviewers are configured (currently none).
  Note that the cron-triggered sweeper depends on the env staying
  approval-free. Whether to add required reviewers is a maintainer/admin
  action outside this PR.

Validated: compile, eslint, zizmor (both workflows clean), and
integ-runner --force --dry-run pass with gates both disabled and enabled.
@ayushtr-aws

Copy link
Copy Markdown
Contributor Author

Thanks again, @isadeks — pushed 51bdde6 addressing the three nits and the in-PR part of the env-config item.

🔧 #2 Sweeper issue-spam — fixed. The leak-issue step now dedups on a stable integ-leak label: it looks for an existing open issue with that label and comments on it instead of filing a fresh duplicate each 2h cycle; it only gh issue creates when none exists. The label is created idempotently before use. So a stack stuck for 12h now produces one issue + periodic comments, not six issues.

🔧 #3 Sweeper int- prefix breadth — fixed. The sweep no longer acts on a bare int-* glob. It now applies an authoritative regex guard ^int-[0-9a-f]{8}$ — exactly the CI stack-name shape (COMMIT_HASH.slice(0,8) in integ.task-lifecycle.ts) — and logs+skips anything that doesn't match. An unrelated stack that merely starts with those 4 chars (or the int-local dev fallback) is never deleted. The JMESPath prefilter stays as a cheap page-narrower; the regex is the real gate.

🔧 #4 Personal public-repo fallback — fixed. Dropped the hardcoded ayushtr-aws/abca-integ-sandbox / secret literals entirely (no fallback). Gate scenarios 3 & 4 now run only when both INTEG_SANDBOX_REPO and INTEG_PAT_SECRET_ID are set; otherwise they skip with a clear warning (set both to exercise the gates) rather than silently routing a write-PAT clone into one contributor's account. Implementation note: integ-tests registers an assertion at construction, not at .next(), so I moved all the gate assertion calls (incl. the reAuth* token mints and the PAT seed) inside the if (gatesEnabled) block and assemble the .next() chain conditionally — when disabled, nothing is registered and the run reduces cleanly to scenarios 1 & 2. Verified both ways via integ-runner --force --dry-run (gates-disabled and gates-enabled both synth + pass).

🚫/🔧 #1 integ environment vs. the sweeper

  • In-PR (done): corrected the comments in integ.yml that overstated the integ environment approval as the fork-code gate. They now state plainly that the enforced fork-code gate is the safe-to-test label, and the integ environment only adds a manual approval if required reviewers are configured (currently none). I also added an explicit note — at the trigger, the sweeper, and the env declaration — that the cron-triggered integ-sweeper depends on integ staying approval-free (a schedule can't satisfy a manual approval), so if reviewers are ever wanted for PR runs the sweeper should get its own environment.
  • Maintainer call (out of PR): whether to actually add required reviewers to the integ environment is a repo-Settings change only an admin can make, and given the sweeper dependency above it's a deliberate trade-off rather than a code fix. Flagging for an admin decision; happy to give the sweeper a dedicated environment in a follow-up if you'd prefer reviewers on integ.

One note re: the rebase onto the latest base — #109 just landed scripts/cleanup-ephemeral-stacks.sh, which also does ENI-aware ephemeral-stack cleanup. They don't overlap with the sweeper: that script matches by stack description ("ABCA Development Stack"*) for dev stacks, whereas the integ stacks carry a different description and are matched by name (int-<hash>). So the two target disjoint sets and the sweeper is still needed. (A possible future consolidation, but out of scope here.)

Validated on the rebased head: mise //cdk:compile, eslint, zizmor (both workflows clean), and integ-runner --force --dry-run all pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(cdk): integ-tests Phase 1 — core lifecycle E2E

3 participants