Summary
Umbrella tracking issue for the remaining CI build-performance levers identified during the profiling that produced #357. Item #1 of the original 5 recommendations is done (PR #359: skip the redundant jest type-check via `isolatedModules`): build step 710s → 346s (−51%), `//cdk:test` 649s → 298s. This issue tracks the other four.
Each is its own measurable experiment; this issue holds the sequence, rationale, and implementer notes. Spin out a child issue per item when work starts (governance per ADR-003).
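Recommendations: status & remaining work

| # | Recommendation | Status | Effect |
|---|----------------|--------|--------|
| 1 | Skip the redundant jest type-check (transpile-only; was originally framed as "swap to `@swc/jest`") | Done (PR #359) | build step 710s → 346s (−51%); `//cdk:test` 649s → 298s |
| 2 | Shard the CDK tests (`jest --shard=N/M`) | … | … |
| 3 | Move `collectCoverage` to `merge_group` only (skip on PR push) | … | … |
| 4 | Run on a larger runner (more cores) | … | … |
| 5 | Path filter (`dorny/paths-filter`) so docs/CLI/agent-only PRs skip `//cdk:test` | … | … |
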
#2 has a merge-queue wrinkle. `build` is a required status check and must report on the `merge_group` event (see #327 and the comment block atop `build.yml`). If you split CDK tests into a matrix, the required check must aggregate shard results: either keep one `build` job that runs shards internally, or add a "tests passed" gate job that `needs:` all shards and is the thing marked required. Do not mark individual shard jobs required, or the queue can deadlock or flake. Pick a shard count that splits the ~113 suites roughly evenly; 4-way (about 28 suites per shard) is a sane start. Watch for per-shard fixed overhead (checkout, install, synth-cache restore) eroding the win: cache aggressively and measure wall-clock, not sum-of-shards.
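The gate-job option can be sketched roughly as follows. This is an illustration under assumptions, not the contents of the real `build.yml`: the shard job name `cdk-test-shard`, the `cdk` working directory, the direct `npx jest` invocation, and the 4-way split are all hypothetical.

```yaml
# Hypothetical sketch: job names, the test command, and the shard count are
# assumptions, not the contents of the real build.yml.
name: build
on:
  pull_request:
  merge_group:

jobs:
  cdk-test-shard:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false          # let every shard finish so all failures are visible
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      # Install and cache-restore steps omitted for brevity.
      - name: Run one shard of the CDK suite
        working-directory: cdk
        run: npx jest --shard=${{ matrix.shard }}/4

  # The single job to mark as the required check; it reports on both events.
  build:
    runs-on: ubuntu-latest
    needs: [cdk-test-shard]
    if: always()                # run even when a shard fails, so the check always reports
    steps:
      - name: Fail unless every shard succeeded
        env:
          SHARD_RESULT: ${{ needs.cdk-test-shard.result }}
        run: test "$SHARD_RESULT" = "success"
```

The `if: always()` on the gate matters: without it, a failed shard would leave the gate job skipped instead of failed, and a skipped job does not block a required check the way a failure does.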
#3 must preserve the enforcement gate. Coverage thresholds are the merge gate (`coverageThreshold` in `cdk/package.json` / `cli/package.json` + agent pytest `fail_under`, kept in sync via `contracts/coverage-thresholds.json` and `check:coverage-thresholds-sync`). If you skip `collectCoverage` on `pull_request`, you MUST still enforce thresholds on `merge_group` so nothing merges under-covered. The net effect is mostly on the high-frequency PR-push event, so quantify it there.
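One way to express that split, sketched under assumptions: the step names, the `cdk` working directory, and calling `npx jest` directly (rather than through the repo's task runner) are hypothetical. Jest only evaluates `coverageThreshold` when coverage is collected, so the second step is the one that enforces the gate.

```yaml
# Hypothetical steps inside the existing build job; step names and the
# working directory are assumptions.
steps:
  - name: CDK tests without coverage (PR pushes)
    if: github.event_name == 'pull_request'
    working-directory: cdk
    run: npx jest --coverage=false

  # Every other event, including merge_group, collects coverage and therefore
  # enforces the thresholds.
  - name: CDK tests with coverage thresholds
    if: github.event_name != 'pull_request'
    working-directory: cdk
    run: npx jest --coverage
```

Defaulting to coverage on every event except `pull_request` keeps the failure mode safe: a new trigger added later is gated unless someone opts it out.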
#4 is the cheapest experiment but a recurring cost. `build.yml` already resolves the runner via `vars.DEFAULT_RUNNER_LABEL` and PR labels (`self-hosted`, `ubuntu-latest-4-cores`). A/B is a one-line `vars` change. Jest workers scale with cores (`maxWorkers` defaults to cores−1) and synth is CPU-bound, so more cores help both. It is a per-run $ cost, though, so weigh it against #2 (which uses concurrency you're already paying for).
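For orientation, runner resolution of this kind might look like the fragment below. It is a guess at the shape, not a copy of the real `build.yml`: the precedence (PR label first, then the repository variable, then a hosted default) and the `ubuntu-latest` fallback are assumptions.

```yaml
# Hypothetical sketch; the real build.yml may resolve the label differently.
jobs:
  build:
    # A PR label opts a run into the larger runner; otherwise use the
    # repository variable, then a default hosted runner.
    runs-on: ${{ contains(github.event.pull_request.labels.*.name, 'ubuntu-latest-4-cores') && 'ubuntu-latest-4-cores' || vars.DEFAULT_RUNNER_LABEL || 'ubuntu-latest' }}
    steps:
      - uses: actions/checkout@v4
```

With a shape like this, the A/B is either a change to `DEFAULT_RUNNER_LABEL` or a label on selected PRs, with wall-clock compared across otherwise identical runs.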
#5 has the same required-check constraint as #2. Because `build` must report on `merge_group`, you can't simply skip the job for docs-only PRs (the required check would never report → queue deadlock). Instead keep the job and make the expensive steps conditional (e.g. `dorny/paths-filter` gates `//cdk:test`), emitting success when CDK paths are untouched. Always log or annotate what was skipped so a skipped suite doesn't read as "covered."
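A conservative variant of this can be sketched as below. The filter name `cdk`, the `cdk/**` path, and the `mise run //cdk:test` command line are placeholders. Running the filter only on `pull_request` and always running the suite on `merge_group` is an extra safety assumption of this sketch, not something the notes above require.

```yaml
# Hypothetical steps inside the required build job; the filter name, paths,
# and test command are assumptions.
steps:
  - uses: actions/checkout@v4

  - uses: dorny/paths-filter@v3
    id: changes
    if: github.event_name == 'pull_request'
    with:
      filters: |
        cdk:
          - 'cdk/**'

  # Always run in the merge queue; on PRs, run only when CDK paths changed.
  - name: CDK tests
    if: github.event_name == 'merge_group' || steps.changes.outputs.cdk == 'true'
    run: mise run //cdk:test

  # Make the skip visible so it is not mistaken for a passing suite.
  - name: Record skipped suite
    if: github.event_name == 'pull_request' && steps.changes.outputs.cdk != 'true'
    run: |
      echo "::notice::CDK tests skipped: no changes under cdk/"
      echo "CDK tests skipped (no changes under cdk/); not counted as covered." >> "$GITHUB_STEP_SUMMARY"
```

Because the job itself always runs and only individual steps are conditional, the required `build` check still reports on every event.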
Suggested sequencing
Insights for implementers
The cost shape shifted after #1. `//cdk:test` was ~91% of the build step; at 298s it's now ~86% of a 346s step: still the long pole, but half the absolute size. The mise parallel DAG cannot overlap it (everything else finishes in the first ~90s), so the next dollar of speedup must come from parallelizing the suite itself (#2) or not running it when irrelevant (#5), not from the DAG.
Definition of done (this umbrella)
`docs/design/CI_BUILD_PERFORMANCE.md` kept current as items land (see linked docs PR).