feat(cli): standalone per-shard e2e launcher #14938

Draft
sarayev wants to merge 2 commits into dev from feat/e2e-standalone-launcher

sarayev (Contributor) commented on Jun 23, 2026

Description of changes

The e2e batch orchestrator (StartBuildBatch) becomes unreliable when a large
number of child builds are in progress at the same time. So that e2e no longer
depends on it, this PR adds a standalone per-shard launcher that runs each e2e
shard as an individual StartBuild rather than as part of a StartBuildBatch.

Every build it starts has buildBatchArn === null, so the batch orchestrator is
fully bypassed. Builds reuse the existing prep S3 cache keyed on the resolved
commit SHA, so no rebuild is required.

Standalone launcher

scripts/run-e2e-standalone.ts (+ yarn cloud-e2e-standalone) parses the shard
list from e2e_workflow_generated.yml, assigns each shard a CLI_REGION
round-robin from AWS_REGIONS_TO_RUN_TESTS, and runs Linux and Windows as two
independent concurrency-capped pools in parallel. Each build injects
TEST_SUITE and CLI_REGION overrides (the latter short-circuits
select-region-for-e2e-test.ts); Windows shards additionally get the
WINDOWS_IMAGE_2019 image and WINDOWS_SERVER_2022_CONTAINER env-type
overrides. The pool keeps at most --max-concurrency builds in flight, polling
BatchGetBuilds and topping up as builds finish; it prints a per-shard summary,
supports --retry-failed, and exits non-zero on any failure.
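
The per-shard overrides described above can be sketched as a pure function that builds one request per shard. This is a minimal illustration, not the launcher's actual code: the `Shard` type and the `assignRegions` and `buildRequest` helpers are hypothetical names, the field names follow the CodeBuild StartBuild request shape, and passing the resolved SHA as `sourceVersion` is an assumption about how `--source-sha` is wired.

```typescript
// Hypothetical types for illustration; not taken from run-e2e-standalone.ts.
type Platform = "linux" | "windows";

interface Shard {
  identifier: string;
  platform: Platform;
  testSuite: string; // the shard's jest filter
}

interface EnvVar {
  name: string;
  value: string;
  type: "PLAINTEXT";
}

// Subset of the CodeBuild StartBuild request fields used here.
interface StartBuildInput {
  projectName: string;
  sourceVersion: string;
  environmentVariablesOverride: EnvVar[];
  imageOverride?: string;
  environmentTypeOverride?: string;
}

// Round-robin: shard i gets regions[i % regions.length].
function assignRegions(shards: Shard[], regions: string[]): Map<string, string> {
  if (regions.length === 0) throw new Error("no regions configured");
  const assigned = new Map<string, string>();
  shards.forEach((shard, i) => {
    const region = regions[i % regions.length];
    if (region !== undefined) assigned.set(shard.identifier, region);
  });
  return assigned;
}

// Build one request per shard. No batch fields are set, so the build is
// started on its own rather than as a child of a batch.
function buildRequest(
  projectName: string,
  sourceSha: string, // assumption: the resolved SHA is passed as sourceVersion
  shard: Shard,
  region: string,
  windowsImage: string, // assumption: resolved elsewhere (WINDOWS_IMAGE_2019 in the PR)
): StartBuildInput {
  const input: StartBuildInput = {
    projectName,
    sourceVersion: sourceSha,
    environmentVariablesOverride: [
      { name: "TEST_SUITE", value: shard.testSuite, type: "PLAINTEXT" },
      { name: "CLI_REGION", value: region, type: "PLAINTEXT" },
    ],
  };
  if (shard.platform === "windows") {
    input.imageOverride = windowsImage;
    input.environmentTypeOverride = "WINDOWS_SERVER_2022_CONTAINER";
  }
  return input;
}
```

Keeping request construction free of SDK calls makes the region assignment and the Windows overrides easy to unit-test without starting any builds.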

Because every build bypasses the batch orchestrator, the real ceiling is the
account concurrency quota (Linux/Medium 1,200; Windows/Medium 300), so
per-platform caps can be set well above the batch orchestrator's safe limit.
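
The capped pool can be thought of as one poll cycle repeated until the queue drains: retire finished builds, then top up to the cap. The sketch below models that cycle as a pure function. The `PoolState` shape, `tick`, and `exitCode` are hypothetical names, the status map stands in for whatever a BatchGetBuilds poll reports, and the three-value status is a simplification of the real CodeBuild status set.

```typescript
// Simplified status set (assumption): real builds also report other
// terminal states, which a launcher would likely treat as failures.
type BuildStatus = "IN_PROGRESS" | "SUCCEEDED" | "FAILED";

interface PoolState {
  queue: string[]; // shard identifiers not yet launched, in launch order
  inFlight: string[]; // shards whose builds are still running
  succeeded: string[];
  failed: string[];
}

// One poll cycle for a single platform pool.
// Returns the next state plus the shards to start now.
function tick(
  state: PoolState,
  statuses: Map<string, BuildStatus>,
  cap: number,
): { next: PoolState; toLaunch: string[] } {
  const stillRunning: string[] = [];
  const succeeded = [...state.succeeded];
  const failed = [...state.failed];

  for (const id of state.inFlight) {
    const status = statuses.get(id) ?? "IN_PROGRESS";
    if (status === "SUCCEEDED") succeeded.push(id);
    else if (status === "FAILED") failed.push(id);
    else stillRunning.push(id);
  }

  // Top up: never let in-flight builds exceed the cap.
  const free = Math.max(0, cap - stillRunning.length);
  const toLaunch = state.queue.slice(0, free);

  return {
    next: {
      queue: state.queue.slice(free),
      inFlight: [...stillRunning, ...toLaunch],
      succeeded,
      failed,
    },
    toLaunch,
  };
}

// Non-zero exit on any failure, matching the behavior described above.
function exitCode(state: PoolState): number {
  return state.failed.length > 0 ? 1 : 0;
}
```

Under this model, Linux and Windows would each hold their own `PoolState` and cap, so a slow platform never consumes the other platform's slots.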

Duration-aware launch ordering

--durations <path> (default .e2e-shard-durations.json) and
--order longest-first|shortest-first|file (default longest-first) order
the launch queue by real per-shard wall-clock durations from a prior green
batch (LPT heuristic). The longest shards — the 8 l_gen2_migration_* shards
at 117–194 min — start in the first pool slots so they are never the makespan
tail. Shards missing from the dataset assume the median; if the file is absent,
ordering falls back to file order (no hard dependency). The dataset itself is
not committed.
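
The ordering rule can be sketched as follows. This is an illustration under stated assumptions, not the launcher's code: `orderShards` and `median` are hypothetical names, durations are passed as a `Map` of identifier to minutes, and the fallback for a missing shard is taken to be the median of the dataset's values.

```typescript
type Order = "longest-first" | "shortest-first" | "file";

// Median of the known durations; 0 for an empty dataset.
function median(values: number[]): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const upper = sorted[mid] ?? 0;
  const lower = sorted[mid - 1] ?? upper;
  return sorted.length % 2 === 1 ? upper : (lower + upper) / 2;
}

// Order the launch queue. `durations` is undefined when the dataset file
// is absent, in which case file order is kept.
function orderShards(
  shards: string[],
  durations: Map<string, number> | undefined,
  order: Order,
): string[] {
  if (order === "file" || durations === undefined) return [...shards];

  const fallback = median(Array.from(durations.values()));
  const minutes = (id: string): number => durations.get(id) ?? fallback;
  const sign = order === "longest-first" ? -1 : 1;

  // Array.prototype.sort is stable, so shards with equal durations
  // keep their file order.
  return [...shards].sort((a, b) => sign * (minutes(a) - minutes(b)));
}
```

A JSON object of identifier to minutes can be turned into the expected input with `new Map(Object.entries(obj))`.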

Follow-up note for reviewers: this same duration dataset should later replace
the stale 2022 scripts/cci-test-timings.data.json used for split-balancing.
That file is not changed in this PR.

How did you test these changes?

Validated with a small bounded run against a cached prep from a prior batch:

yarn cloud-e2e-standalone --source-sha <resolvedSHA> \
  --platform linux --max-concurrency 25 --limit 20 --order longest-first

Confirmed on live builds:

  • buildBatchArn === null — batch orchestrator fully bypassed.
  • Prep cache reused: builds load the prep artifacts from the resolved-SHA S3
    prefix ({repo,.cache,verdaccio-cache,all-binaries}).
  • CLI_REGION set per shard (round-robin) — no select-region error.
  • TEST_SUITE set to the shard's jest filter; builds reached the BUILD phase
    running tests.
  • The number of simultaneously in-progress builds never exceeded the cap.
  • Longest-first ordering verified: all 8 l_gen2_migration_* shards launched
    in the first slots.

Run the full suite when ready with, e.g.:

yarn cloud-e2e-standalone --source-sha <resolvedSHA> --max-concurrency 75

Projected makespan at cap 75 with longest-first ordering ≈ prep (~30 min) +
longest shard (~194 min) ≈ 224 min ≈ 3.7 h.
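
The projection above can be checked with a small simulation of a capped pool. This is a sketch under simplifying assumptions: `simulateMakespan` and `projectedHours` are hypothetical names, a slot is refilled the instant its build ends (no polling delay or queue time), and the durations in the usage note are illustrative rather than the real dataset.

```typescript
// Greedy list scheduling: each shard, in launch order, takes the slot
// that frees up first. Returns the time at which the last slot finishes.
function simulateMakespan(durationsInOrder: number[], cap: number): number {
  if (cap < 1) throw new Error("cap must be at least 1");
  const slotCount = Math.min(cap, Math.max(1, durationsInOrder.length));
  const slots: number[] = new Array<number>(slotCount).fill(0);

  for (const duration of durationsInOrder) {
    // Find the slot with the earliest finish time.
    let best = 0;
    for (let i = 1; i < slots.length; i++) {
      if ((slots[i] ?? 0) < (slots[best] ?? 0)) best = i;
    }
    slots[best] = (slots[best] ?? 0) + duration;
  }
  return Math.max(...slots);
}

// Prep runs first, then the pool; shards are launched longest-first (LPT).
function projectedHours(prepMinutes: number, shardMinutes: number[], cap: number): number {
  const longestFirst = [...shardMinutes].sort((a, b) => b - a);
  return (prepMinutes + simulateMakespan(longestFirst, cap)) / 60;
}
```

With illustrative durations of 194, 117, 30, 20, and 10 minutes and a cap of 2, longest-first gives a pool makespan of 194 minutes, while launching the same shards shortest-first gives 234 minutes because the 194-minute shard starts last. The "prep + longest shard" estimate holds only when the cap is high enough that the longest shard, started first, outlasts every other slot.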

Checklist

  • PR description included
  • yarn test passes
  • Tests are [changed or added]
  • Relevant documentation is changed or added

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license.

sarayev changed the title from "feat(cli): standalone per-shard e2e launcher (cap-25 orchestrator bypass)" to "feat(cli): standalone per-shard e2e launcher" on Jun 23, 2026
sarayev added 2 commits June 23, 2026 13:00
Add scripts/run-e2e-standalone.ts (yarn cloud-e2e-standalone) that runs
each e2e shard from e2e_workflow_generated.yml as an individual CodeBuild
build via StartBuild instead of StartBuildBatch. Every build has
buildBatchArn null, so it fully bypasses the batch orchestrator, which
becomes unreliable when too many child builds are simultaneously
in-progress.

Because there is no batch, the batch orchestrator's simultaneous-in-progress
limit does not apply; the only ceiling is the account concurrency quota
(Linux/Medium 1200, Windows/Medium 300). Linux and Windows shards run as
two independent, parallel concurrency pools with separate caps
(--max-concurrency-linux / --max-concurrency-windows, default 75 each;
--max-concurrency sets both). Each shard reuses the prep S3 cache keyed on
the resolved commit SHA (--source-sha), injects TEST_SUITE and CLI_REGION
(round-robin from AWS_REGIONS_TO_RUN_TESTS), and applies Windows
image/environment-type overrides resolved from the project environment.
Builds are polled with BatchGetBuilds; failures can be retried with
--retry-failed.

Validated with a small run of 20 Linux shards against a cached prep from a
prior batch: all builds started with buildBatchArn null, reused the prep S3
cache, had CLI_REGION set with no select-region error, and proceeded into
the BUILD phase executing jest.

---
Prompt: Build a standalone per-shard e2e launcher that runs shards as
individual CodeBuild builds with per-platform concurrency caps, bypassing
the batch orchestrator, and validate it with a small Linux run.

Add `--durations <path>` (default .e2e-shard-durations.json) and `--order
longest-first|shortest-first|file` (default longest-first) to the standalone
launcher. When real per-shard durations from a prior green batch are available,
the launch queue is sorted by duration descending (LPT heuristic) so the
longest shards — the 8 l_gen2_migration_* shards at 117-194 min — start in the
first pool slots and never become the makespan tail. Shards missing from the
dataset assume the median; if the file is absent, ordering falls back to file
order with no hard dependency. The dataset file itself is not committed.

Tested by launching 20 Linux shards longest-first against a cached prep from a
prior batch: all 8 gen2-migration shards launched in the first slots, builds
had buildBatchArn null, reused the S3 prep cache, had CLI_REGION set, and
reached the BUILD phase running tests, never exceeding the concurrency cap.

---
Prompt: FOLLOW-UP ENHANCEMENT: wire real duration ordering into the launcher.
A per-shard duration dataset exists at .e2e-shard-durations.json (untracked)
with shard_durations (identifier -> minutes). Add --durations and --order
(default longest-first); sort the launch queue by duration DESC by default
(LPT) so the 8 l_gen2_migration_* shards (117-194 min) start first. Shards
missing -> median. If the file is absent, fall back to file order. Note in the
PR body that this dataset should later replace the stale 2022
scripts/cci-test-timings.data.json (do NOT modify it here). Commit as a
follow-up, conventional message, no --no-verify.
sarayev force-pushed the feat/e2e-standalone-launcher branch from 8cf87d5 to 786b464 on June 23, 2026 13:01