feat(cli): standalone per-shard e2e launcher #14938
Draft
sarayev wants to merge 2 commits into
Add scripts/run-e2e-standalone.ts (yarn cloud-e2e-standalone) that runs each e2e shard from e2e_workflow_generated.yml as an individual CodeBuild build via StartBuild instead of StartBuildBatch. Every build has buildBatchArn null, so it fully bypasses the batch orchestrator, which becomes unreliable when too many child builds are simultaneously in-progress.

Because there is no batch, the batch orchestrator's simultaneous-in-progress limit does not apply; the only ceiling is the account concurrency quota (Linux/Medium 1200, Windows/Medium 300). Linux and Windows shards run as two independent, parallel concurrency pools with separate caps (--max-concurrency-linux / --max-concurrency-windows, default 75 each; --max-concurrency sets both).

Each shard reuses the prep S3 cache keyed on the resolved commit SHA (--source-sha), injects TEST_SUITE and CLI_REGION (round-robin from AWS_REGIONS_TO_RUN_TESTS), and applies Windows image/environment-type overrides resolved from the project environment. Builds are polled with BatchGetBuilds; failures can be retried with --retry-failed.

Validated with a small run of 20 Linux shards against a cached prep from a prior batch: all builds started with buildBatchArn null, reused the prep S3 cache, had CLI_REGION set with no select-region error, and proceeded into the BUILD phase executing jest.

---
Prompt: Build a standalone per-shard e2e launcher that runs shards as individual CodeBuild builds with per-platform concurrency caps, bypassing the batch orchestrator, and validate it with a small Linux run.
Add `--durations <path>` (default .e2e-shard-durations.json) and `--order longest-first|shortest-first|file` (default longest-first) to the standalone launcher. When real per-shard durations from a prior green batch are available, the launch queue is sorted by duration descending (LPT heuristic) so the longest shards (the 8 l_gen2_migration_* shards at 117-194 min) start in the first pool slots and never become the makespan tail. Shards missing from the dataset assume the median; if the file is absent, ordering falls back to file order with no hard dependency. The dataset file itself is not committed.

Tested by launching 20 Linux shards longest-first against a cached prep from a prior batch: all 8 gen2-migration shards launched in the first slots, builds had buildBatchArn null, reused the S3 prep cache, had CLI_REGION set, and reached the BUILD phase running tests, never exceeding the concurrency cap.

---
Prompt: FOLLOW-UP ENHANCEMENT: wire real duration ordering into the launcher. A per-shard duration dataset exists at .e2e-shard-durations.json (untracked) with shard_durations (identifier -> minutes). Add --durations and --order (default longest-first); sort the launch queue by duration DESC by default (LPT) so the 8 l_gen2_migration_* shards (117-194 min) start first. Shards missing -> median. If the file is absent, fall back to file order. Note in the PR body that this dataset should later replace the stale 2022 scripts/cci-test-timings.data.json (do NOT modify it here). Commit as a follow-up, conventional message, no --no-verify.
8cf87d5 to 786b464
Description of changes
The e2e batch orchestrator (`StartBuildBatch`) becomes unreliable when a large number of child builds are simultaneously in-progress. To run e2e reliably without depending on the batch orchestrator, this adds a standalone per-shard launcher that runs each e2e shard as an individual `StartBuild` (not `StartBuildBatch`).

Every build it starts has `buildBatchArn === null`, so the batch orchestrator is fully bypassed. Builds reuse the existing prep S3 cache keyed on the resolved commit SHA, so no rebuild is required.
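For reviewers who want a concrete picture of a per-shard request, here is a minimal sketch of how one `StartBuild` input could be assembled. It is not the code in `scripts/run-e2e-standalone.ts`: the `Shard` and `WindowsOverrides` shapes, the `buildStartBuildRequest` helper, and passing the resolved SHA as `sourceVersion` are all assumptions made for illustration. Only the field names (`environmentVariablesOverride`, `imageOverride`, `environmentTypeOverride`) come from the CodeBuild `StartBuild` API.

```typescript
// Simplified local mirror of the StartBuild request fields used in this sketch.
type EnvVar = { name: string; value: string; type: 'PLAINTEXT' };

interface StartBuildRequest {
  projectName: string;
  sourceVersion: string;
  environmentVariablesOverride: EnvVar[];
  imageOverride?: string;
  environmentTypeOverride?: string;
}

// Hypothetical shard shape; the real launcher parses shards from e2e_workflow_generated.yml.
interface Shard {
  identifier: string;
  testSuite: string; // jest filter for this shard
  platform: 'linux' | 'windows';
}

// Hypothetical holder for the Windows overrides resolved from the project environment.
interface WindowsOverrides {
  image: string;
  environmentType: string;
}

function buildStartBuildRequest(
  projectName: string,
  sourceSha: string, // assumption: the resolved commit SHA is sent as sourceVersion
  shard: Shard,
  region: string,
  windows: WindowsOverrides,
): StartBuildRequest {
  const request: StartBuildRequest = {
    projectName,
    sourceVersion: sourceSha,
    environmentVariablesOverride: [
      { name: 'TEST_SUITE', value: shard.testSuite, type: 'PLAINTEXT' },
      { name: 'CLI_REGION', value: region, type: 'PLAINTEXT' },
    ],
  };
  // Only Windows shards get the image and environment-type overrides.
  if (shard.platform === 'windows') {
    request.imageOverride = windows.image;
    request.environmentTypeOverride = windows.environmentType;
  }
  return request;
}
```

Keeping the request builder pure makes it easy to unit-test without AWS credentials; the resulting object would then be handed to whatever CodeBuild client the launcher uses.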
Standalone launcher
`scripts/run-e2e-standalone.ts` (+ `yarn cloud-e2e-standalone`) parses the shard list from `e2e_workflow_generated.yml`, assigns each shard a `CLI_REGION` round-robin from `AWS_REGIONS_TO_RUN_TESTS`, and runs Linux and Windows as two independent concurrency-capped pools in parallel. Each build injects `TEST_SUITE` and `CLI_REGION` overrides (the latter short-circuits `select-region-for-e2e-test.ts`); Windows shards additionally get the `WINDOWS_IMAGE_2019` image and `WINDOWS_SERVER_2022_CONTAINER` env-type overrides. The pool keeps at most `--max-concurrency` builds in flight, polling `BatchGetBuilds` and topping up as builds finish; it prints a per-shard summary, supports `--retry-failed`, and exits non-zero on any failure.

Because every build bypasses the batch orchestrator, the real ceiling is the account concurrency quota (Linux/Medium 1,200; Windows/Medium 300), so per-platform caps can be set well above the batch orchestrator's safe limit.
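The round-robin region assignment and the capped pool can be sketched roughly as follows. This is an illustrative approximation, not the launcher's implementation: `assignRegions` and `runPool` are hypothetical names, and the `run` callback stands in for "call `StartBuild`, then poll `BatchGetBuilds` until the build reaches a terminal state".

```typescript
// Round-robin: shard i gets regions[i % regions.length].
function assignRegions<T>(shards: T[], regions: string[]): Array<{ shard: T; region: string }> {
  if (regions.length === 0) throw new Error('at least one region is required');
  return shards.map((shard, i) => ({ shard, region: regions[i % regions.length] }));
}

// Run `run` over `items` with at most `cap` calls in flight.
// Each worker pulls the next queued item as soon as its current one settles,
// which is the "top up as builds finish" behaviour.
async function runPool<T, R>(items: T[], cap: number, run: (item: T) => Promise<R>): Promise<R[]> {
  if (cap < 1) throw new Error('cap must be >= 1');
  const results: R[] = new Array(items.length);
  let next = 0;

  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++; // synchronous, so no two workers take the same item
      results[index] = await run(items[index]);
    }
  }

  const workers = Array.from({ length: Math.min(cap, items.length) }, () => worker());
  await Promise.all(workers);
  return results; // results are in queue order, not completion order
}
```

Under these assumptions the two platforms would run as `Promise.all([runPool(linuxShards, linuxCap, run), runPool(windowsShards, windowsCap, run)])`, so a slow Windows queue never occupies a Linux slot. In this sketch `run` should resolve with a status rather than reject, since a rejection would abort the whole pool.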
Duration-aware launch ordering
`--durations <path>` (default `.e2e-shard-durations.json`) and `--order longest-first|shortest-first|file` (default `longest-first`) order the launch queue by real per-shard wall-clock durations from a prior green batch (LPT heuristic). The longest shards (the 8 `l_gen2_migration_*` shards at 117–194 min) start in the first pool slots so they are never the makespan tail. Shards missing from the dataset assume the median; if the file is absent, ordering falls back to file order (no hard dependency). The dataset itself is not committed.
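A minimal sketch of the ordering rule, under stated assumptions: `orderShards` is a hypothetical helper, `durations` is the identifier-to-minutes map (or `undefined` when the file is absent), and "the median" is taken to be the median of the durations present in the dataset. The actual script may differ on any of these points.

```typescript
type Order = 'longest-first' | 'shortest-first' | 'file';

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function orderShards(
  identifiers: string[],
  durations: Record<string, number> | undefined,
  order: Order,
): string[] {
  const known = durations ? Object.values(durations) : [];
  // No dataset (or explicit "file" order): keep the order from the workflow file.
  if (order === 'file' || known.length === 0) return [...identifiers];

  const fallback = median(known); // used for shards missing from the dataset
  const minutes = (id: string): number => durations?.[id] ?? fallback;
  const sign = order === 'longest-first' ? -1 : 1;

  // Array.prototype.sort is stable, so shards with equal durations keep file order.
  return [...identifiers].sort((a, b) => sign * (minutes(a) - minutes(b)));
}
```

For example, with made-up durations `{ l_gen2_migration_a: 194, l_auth_1: 40, l_api_1: 20 }`, an unknown shard is treated as 40 minutes and is queued after `l_gen2_migration_a` but before `l_api_1` under `longest-first`.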
Follow-up note for reviewers: this same duration dataset should later replace the stale 2022 `scripts/cci-test-timings.data.json` used for split-balancing; it is not changed in this PR.
How did you test these changes?
Validated a small bounded run against a cached prep from a prior batch:
Confirmed on live builds:
- `buildBatchArn === null`: batch orchestrator fully bypassed.
- Reused the prep S3 cache prefix (`{repo,.cache,verdaccio-cache,all-binaries}`).
- `CLI_REGION` set per shard (round-robin); no `select-region` error.
- `TEST_SUITE` set to the shard's jest filter; builds reached the BUILD phase running tests.
- The 8 `l_gen2_migration_*` shards launched in the first slots.
Run the full suite when ready with, e.g.:
Projected makespan at cap-75 longest-first ≈ prep (~30m) + longest shard
(~194m) ≈ ~3.7h.
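The projection above is simply 30 + 194 = 224 minutes, or about 3.7 hours, and it holds only if the longest shard starts immediately and nothing else outlasts it. The sketch below shows why launch order matters, using a simple greedy list-scheduling model. `poolMakespan` is a hypothetical helper, the durations in the usage note are invented for illustration, and the model ignores queueing and provisioning overhead.

```typescript
// Greedy list scheduling: each shard, taken in launch order, goes to the slot
// that becomes free first. Returns the time at which the last slot finishes.
function poolMakespan(durationsInLaunchOrder: number[], cap: number): number {
  if (cap < 1) throw new Error('cap must be >= 1');
  const slots: number[] = new Array<number>(cap).fill(0);
  for (const duration of durationsInLaunchOrder) {
    let earliest = 0;
    for (let i = 1; i < slots.length; i++) {
      if (slots[i] < slots[earliest]) earliest = i;
    }
    slots[earliest] += duration;
  }
  return Math.max(...slots);
}

// Total projected wall-clock time in hours: prep plus pool makespan.
function projectedHours(prepMinutes: number, makespanMinutes: number): number {
  return (prepMinutes + makespanMinutes) / 60;
}
```

With invented durations `[10, 20, 30, 194]` and a cap of 2, launching in that order gives a makespan of 214 minutes because the 194-minute shard starts last, while launching `[194, 30, 20, 10]` gives 194 minutes, which is the lower bound set by the longest shard.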
Checklist
`yarn test` passes

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.