From 6562529d69bf31e33b4ef9fb5736a97bb21438bc Mon Sep 17 00:00:00 2001 From: Harold Sun Date: Tue, 2 Jun 2026 02:08:27 +0000 Subject: [PATCH 1/9] docs(designs): golden templates and semver check PR 0 of the golden-templates rollout. Documents the corpus, the in-process harness, the daily integration tier, and the semver gate that requires a major-version bump when an existing pinned output changes. After #8637 shipped, four output-shape regressions slipped past test coverage (#9004, #9005, #9027, #9029) before #9033 made local LE processing opt-in. The design pins the expanded output of representative templates so future regressions surface in code review rather than after release. No code changes; review and discussion happen on this PR before infrastructure lands in PR 1. --- designs/golden_templates_and_semver_check.md | 426 +++++++++++++++++++ 1 file changed, 426 insertions(+) create mode 100644 designs/golden_templates_and_semver_check.md diff --git a/designs/golden_templates_and_semver_check.md b/designs/golden_templates_and_semver_check.md new file mode 100644 index 00000000000..6a287b92ec9 --- /dev/null +++ b/designs/golden_templates_and_semver_check.md @@ -0,0 +1,426 @@ +Title: Golden Templates and Semver Check +======================================== + +What is the problem? +-------------------- + +PR #8637 (sam-cli 1.160.0) shipped in-tree CloudFormation Language Extensions +(LE) support. Within weeks, four production regressions surfaced — every one +a behavior shift in the *expanded-template output* that no automated test +caught before release: + +- #9004 — `Fn::FindInMap` / `Join` / `Select` / `Base64` raised on unresolved + parameter `Ref`s in PARTIAL mode (fixed in #9010). +- #9005 — `PACKAGEABLE_RESOURCE_ARTIFACT_PROPERTIES` had drifted from canonical + resource lists; `AWS::Serverless::Application` and seven sibling types + failed packaging (fixed in #9009). +- #9027 — `AWS::Include` buried inside `Fn::ToJsonString` was invisible to the + global-transform walker because LE expansion ran first (fixed in #9030). +- #9029 — LE-aware build merge overwrote `Code: { ZipFile: !Sub ... }` with + the LE-resolved value, baking pseudo-param defaults into the built template + (fixed in #9031). + +PR #9033 made local LE processing opt-in to give customers a clean migration +path. The action item from that retrospective: a golden-template corpus that +pins the expanded output of representative templates, plus a CI gate that +requires a major-version bump whenever a pin changes. + +What will be changed? +--------------------- + +Add a new `tests/golden/` corpus and harness, a new daily integration tier +under `tests/integration/golden_templates/`, and a tiny standalone GitHub +workflow that enforces the semver rule. The corpus covers both LE and +non-LE templates across four axes: SAM resources, packageable CFN resources, +LE intrinsics, and cross-cutting features. + +Success criteria for the change +------------------------------- + +1. Every PR runs the corpus through an in-process build + package pipeline + and fails on any byte-exact diff against pinned `expected.{build,package}.yaml`. +2. The daily integration-tests workflow runs the same corpus through real + `sam` subprocess invocations and structurally compares the output. +3. Modifying or deleting an existing pin in a PR fails CI unless the same PR + bumps the major version in `samcli/__init__.py`. Adding new pins is + ungated. +4. A single regeneration script (`update_goldens.py`) is the canonical way + to re-pin; the regenerator and the verifier share the harness so they + cannot diverge. +5. The corpus is built incrementally as a series of independently-reviewable + PRs, each ≤1000 LOC. + +Out-of-Scope +------------ + +- No deploy step. The integration tier stops at packaged-template output + — no CloudFormation, no real Lambdas. +- No CFN compatibility check (e.g. `cfn-lint`, `ValidateTemplate`) at this + stage. The corpus pins what *SAM-CLI* produces; CFN-side parity is a + separate exercise. +- No coverage of `sam local` runtime invocation paths. The bugs we're + guarding against are template-shape bugs, not runtime bugs. +- No reorganization of existing + `tests/integration/testdata/buildcmd/language-extensions-*` / + `tests/integration/testdata/package/language-extensions-*` directories. + The new corpus lives separately. + +User Experience Walkthrough +--------------------------- + +A SAM-CLI contributor authoring a behavior-affecting change runs the existing +test suite and discovers one or more golden mismatches: + +``` +$ make test-all +... +tests/golden/test_corpus.py::test_build_output_matches_golden[language_extensions/foreach_static] FAILED + +Golden mismatch in language_extensions/foreach_static. +To inspect: python tests/golden/update_goldens.py --diff --filter 'language_extensions/foreach_static' +To re-pin: python tests/golden/update_goldens.py --filter 'language_extensions/foreach_static' +If intentional: + - new case (added expected.*.yaml) -> no version bump at PR time + - modified/deleted existing expected.* -> bump major in samcli/__init__.py +``` + +The contributor inspects the diff: + +``` +$ python tests/golden/update_goldens.py --diff --filter 'language_extensions/foreach_static' +--- a/language_extensions/foreach_static/expected.build.yaml ++++ b/language_extensions/foreach_static/expected.build.yaml +@@ -... +- Code: <> ++ Code: ++ ZipFile: ... +``` + +If the diff is intentional, the contributor re-pins, bumps the major version, +and ships. If unintentional, they fix their code and the test passes again. + +Authoring a new case follows the same flow with `--new`: + +``` +$ python tests/golden/update_goldens.py --new --filter 'sam_resources/' +``` + +Implementation +============== + +CLI Changes +----------- + +No `sam` CLI changes. New developer-facing CLIs: + +- `python tests/golden/update_goldens.py [--filter GLOB] [--check | --diff | --new]` +- `python tests/golden/check_semver_bump.py --base REF --head REF` + +### Breaking Change + +None. + +Design +------ + +### Repository layout + +``` +tests/golden/ # NEW +├── conftest.py # parametrize over corpus dir +├── harness.py # in-process build + package + diff +├── normalize.py # deterministic YAML rendering +├── update_goldens.py # regeneration CLI +├── check_semver_bump.py # semver gate +├── README.md # author workflow +└── templates/ + ├── sam_resources//... # SAM resource axis + ├── packageable_resources//... # CFN resource axis + ├── language_extensions//... # LE axis + └── cross_cutting//... # nested stacks, AWS::Include, etc. + +tests/integration/golden_templates/ # NEW +├── test_golden_subprocess.py # imports tests/golden/templates as data +└── structural_compare.py # looser comparator for real-CLI output +``` + +The `tests/golden/templates/` directory is the single source of truth for +the corpus. Both the unit tier (`tests/golden/`) and the integration tier +(`tests/integration/golden_templates/`) read from it; nothing is duplicated. + +### Per-case directory layout + +``` +templates/// +├── template.yaml # input +├── metadata.yaml # language_extensions, description, issue_refs +├── expected.build.yaml # pinned post-build template +├── expected.package.yaml # pinned post-package template +├── README.md # what this case covers and why +└── src/ # any source files referenced by CodeUri etc. +``` + +### The harness + +Three functions, used by both pytest and `update_goldens.py`: + +**`run_build_pipeline(template_path, language_extensions: bool) -> dict`** +mirrors `sam build`: + +1. Read `template.yaml`. +2. Call `expand_language_extensions(template, enabled=language_extensions)` + (the function PR #9033 made explicit). +3. Run the SAM transform. +4. Walk artifact properties using the JMESPath-aware machinery from PR #9009 + and replace any local artifact path with `<>`. +5. Apply the LE-aware merge with a stub artifact-set that mirrors what + `ApplicationBuilder` would have produced (the three sites fixed in #9031). + +**`run_package_pipeline(template_path, build_output) -> dict`** +mirrors `sam package`: + +1. Run `_export_global_artifacts_pass` on the original template (the pre-LE + AWS::Include pass added in PR #9030). +2. Walk packageable artifact properties and replace each local path with + `s3://golden-bucket/` — deterministic, content-addressed. +3. Re-run `_export_global_artifacts_pass` to catch AWS::Include nested in + the LE-expanded template. + +S3 is mocked via a thin `GoldenS3Uploader` class — no `moto`, no network, +no `boto3` stubbing. + +**`normalize(template) -> str`** renders deterministic YAML: + +- Sort `Resources` and `Mappings` keys alphabetically. +- Drop volatile `Metadata.SamTransformMetrics`. +- `yaml.safe_dump(..., sort_keys=True, default_flow_style=False)`. +- Single trailing newline. + +The harness invokes the same high-level functions the CLI invokes; it does +not re-implement the pipeline. The integration tier verifies that contract +holds. + +### Pytest collection + +A single parametrize collects every case directory: + +```python +def pytest_generate_tests(metafunc): + if "golden_case" in metafunc.fixturenames: + cases = sorted(p.parent for p in TEMPLATES_ROOT.rglob("template.yaml")) + metafunc.parametrize( + "golden_case", + cases, + ids=lambda p: str(p.relative_to(TEMPLATES_ROOT)), + ) +``` + +Each case becomes one parametrized test ID like +`language_extensions/foreach_static`, so `pytest -k foreach_static` works. + +Two tests per case, one for build output and one for package output. Both +diff against the pinned file and emit the actionable hint shown in the UX +walkthrough on mismatch. + +### Per-case `metadata.yaml` + +Optional, defaults applied if absent: + +```yaml +language_extensions: true +description: "Repro for #9027 — AWS::Include inside Fn::ToJsonString" +issue_refs: [9027] +``` + +Default for `language_extensions`: `true` for cases under +`templates/language_extensions/`, `false` elsewhere — matching the +post-#9033 opt-in default. + +### The regeneration script + +Single CLI, thin wrapper around the same harness functions the tests use: + +- *(default, no args)* — regenerate every case. +- `--filter ` — regenerate only matching cases. +- `--check` — dry-run; exit non-zero if anything would change. CI mode. +- `--diff` — like `--check`, also print unified diff of each would-be + change. +- `--new` — for cases with `template.yaml` but no `expected.*` yet, + generate both pinned outputs from scratch. + +By construction the regenerator and the verifier share the harness, so they +can never diverge. + +### The semver gate + +Tiny standalone GitHub workflow at +`.github/workflows/golden-semver-gate.yml`. Fires only when corpus pins or +the version file change: + +```yaml +on: + pull_request: + branches: [develop, "feat/*", "feat-*"] + paths: + - "tests/golden/templates/**/expected.*.yaml" + - "samcli/__init__.py" +``` + +`check_semver_bump.py` rules: + +1. Diff merge base vs HEAD; collect changes to + `tests/golden/templates/**/expected.*.yaml`. +2. Classify each: + - **Added** — no version requirement at PR time. + - **Modified** or **Deleted** — requires major bump in this same PR. +3. Read `__version__` from `samcli/__init__.py` on base and head. +4. If any modification or deletion exists, require `head.major > base.major`. +5. On failure, print the changed goldens, current vs. required version, and + the suggested new version string. + +A rename (delete-old + add-new) is treated as one deletion + one addition → +requires a major bump. Conservative on purpose: a rename usually means +reorganization but could mask a content edit, so we make the human +acknowledge it. + +#### Why additions don't require a per-PR minor bump + +The project's existing release process bumps the version in a dedicated +release PR (e.g. #9051 `chore: bump version to 1.161.1`). Forcing a minor +bump on every corpus-addition PR would either churn that release process +or require funnelling the corpus rollout through a single feature branch. +Neither is worth the friction. The gate enforces the only invariant that +actually matters at PR time: *a behavior change to an existing pin must +ship as a major version*. New cases simply broaden coverage and have no +backwards-compatibility implication. + +### Per-PR CI wiring + +`make pr` already invokes `pytest tests/unit/...`. We extend the `test` and +`test-all` Makefile targets to include `tests/golden/` as a separate pytest +invocation (so it doesn't enter the 94% coverage calculation against +`samcli/`). The harness is in-process, deterministic, and offline — +estimated <30s for ~50 cases. The existing Windows `TEMP=D:\Temp` shim +already handles temp-dir contention. No new workflow file (the semver gate +above is its own workflow, but the corpus tests ride on `build.yml`). + +### Integration tier + +Daily run on the existing `integration-tests.yml` matrix (one new entry: +`golden-templates`). Per case: + +1. Copy case dir to a temp working dir; copy `src/`. +2. `sam build [--language-extensions]` subprocess; assert exit 0; read + `.aws-sam/build/template.yaml`. +3. `sam package --s3-bucket --output-template-file packaged.yaml` + subprocess; assert exit 0; read `packaged.yaml`. +4. Structurally compare against the pinned `expected.{build,package}.yaml`: + - Same set of resource logical IDs. + - Same `Type` per ID. + - Same set of property keys at each path. + - Same `Outputs` and `Mappings` keys. + - For LE-relevant artifact properties (`Code`, `CodeUri`, `ContentUri`, + `DefinitionUri`, `Location`, `Command.ScriptLocation`, etc.) flag + dict-vs-string differences (the #9029 bug class). + +The comparator lives in +`tests/integration/golden_templates/structural_compare.py` and is reused +for both build and package outputs. + +#### Why structural, not byte-exact + +Real `sam build` writes timestamps, real S3 URIs, real artifact hashes. +Byte-exact pinning at this tier would either require the same normalization +the in-process tier uses (defeats the purpose — we want to see the real +output) or constant churn from environmental noise. Structural comparison +answers "did the user-visible output shape change?" — which is what catches +the four bugs we just fixed. + +If the unit tier passes but the integration tier fails for a case, the +in-process harness has drifted from the CLI surface — investigation, not +a re-pin. + +### Wave plan + +Each PR is independently reviewable; LOC estimates are upper-bound net diff. + +- **PR 0** (this design doc). +- **PR 1** — Harness skeleton, three sentinel cases, semver gate workflow. +- **PR 2** — Integration runner. +- **PR 3** — SAM resources axis: Function variants (~10 cases). +- **PR 4** — SAM resources axis: Api / HttpApi / StateMachine / others + (~10 cases). +- **PR 5** — Packageable CFN resources axis (~12 cases). One per entry in + `RESOURCES_WITH_LOCAL_PATHS` + `RESOURCES_WITH_IMAGE_COMPONENT` not + already covered. Catches #9009-class drift; locks every dotted artifact + path. +- **PR 6** — LE intrinsics: `Fn::ForEach` (~10 cases). +- **PR 7** — LE intrinsics: non-ForEach + bug repros (~10 cases). Each + metadata.yaml carries the issue ref. +- **PR 8** — Cross-cutting: nested stacks, Globals, Mappings, Conditions + (~10 cases). +- **PR 9** — Cross-cutting: AWS::Include, SAR Metadata, Unicode (~7 cases). + Final corpus README listing the full coverage matrix. + +#### Wave invariants + +- PRs 2–9 are additions only under `tests/golden/templates/`, so the + semver gate passes without any version bump. +- Every case carries `metadata.yaml` with `description` and `issue_refs`. +- No PR after PR 1 modifies harness behavior unless a wave reveals a + missing capability; if so, the harness change is its own follow-up PR. + +`samconfig.toml` Changes +---------------- + +None. + +Security Considerations +----------------------- + +The harness intentionally invokes only in-process SAM-CLI code; no +network calls, no credentials, no AWS API access. The integration tier +uses the existing integration-tests workflow's IAM role and S3 bucket; +no new permissions required. + +Documentation Changes +--------------------- + +`tests/golden/README.md` — authoring guide for contributors adding new +cases or re-pinning existing ones. + +Alternatives considered +======================= + +Subprocess-only harness (no in-process) +--------------------------------------- + +Every golden runs the real `sam` CLI as a subprocess. Rejected: process +startup × N templates is too slow for a per-PR gate; flakier on Windows; +deterministic package output without `moto` is hard. Subprocess invocation +is kept, but only at the daily integration tier where speed and +determinism matter less. + +Snapshot plugin (`syrupy`, `pytest-snapshot`) +--------------------------------------------- + +Less harness code to write. Rejected: opaque snapshot format hurts +code-review readability; update flow is plugin-specific; corpus growth +across many PRs is harder when snapshots are plugin-managed rather than +plain YAML files in the case directory. + +CFN-side validation (`cfn-lint` or `ValidateTemplate`) +------------------------------------------------------ + +Would catch SAM-CLI vs CFN divergence. Rejected for now: out of scope for +this design; can be added as a separate compatibility tier later. The +current bugs are SAM-CLI's own output-shape bugs, not CFN-divergence bugs. + +Per-PR minor bump on every corpus-addition PR +--------------------------------------------- + +Initially considered. Rejected: would either churn the existing +release-PR process (e.g. #9051) or require funnelling the corpus rollout +through a single feature branch. The gate's only PR-time invariant is +that *behavior changes* require major bumps; coverage growth is +backwards-compatible by definition. From e0280c2bbfe450039150cccfcb835e107c8adc94 Mon Sep 17 00:00:00 2001 From: Harold Sun Date: Tue, 2 Jun 2026 02:14:00 +0000 Subject: [PATCH 2/9] test(golden): add sentinel template directories Three sentinel cases (SAM, CFN, LE-foreach) used by the harness in the next commit. Each case carries metadata.yaml + README.md + src/app.py. Pinned outputs (expected.*.yaml) are generated by update_goldens.py once the harness is in place. --- tests/golden/__init__.py | 0 .../foreach_static_zip/README.md | 7 +++++++ .../foreach_static_zip/metadata.yaml | 3 +++ .../foreach_static_zip/src/app.py | 2 ++ .../foreach_static_zip/template.yaml | 16 ++++++++++++++++ .../lambda_function_zip/README.md | 5 +++++ .../lambda_function_zip/metadata.yaml | 3 +++ .../lambda_function_zip/src/app.py | 2 ++ .../lambda_function_zip/template.yaml | 12 ++++++++++++ .../serverless_function_zip/README.md | 6 ++++++ .../serverless_function_zip/metadata.yaml | 3 +++ .../serverless_function_zip/src/app.py | 2 ++ .../serverless_function_zip/template.yaml | 11 +++++++++++ 13 files changed, 72 insertions(+) create mode 100644 tests/golden/__init__.py create mode 100644 tests/golden/templates/language_extensions/foreach_static_zip/README.md create mode 100644 tests/golden/templates/language_extensions/foreach_static_zip/metadata.yaml create mode 100644 tests/golden/templates/language_extensions/foreach_static_zip/src/app.py create mode 100644 tests/golden/templates/language_extensions/foreach_static_zip/template.yaml create mode 100644 tests/golden/templates/packageable_resources/lambda_function_zip/README.md create mode 100644 tests/golden/templates/packageable_resources/lambda_function_zip/metadata.yaml create mode 100644 tests/golden/templates/packageable_resources/lambda_function_zip/src/app.py create mode 100644 tests/golden/templates/packageable_resources/lambda_function_zip/template.yaml create mode 100644 tests/golden/templates/sam_resources/serverless_function_zip/README.md create mode 100644 tests/golden/templates/sam_resources/serverless_function_zip/metadata.yaml create mode 100644 tests/golden/templates/sam_resources/serverless_function_zip/src/app.py create mode 100644 tests/golden/templates/sam_resources/serverless_function_zip/template.yaml diff --git a/tests/golden/__init__.py b/tests/golden/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/golden/templates/language_extensions/foreach_static_zip/README.md b/tests/golden/templates/language_extensions/foreach_static_zip/README.md new file mode 100644 index 00000000000..8c6bec56ccc --- /dev/null +++ b/tests/golden/templates/language_extensions/foreach_static_zip/README.md @@ -0,0 +1,7 @@ +# foreach_static_zip + +Sentinel LE case. `Fn::ForEach` over a static collection with all expanded +functions sharing the same `CodeUri`. Locks the expanded-template shape +(`AlphaFunction`, `BetaFunction` inlined as siblings, ForEach key gone) +and the post-package shape (each function's `CodeUri` rewritten to the +same `s3://golden-bucket/`). diff --git a/tests/golden/templates/language_extensions/foreach_static_zip/metadata.yaml b/tests/golden/templates/language_extensions/foreach_static_zip/metadata.yaml new file mode 100644 index 00000000000..e2113d53a40 --- /dev/null +++ b/tests/golden/templates/language_extensions/foreach_static_zip/metadata.yaml @@ -0,0 +1,3 @@ +language_extensions: true +description: "Sentinel: Fn::ForEach over static collection, shared CodeUri." +issue_refs: [] diff --git a/tests/golden/templates/language_extensions/foreach_static_zip/src/app.py b/tests/golden/templates/language_extensions/foreach_static_zip/src/app.py new file mode 100644 index 00000000000..29193508dc0 --- /dev/null +++ b/tests/golden/templates/language_extensions/foreach_static_zip/src/app.py @@ -0,0 +1,2 @@ +def handler(event, context): + return {"ok": True} diff --git a/tests/golden/templates/language_extensions/foreach_static_zip/template.yaml b/tests/golden/templates/language_extensions/foreach_static_zip/template.yaml new file mode 100644 index 00000000000..9b4969402b8 --- /dev/null +++ b/tests/golden/templates/language_extensions/foreach_static_zip/template.yaml @@ -0,0 +1,16 @@ +AWSTemplateFormatVersion: '2010-09-09' +Transform: + - AWS::LanguageExtensions + - AWS::Serverless-2016-10-31 + +Resources: + Fn::ForEach::Workers: + - WorkerName + - - Alpha + - Beta + - ${WorkerName}Function: + Type: AWS::Serverless::Function + Properties: + Handler: app.handler + Runtime: python3.11 + CodeUri: ./src/ diff --git a/tests/golden/templates/packageable_resources/lambda_function_zip/README.md b/tests/golden/templates/packageable_resources/lambda_function_zip/README.md new file mode 100644 index 00000000000..c71fa401dd1 --- /dev/null +++ b/tests/golden/templates/packageable_resources/lambda_function_zip/README.md @@ -0,0 +1,5 @@ +# lambda_function_zip + +Sentinel non-SAM case. Raw `AWS::Lambda::Function` with `Code: ./src/`. Locks +the post-package output shape after the CFN-resource-list packageable walker +rewrites `Code` to `s3://golden-bucket/`. diff --git a/tests/golden/templates/packageable_resources/lambda_function_zip/metadata.yaml b/tests/golden/templates/packageable_resources/lambda_function_zip/metadata.yaml new file mode 100644 index 00000000000..c9bdf896efe --- /dev/null +++ b/tests/golden/templates/packageable_resources/lambda_function_zip/metadata.yaml @@ -0,0 +1,3 @@ +language_extensions: false +description: "Sentinel: raw AWS::Lambda::Function with local Code path." +issue_refs: [] diff --git a/tests/golden/templates/packageable_resources/lambda_function_zip/src/app.py b/tests/golden/templates/packageable_resources/lambda_function_zip/src/app.py new file mode 100644 index 00000000000..29193508dc0 --- /dev/null +++ b/tests/golden/templates/packageable_resources/lambda_function_zip/src/app.py @@ -0,0 +1,2 @@ +def handler(event, context): + return {"ok": True} diff --git a/tests/golden/templates/packageable_resources/lambda_function_zip/template.yaml b/tests/golden/templates/packageable_resources/lambda_function_zip/template.yaml new file mode 100644 index 00000000000..e91999c61e6 --- /dev/null +++ b/tests/golden/templates/packageable_resources/lambda_function_zip/template.yaml @@ -0,0 +1,12 @@ +AWSTemplateFormatVersion: '2010-09-09' + +Resources: + HelloFunction: + Type: AWS::Lambda::Function + Properties: + FunctionName: hello + Runtime: python3.11 + Handler: app.handler + Role: arn:aws:iam::123456789012:role/lambda-role + Code: ./src/ + MemorySize: 128 diff --git a/tests/golden/templates/sam_resources/serverless_function_zip/README.md b/tests/golden/templates/sam_resources/serverless_function_zip/README.md new file mode 100644 index 00000000000..cd6f4029995 --- /dev/null +++ b/tests/golden/templates/sam_resources/serverless_function_zip/README.md @@ -0,0 +1,6 @@ +# serverless_function_zip + +Sentinel SAM case. Vanilla `AWS::Serverless::Function` with local `CodeUri` +pointing at a ZIP-packaged source dir. Locks the post-build `template.yaml` +shape (artifact path normalized to `<>`) and post-package +shape (CodeUri rewritten to `s3://golden-bucket/`). diff --git a/tests/golden/templates/sam_resources/serverless_function_zip/metadata.yaml b/tests/golden/templates/sam_resources/serverless_function_zip/metadata.yaml new file mode 100644 index 00000000000..7b7f0e74979 --- /dev/null +++ b/tests/golden/templates/sam_resources/serverless_function_zip/metadata.yaml @@ -0,0 +1,3 @@ +language_extensions: false +description: "Sentinel: AWS::Serverless::Function with local CodeUri (ZIP packaging)." +issue_refs: [] diff --git a/tests/golden/templates/sam_resources/serverless_function_zip/src/app.py b/tests/golden/templates/sam_resources/serverless_function_zip/src/app.py new file mode 100644 index 00000000000..29193508dc0 --- /dev/null +++ b/tests/golden/templates/sam_resources/serverless_function_zip/src/app.py @@ -0,0 +1,2 @@ +def handler(event, context): + return {"ok": True} diff --git a/tests/golden/templates/sam_resources/serverless_function_zip/template.yaml b/tests/golden/templates/sam_resources/serverless_function_zip/template.yaml new file mode 100644 index 00000000000..47f228ea945 --- /dev/null +++ b/tests/golden/templates/sam_resources/serverless_function_zip/template.yaml @@ -0,0 +1,11 @@ +AWSTemplateFormatVersion: '2010-09-09' +Transform: AWS::Serverless-2016-10-31 + +Resources: + HelloFunction: + Type: AWS::Serverless::Function + Properties: + Handler: app.handler + Runtime: python3.11 + CodeUri: ./src/ + MemorySize: 128 From fa619622df212819771bf935cc50146ca44b6a07 Mon Sep 17 00:00:00 2001 From: Harold Sun Date: Tue, 2 Jun 2026 02:15:34 +0000 Subject: [PATCH 3/9] =?UTF-8?q?test(golden):=20normalize()=20=E2=80=94=20d?= =?UTF-8?q?eterministic=20YAML=20rendering?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Strips volatile Metadata.SamTransformMetrics, sort_keys=True, single trailing newline. Used by both the harness and the regenerator so they cannot diverge on serialization. --- tests/golden/normalize.py | 49 +++++++++++++++++++ tests/golden/test_normalize.py | 87 ++++++++++++++++++++++++++++++++++ 2 files changed, 136 insertions(+) create mode 100644 tests/golden/normalize.py create mode 100644 tests/golden/test_normalize.py diff --git a/tests/golden/normalize.py b/tests/golden/normalize.py new file mode 100644 index 00000000000..932dd7e9003 --- /dev/null +++ b/tests/golden/normalize.py @@ -0,0 +1,49 @@ +"""Deterministic YAML rendering for golden-template comparison. + +The harness and update_goldens.py both call normalize() so the bytes +written to disk match the bytes the test compares. Determinism rules: + +- Resources / Mappings keys sorted alphabetically. +- Metadata.SamTransformMetrics dropped (varies by run). +- Empty Metadata block dropped. +- yaml.safe_dump with sort_keys=True, default_flow_style=False. +- Single trailing newline. +""" + +from __future__ import annotations + +from typing import Any, Dict + +import yaml + +_VOLATILE_METADATA_KEYS = frozenset({"SamTransformMetrics"}) + + +def _filter_metadata(template: Dict[str, Any]) -> None: + metadata = template.get("Metadata") + if not isinstance(metadata, dict): + return + for key in _VOLATILE_METADATA_KEYS: + metadata.pop(key, None) + if not metadata: + template.pop("Metadata", None) + + +def normalize(template: Dict[str, Any]) -> str: + """Render template to deterministic YAML string.""" + # Mutate a copy so we don't surprise the caller. + template = {k: v for k, v in template.items()} + if isinstance(template.get("Metadata"), dict): + template["Metadata"] = dict(template["Metadata"]) + _filter_metadata(template) + + rendered = yaml.safe_dump( + template, + sort_keys=True, + default_flow_style=False, + width=10**9, # don't wrap long strings + allow_unicode=True, + ) + if not rendered.endswith("\n"): + rendered += "\n" + return rendered diff --git a/tests/golden/test_normalize.py b/tests/golden/test_normalize.py new file mode 100644 index 00000000000..ed4ed64338f --- /dev/null +++ b/tests/golden/test_normalize.py @@ -0,0 +1,87 @@ +"""Unit tests for normalize() — deterministic YAML serialization.""" + +import pytest + +from tests.golden.normalize import normalize + + +def test_normalize_produces_trailing_newline(): + out = normalize({"Resources": {}}) + assert out.endswith("\n") + assert not out.endswith("\n\n") + + +def test_normalize_sorts_resources_keys(): + template = { + "Resources": { + "Zeta": {"Type": "AWS::Lambda::Function"}, + "Alpha": {"Type": "AWS::Lambda::Function"}, + }, + } + out = normalize(template) + assert out.index("Alpha") < out.index("Zeta") + + +def test_normalize_sorts_mappings_keys(): + template = { + "Mappings": { + "Z": {"k": {"v": "1"}}, + "A": {"k": {"v": "1"}}, + }, + } + out = normalize(template) + assert out.index("A:") < out.index("Z:") + + +def test_normalize_drops_sam_transform_metrics(): + template = { + "Metadata": { + "SamTransformMetrics": {"foo": "bar"}, + "Other": "keep", + }, + "Resources": {}, + } + out = normalize(template) + assert "SamTransformMetrics" not in out + assert "Other" in out + + +def test_normalize_drops_metadata_block_if_empty_after_filter(): + template = { + "Metadata": {"SamTransformMetrics": {"foo": "bar"}}, + "Resources": {}, + } + out = normalize(template) + assert "Metadata" not in out + + +def test_normalize_is_idempotent(): + template = {"Resources": {"A": {"Type": "T"}}} + once = normalize(template) + import yaml + twice = normalize(yaml.safe_load(once)) + assert once == twice + + +def test_normalize_preserves_intrinsic_function_dicts(): + template = { + "Resources": { + "F": { + "Type": "AWS::Lambda::Function", + "Properties": { + "Code": {"ZipFile": {"Fn::Sub": "x-${AWS::Region}"}}, + }, + } + } + } + out = normalize(template) + assert "Fn::Sub" in out + assert "x-${AWS::Region}" in out + + +def test_normalize_uses_block_style_not_flow_style(): + template = {"Resources": {"A": {"Type": "T", "Properties": {"k": "v"}}}} + out = normalize(template) + # block style uses indentation, not braces + assert "{" not in out + assert "}" not in out From 6f3555cbe707c96d19f4a18f51f6ed1caae7bb29 Mon Sep 17 00:00:00 2001 From: Harold Sun Date: Tue, 2 Jun 2026 02:24:47 +0000 Subject: [PATCH 4/9] docs(designs): drop paths: filter from semver-gate workflow MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per review on PR #9057: a workflow that uses on.pull_request.paths: to fire only on relevant changes cannot also be a required branch- protection check. When paths don't match, the workflow doesn't run, no status posts, and branch protection treats the required check as missing/pending — blocking the PR. check_semver_bump.py already self-gates (returns exit 0 when no expected.*.yaml files changed), so dropping the paths filter and running unconditionally is the canonical fix for the "skipped but required check" pitfall. The script does the gating; the workflow just runs it. --- designs/golden_templates_and_semver_check.md | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/designs/golden_templates_and_semver_check.md b/designs/golden_templates_and_semver_check.md index 6a287b92ec9..a3ab7aa1f18 100644 --- a/designs/golden_templates_and_semver_check.md +++ b/designs/golden_templates_and_semver_check.md @@ -254,18 +254,25 @@ can never diverge. ### The semver gate Tiny standalone GitHub workflow at -`.github/workflows/golden-semver-gate.yml`. Fires only when corpus pins or -the version file change: +`.github/workflows/golden-semver-gate.yml`. Runs on every PR; the script +itself returns exit 0 when no corpus pins changed, so it's cheap to run +unconditionally and posts a definitive status on every PR (which lets +branch protection mark it as a required check without blocking +unrelated PRs): ```yaml on: pull_request: branches: [develop, "feat/*", "feat-*"] - paths: - - "tests/golden/templates/**/expected.*.yaml" - - "samcli/__init__.py" ``` +The workflow does *not* use `on.pull_request.paths:` filtering. Path +filtering at the trigger level skips the workflow entirely on PRs that +touch no matching path — no status posts, and a required check that +never reports gets treated by branch protection as missing/pending. +Self-gating in the script (return 0 when there are no relevant changes) +is the simpler fix and keeps the check eligible to be required. + `check_semver_bump.py` rules: 1. Diff merge base vs HEAD; collect changes to From 3386a5be22e9741935be3aa4e57a01b79f298edf Mon Sep 17 00:00:00 2001 From: Harold Sun Date: Tue, 2 Jun 2026 02:31:20 +0000 Subject: [PATCH 5/9] revert: drop premature PR 1 commits from this branch MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Reverts both: - fa619622d test(golden): normalize() — deterministic YAML rendering - e0280c2bb test(golden): add sentinel template directories These belong in PR 1 (the harness PR), not PR 0 (the design doc PR). They were authored on this branch in error and pushed to PR #9057 before the mistake was caught. Force-push isn't permitted on protected feat/* branches, so reverting is the cleanest reset. The reverted changes are preserved on branch feat/golden-templates-pr1 (plus uncommitted WIP harness.py/test_harness_build.py from a parallel session) for the upcoming PR 1 to build on. --- tests/golden/__init__.py | 0 tests/golden/normalize.py | 49 ----------- .../foreach_static_zip/README.md | 7 -- .../foreach_static_zip/metadata.yaml | 3 - .../foreach_static_zip/src/app.py | 2 - .../foreach_static_zip/template.yaml | 16 ---- .../lambda_function_zip/README.md | 5 -- .../lambda_function_zip/metadata.yaml | 3 - .../lambda_function_zip/src/app.py | 2 - .../lambda_function_zip/template.yaml | 12 --- .../serverless_function_zip/README.md | 6 -- .../serverless_function_zip/metadata.yaml | 3 - .../serverless_function_zip/src/app.py | 2 - .../serverless_function_zip/template.yaml | 11 --- tests/golden/test_normalize.py | 87 ------------------- 15 files changed, 208 deletions(-) delete mode 100644 tests/golden/__init__.py delete mode 100644 tests/golden/normalize.py delete mode 100644 tests/golden/templates/language_extensions/foreach_static_zip/README.md delete mode 100644 tests/golden/templates/language_extensions/foreach_static_zip/metadata.yaml delete mode 100644 tests/golden/templates/language_extensions/foreach_static_zip/src/app.py delete mode 100644 tests/golden/templates/language_extensions/foreach_static_zip/template.yaml delete mode 100644 tests/golden/templates/packageable_resources/lambda_function_zip/README.md delete mode 100644 tests/golden/templates/packageable_resources/lambda_function_zip/metadata.yaml delete mode 100644 tests/golden/templates/packageable_resources/lambda_function_zip/src/app.py delete mode 100644 tests/golden/templates/packageable_resources/lambda_function_zip/template.yaml delete mode 100644 tests/golden/templates/sam_resources/serverless_function_zip/README.md delete mode 100644 tests/golden/templates/sam_resources/serverless_function_zip/metadata.yaml delete mode 100644 tests/golden/templates/sam_resources/serverless_function_zip/src/app.py delete mode 100644 tests/golden/templates/sam_resources/serverless_function_zip/template.yaml delete mode 100644 tests/golden/test_normalize.py diff --git a/tests/golden/__init__.py b/tests/golden/__init__.py deleted file mode 100644 index e69de29bb2d..00000000000 diff --git a/tests/golden/normalize.py b/tests/golden/normalize.py deleted file mode 100644 index 932dd7e9003..00000000000 --- a/tests/golden/normalize.py +++ /dev/null @@ -1,49 +0,0 @@ -"""Deterministic YAML rendering for golden-template comparison. - -The harness and update_goldens.py both call normalize() so the bytes -written to disk match the bytes the test compares. Determinism rules: - -- Resources / Mappings keys sorted alphabetically. -- Metadata.SamTransformMetrics dropped (varies by run). -- Empty Metadata block dropped. -- yaml.safe_dump with sort_keys=True, default_flow_style=False. -- Single trailing newline. -""" - -from __future__ import annotations - -from typing import Any, Dict - -import yaml - -_VOLATILE_METADATA_KEYS = frozenset({"SamTransformMetrics"}) - - -def _filter_metadata(template: Dict[str, Any]) -> None: - metadata = template.get("Metadata") - if not isinstance(metadata, dict): - return - for key in _VOLATILE_METADATA_KEYS: - metadata.pop(key, None) - if not metadata: - template.pop("Metadata", None) - - -def normalize(template: Dict[str, Any]) -> str: - """Render template to deterministic YAML string.""" - # Mutate a copy so we don't surprise the caller. - template = {k: v for k, v in template.items()} - if isinstance(template.get("Metadata"), dict): - template["Metadata"] = dict(template["Metadata"]) - _filter_metadata(template) - - rendered = yaml.safe_dump( - template, - sort_keys=True, - default_flow_style=False, - width=10**9, # don't wrap long strings - allow_unicode=True, - ) - if not rendered.endswith("\n"): - rendered += "\n" - return rendered diff --git a/tests/golden/templates/language_extensions/foreach_static_zip/README.md b/tests/golden/templates/language_extensions/foreach_static_zip/README.md deleted file mode 100644 index 8c6bec56ccc..00000000000 --- a/tests/golden/templates/language_extensions/foreach_static_zip/README.md +++ /dev/null @@ -1,7 +0,0 @@ -# foreach_static_zip - -Sentinel LE case. `Fn::ForEach` over a static collection with all expanded -functions sharing the same `CodeUri`. Locks the expanded-template shape -(`AlphaFunction`, `BetaFunction` inlined as siblings, ForEach key gone) -and the post-package shape (each function's `CodeUri` rewritten to the -same `s3://golden-bucket/`). diff --git a/tests/golden/templates/language_extensions/foreach_static_zip/metadata.yaml b/tests/golden/templates/language_extensions/foreach_static_zip/metadata.yaml deleted file mode 100644 index e2113d53a40..00000000000 --- a/tests/golden/templates/language_extensions/foreach_static_zip/metadata.yaml +++ /dev/null @@ -1,3 +0,0 @@ -language_extensions: true -description: "Sentinel: Fn::ForEach over static collection, shared CodeUri." -issue_refs: [] diff --git a/tests/golden/templates/language_extensions/foreach_static_zip/src/app.py b/tests/golden/templates/language_extensions/foreach_static_zip/src/app.py deleted file mode 100644 index 29193508dc0..00000000000 --- a/tests/golden/templates/language_extensions/foreach_static_zip/src/app.py +++ /dev/null @@ -1,2 +0,0 @@ -def handler(event, context): - return {"ok": True} diff --git a/tests/golden/templates/language_extensions/foreach_static_zip/template.yaml b/tests/golden/templates/language_extensions/foreach_static_zip/template.yaml deleted file mode 100644 index 9b4969402b8..00000000000 --- a/tests/golden/templates/language_extensions/foreach_static_zip/template.yaml +++ /dev/null @@ -1,16 +0,0 @@ -AWSTemplateFormatVersion: '2010-09-09' -Transform: - - AWS::LanguageExtensions - - AWS::Serverless-2016-10-31 - -Resources: - Fn::ForEach::Workers: - - WorkerName - - - Alpha - - Beta - - ${WorkerName}Function: - Type: AWS::Serverless::Function - Properties: - Handler: app.handler - Runtime: python3.11 - CodeUri: ./src/ diff --git a/tests/golden/templates/packageable_resources/lambda_function_zip/README.md b/tests/golden/templates/packageable_resources/lambda_function_zip/README.md deleted file mode 100644 index c71fa401dd1..00000000000 --- a/tests/golden/templates/packageable_resources/lambda_function_zip/README.md +++ /dev/null @@ -1,5 +0,0 @@ -# lambda_function_zip - -Sentinel non-SAM case. Raw `AWS::Lambda::Function` with `Code: ./src/`. Locks -the post-package output shape after the CFN-resource-list packageable walker -rewrites `Code` to `s3://golden-bucket/`. diff --git a/tests/golden/templates/packageable_resources/lambda_function_zip/metadata.yaml b/tests/golden/templates/packageable_resources/lambda_function_zip/metadata.yaml deleted file mode 100644 index c9bdf896efe..00000000000 --- a/tests/golden/templates/packageable_resources/lambda_function_zip/metadata.yaml +++ /dev/null @@ -1,3 +0,0 @@ -language_extensions: false -description: "Sentinel: raw AWS::Lambda::Function with local Code path." -issue_refs: [] diff --git a/tests/golden/templates/packageable_resources/lambda_function_zip/src/app.py b/tests/golden/templates/packageable_resources/lambda_function_zip/src/app.py deleted file mode 100644 index 29193508dc0..00000000000 --- a/tests/golden/templates/packageable_resources/lambda_function_zip/src/app.py +++ /dev/null @@ -1,2 +0,0 @@ -def handler(event, context): - return {"ok": True} diff --git a/tests/golden/templates/packageable_resources/lambda_function_zip/template.yaml b/tests/golden/templates/packageable_resources/lambda_function_zip/template.yaml deleted file mode 100644 index e91999c61e6..00000000000 --- a/tests/golden/templates/packageable_resources/lambda_function_zip/template.yaml +++ /dev/null @@ -1,12 +0,0 @@ -AWSTemplateFormatVersion: '2010-09-09' - -Resources: - HelloFunction: - Type: AWS::Lambda::Function - Properties: - FunctionName: hello - Runtime: python3.11 - Handler: app.handler - Role: arn:aws:iam::123456789012:role/lambda-role - Code: ./src/ - MemorySize: 128 diff --git a/tests/golden/templates/sam_resources/serverless_function_zip/README.md b/tests/golden/templates/sam_resources/serverless_function_zip/README.md deleted file mode 100644 index cd6f4029995..00000000000 --- a/tests/golden/templates/sam_resources/serverless_function_zip/README.md +++ /dev/null @@ -1,6 +0,0 @@ -# serverless_function_zip - -Sentinel SAM case. Vanilla `AWS::Serverless::Function` with local `CodeUri` -pointing at a ZIP-packaged source dir. Locks the post-build `template.yaml` -shape (artifact path normalized to `<>`) and post-package -shape (CodeUri rewritten to `s3://golden-bucket/`). diff --git a/tests/golden/templates/sam_resources/serverless_function_zip/metadata.yaml b/tests/golden/templates/sam_resources/serverless_function_zip/metadata.yaml deleted file mode 100644 index 7b7f0e74979..00000000000 --- a/tests/golden/templates/sam_resources/serverless_function_zip/metadata.yaml +++ /dev/null @@ -1,3 +0,0 @@ -language_extensions: false -description: "Sentinel: AWS::Serverless::Function with local CodeUri (ZIP packaging)." -issue_refs: [] diff --git a/tests/golden/templates/sam_resources/serverless_function_zip/src/app.py b/tests/golden/templates/sam_resources/serverless_function_zip/src/app.py deleted file mode 100644 index 29193508dc0..00000000000 --- a/tests/golden/templates/sam_resources/serverless_function_zip/src/app.py +++ /dev/null @@ -1,2 +0,0 @@ -def handler(event, context): - return {"ok": True} diff --git a/tests/golden/templates/sam_resources/serverless_function_zip/template.yaml b/tests/golden/templates/sam_resources/serverless_function_zip/template.yaml deleted file mode 100644 index 47f228ea945..00000000000 --- a/tests/golden/templates/sam_resources/serverless_function_zip/template.yaml +++ /dev/null @@ -1,11 +0,0 @@ -AWSTemplateFormatVersion: '2010-09-09' -Transform: AWS::Serverless-2016-10-31 - -Resources: - HelloFunction: - Type: AWS::Serverless::Function - Properties: - Handler: app.handler - Runtime: python3.11 - CodeUri: ./src/ - MemorySize: 128 diff --git a/tests/golden/test_normalize.py b/tests/golden/test_normalize.py deleted file mode 100644 index ed4ed64338f..00000000000 --- a/tests/golden/test_normalize.py +++ /dev/null @@ -1,87 +0,0 @@ -"""Unit tests for normalize() — deterministic YAML serialization.""" - -import pytest - -from tests.golden.normalize import normalize - - -def test_normalize_produces_trailing_newline(): - out = normalize({"Resources": {}}) - assert out.endswith("\n") - assert not out.endswith("\n\n") - - -def test_normalize_sorts_resources_keys(): - template = { - "Resources": { - "Zeta": {"Type": "AWS::Lambda::Function"}, - "Alpha": {"Type": "AWS::Lambda::Function"}, - }, - } - out = normalize(template) - assert out.index("Alpha") < out.index("Zeta") - - -def test_normalize_sorts_mappings_keys(): - template = { - "Mappings": { - "Z": {"k": {"v": "1"}}, - "A": {"k": {"v": "1"}}, - }, - } - out = normalize(template) - assert out.index("A:") < out.index("Z:") - - -def test_normalize_drops_sam_transform_metrics(): - template = { - "Metadata": { - "SamTransformMetrics": {"foo": "bar"}, - "Other": "keep", - }, - "Resources": {}, - } - out = normalize(template) - assert "SamTransformMetrics" not in out - assert "Other" in out - - -def test_normalize_drops_metadata_block_if_empty_after_filter(): - template = { - "Metadata": {"SamTransformMetrics": {"foo": "bar"}}, - "Resources": {}, - } - out = normalize(template) - assert "Metadata" not in out - - -def test_normalize_is_idempotent(): - template = {"Resources": {"A": {"Type": "T"}}} - once = normalize(template) - import yaml - twice = normalize(yaml.safe_load(once)) - assert once == twice - - -def test_normalize_preserves_intrinsic_function_dicts(): - template = { - "Resources": { - "F": { - "Type": "AWS::Lambda::Function", - "Properties": { - "Code": {"ZipFile": {"Fn::Sub": "x-${AWS::Region}"}}, - }, - } - } - } - out = normalize(template) - assert "Fn::Sub" in out - assert "x-${AWS::Region}" in out - - -def test_normalize_uses_block_style_not_flow_style(): - template = {"Resources": {"A": {"Type": "T", "Properties": {"k": "v"}}}} - out = normalize(template) - # block style uses indentation, not braces - assert "{" not in out - assert "}" not in out From d4a649aba3607982b15b425067afc6eed9a08a1b Mon Sep 17 00:00:00 2001 From: Harold Sun Date: Tue, 2 Jun 2026 15:54:50 +0000 Subject: [PATCH 6/9] docs(designs): place subprocess runner under tests/regression MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Move PR 2's subprocess-driven tier from tests/integration/golden_templates/ to tests/regression/golden_templates/. Aligns with the existing tests/regression/deploy/ and tests/regression/package/ suites — same shape (real `sam` subprocess, real AWS credentials, daily-only). Bonus: integration-tests.yml's existing other-and-e2e matrix entry already collects tests/regression/, so no new matrix entry or workflow edit is needed for the regression tier to start running. Renamed "integration tier" → "regression tier" throughout for clarity. --- designs/golden_templates_and_semver_check.md | 54 +++++++++++++------- 1 file changed, 36 insertions(+), 18 deletions(-) diff --git a/designs/golden_templates_and_semver_check.md b/designs/golden_templates_and_semver_check.md index a3ab7aa1f18..5fc82a94c6b 100644 --- a/designs/golden_templates_and_semver_check.md +++ b/designs/golden_templates_and_semver_check.md @@ -28,11 +28,15 @@ requires a major-version bump whenever a pin changes. What will be changed? --------------------- -Add a new `tests/golden/` corpus and harness, a new daily integration tier -under `tests/integration/golden_templates/`, and a tiny standalone GitHub -workflow that enforces the semver rule. The corpus covers both LE and -non-LE templates across four axes: SAM resources, packageable CFN resources, -LE intrinsics, and cross-cutting features. +Add a new `tests/golden/` corpus and harness (per-PR unit tier), a daily +subprocess-driven tier under `tests/regression/golden_templates/`, and a +tiny standalone GitHub workflow that enforces the semver rule. The corpus +covers both LE and non-LE templates across four axes: SAM resources, +packageable CFN resources, LE intrinsics, and cross-cutting features. + +The two tiers share `tests/golden/templates/` as their single corpus +source — the regression tier reads case directories straight from the +unit tier so authoring a new case lands in both places at once. Success criteria for the change ------------------------------- @@ -53,7 +57,7 @@ Success criteria for the change Out-of-Scope ------------ -- No deploy step. The integration tier stops at packaged-template output +- No deploy step. The regression tier stops at packaged-template output — no CloudFormation, no real Lambdas. - No CFN compatibility check (e.g. `cfn-lint`, `ValidateTemplate`) at this stage. The corpus pins what *SAM-CLI* produces; CFN-side parity is a @@ -139,14 +143,19 @@ tests/golden/ # NEW ├── language_extensions//... # LE axis └── cross_cutting//... # nested stacks, AWS::Include, etc. -tests/integration/golden_templates/ # NEW +tests/regression/golden_templates/ # NEW ├── test_golden_subprocess.py # imports tests/golden/templates as data └── structural_compare.py # looser comparator for real-CLI output ``` The `tests/golden/templates/` directory is the single source of truth for -the corpus. Both the unit tier (`tests/golden/`) and the integration tier -(`tests/integration/golden_templates/`) read from it; nothing is duplicated. +the corpus. Both the unit tier (`tests/golden/`) and the regression tier +(`tests/regression/golden_templates/`) read from it; nothing is duplicated. + +The regression tier sits alongside the existing `tests/regression/deploy/` +and `tests/regression/package/` suites — they're structurally similar: +real `sam` subprocess invocation, real AWS credentials via the +integration-tests workflow's OIDC role, daily-only execution. ### Per-case directory layout @@ -197,7 +206,7 @@ no `boto3` stubbing. - Single trailing newline. The harness invokes the same high-level functions the CLI invokes; it does -not re-implement the pipeline. The integration tier verifies that contract +not re-implement the pipeline. The regression tier verifies that contract holds. ### Pytest collection @@ -311,10 +320,17 @@ estimated <30s for ~50 cases. The existing Windows `TEMP=D:\Temp` shim already handles temp-dir contention. No new workflow file (the semver gate above is its own workflow, but the corpus tests ride on `build.yml`). -### Integration tier +### Regression tier + +Daily run on the existing `integration-tests.yml` workflow. The +workflow's `other-and-e2e` matrix entry already collects +`tests/regression/`, so adding `tests/regression/golden_templates/` is +picked up automatically — **no new matrix entry, no workflow edit**. The +suite sits alongside the existing `tests/regression/deploy/` and +`tests/regression/package/` directories, which have the same shape (real +`sam` subprocess + real AWS credentials, daily-only). -Daily run on the existing `integration-tests.yml` matrix (one new entry: -`golden-templates`). Per case: +Per case: 1. Copy case dir to a temp working dir; copy `src/`. 2. `sam build [--language-extensions]` subprocess; assert exit 0; read @@ -331,7 +347,7 @@ Daily run on the existing `integration-tests.yml` matrix (one new entry: dict-vs-string differences (the #9029 bug class). The comparator lives in -`tests/integration/golden_templates/structural_compare.py` and is reused +`tests/regression/golden_templates/structural_compare.py` and is reused for both build and package outputs. #### Why structural, not byte-exact @@ -343,7 +359,7 @@ output) or constant churn from environmental noise. Structural comparison answers "did the user-visible output shape change?" — which is what catches the four bugs we just fixed. -If the unit tier passes but the integration tier fails for a case, the +If the unit tier passes but the regression tier fails for a case, the in-process harness has drifted from the CLI surface — investigation, not a re-pin. @@ -353,7 +369,9 @@ Each PR is independently reviewable; LOC estimates are upper-bound net diff. - **PR 0** (this design doc). - **PR 1** — Harness skeleton, three sentinel cases, semver gate workflow. -- **PR 2** — Integration runner. +- **PR 2** — Regression-tier subprocess runner under + `tests/regression/golden_templates/`. Picked up by the existing + `other-and-e2e` matrix entry — no workflow edit required. - **PR 3** — SAM resources axis: Function variants (~10 cases). - **PR 4** — SAM resources axis: Api / HttpApi / StateMachine / others (~10 cases). @@ -386,7 +404,7 @@ Security Considerations ----------------------- The harness intentionally invokes only in-process SAM-CLI code; no -network calls, no credentials, no AWS API access. The integration tier +network calls, no credentials, no AWS API access. The regression tier uses the existing integration-tests workflow's IAM role and S3 bucket; no new permissions required. @@ -405,7 +423,7 @@ Subprocess-only harness (no in-process) Every golden runs the real `sam` CLI as a subprocess. Rejected: process startup × N templates is too slow for a per-PR gate; flakier on Windows; deterministic package output without `moto` is hard. Subprocess invocation -is kept, but only at the daily integration tier where speed and +is kept, but only at the daily regression tier where speed and determinism matter less. Snapshot plugin (`syrupy`, `pytest-snapshot`) From 656f93c2589379e4f3ff234426708f8d7ef6e614 Mon Sep 17 00:00:00 2001 From: Harold Sun Date: Tue, 2 Jun 2026 16:29:44 +0000 Subject: [PATCH 7/9] docs(designs): revert PR 2 location to tests/integration/golden_templates MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The previous commit moved the subprocess runner under tests/regression/, but tests/regression/ is reserved for sam-vs-aws-cli parity tests (deploy_regression compares sam deploy output to aws cloudformation deploy output; package_regression does the same for package). Goldens assert the opposite: sam CLI output matches its own pinned past output, not aws-cli's current output. Putting them under tests/regression would conflate two different test semantics and invite future contributors to misuse the suite. Move back to tests/integration/golden_templates/ alongside the existing tests/integration/buildcmd/, tests/integration/package/, and tests/integration/deploy/ — all subprocess-driven SAM CLI behavior tests, which is exactly what the integration tier is. Adds back the requirement for a new \`golden-templates\` entry in integration-tests.yml's matrix (one extra line); explicitly documents why tests/regression/ would be the wrong home so the choice stands up to future review. --- designs/golden_templates_and_semver_check.md | 64 +++++++++++--------- 1 file changed, 37 insertions(+), 27 deletions(-) diff --git a/designs/golden_templates_and_semver_check.md b/designs/golden_templates_and_semver_check.md index 5fc82a94c6b..d5c8f74c586 100644 --- a/designs/golden_templates_and_semver_check.md +++ b/designs/golden_templates_and_semver_check.md @@ -29,13 +29,13 @@ What will be changed? --------------------- Add a new `tests/golden/` corpus and harness (per-PR unit tier), a daily -subprocess-driven tier under `tests/regression/golden_templates/`, and a +subprocess-driven tier under `tests/integration/golden_templates/`, and a tiny standalone GitHub workflow that enforces the semver rule. The corpus covers both LE and non-LE templates across four axes: SAM resources, packageable CFN resources, LE intrinsics, and cross-cutting features. The two tiers share `tests/golden/templates/` as their single corpus -source — the regression tier reads case directories straight from the +source — the integration tier reads case directories straight from the unit tier so authoring a new case lands in both places at once. Success criteria for the change @@ -57,7 +57,7 @@ Success criteria for the change Out-of-Scope ------------ -- No deploy step. The regression tier stops at packaged-template output +- No deploy step. The integration tier stops at packaged-template output — no CloudFormation, no real Lambdas. - No CFN compatibility check (e.g. `cfn-lint`, `ValidateTemplate`) at this stage. The corpus pins what *SAM-CLI* produces; CFN-side parity is a @@ -143,19 +143,23 @@ tests/golden/ # NEW ├── language_extensions//... # LE axis └── cross_cutting//... # nested stacks, AWS::Include, etc. -tests/regression/golden_templates/ # NEW +tests/integration/golden_templates/ # NEW ├── test_golden_subprocess.py # imports tests/golden/templates as data └── structural_compare.py # looser comparator for real-CLI output ``` The `tests/golden/templates/` directory is the single source of truth for -the corpus. Both the unit tier (`tests/golden/`) and the regression tier -(`tests/regression/golden_templates/`) read from it; nothing is duplicated. - -The regression tier sits alongside the existing `tests/regression/deploy/` -and `tests/regression/package/` suites — they're structurally similar: -real `sam` subprocess invocation, real AWS credentials via the -integration-tests workflow's OIDC role, daily-only execution. +the corpus. Both the unit tier (`tests/golden/`) and the integration tier +(`tests/integration/golden_templates/`) read from it; nothing is duplicated. + +The integration tier sits alongside the existing +`tests/integration/buildcmd/`, `tests/integration/package/`, and +`tests/integration/deploy/` suites. ``tests/regression/`` is reserved for +``sam`` vs ``aws cloudformation`` parity tests — a different semantic +that compares SAM CLI output against the AWS CLI's current output — +which is not what the golden corpus asserts. Goldens compare SAM CLI +against its own pinned past behavior, so they belong under +``tests/integration/``. ### Per-case directory layout @@ -206,7 +210,7 @@ no `boto3` stubbing. - Single trailing newline. The harness invokes the same high-level functions the CLI invokes; it does -not re-implement the pipeline. The regression tier verifies that contract +not re-implement the pipeline. The integration tier verifies that contract holds. ### Pytest collection @@ -320,15 +324,21 @@ estimated <30s for ~50 cases. The existing Windows `TEMP=D:\Temp` shim already handles temp-dir contention. No new workflow file (the semver gate above is its own workflow, but the corpus tests ride on `build.yml`). -### Regression tier +### Integration tier + +Daily run on the existing `integration-tests.yml` workflow's matrix +(one new entry: `golden-templates`). The suite sits alongside the +existing `tests/integration/buildcmd/`, `tests/integration/package/`, +and `tests/integration/deploy/` directories — all subprocess-driven +``sam`` invocations against real AWS resources. -Daily run on the existing `integration-tests.yml` workflow. The -workflow's `other-and-e2e` matrix entry already collects -`tests/regression/`, so adding `tests/regression/golden_templates/` is -picked up automatically — **no new matrix entry, no workflow edit**. The -suite sits alongside the existing `tests/regression/deploy/` and -`tests/regression/package/` directories, which have the same shape (real -`sam` subprocess + real AWS credentials, daily-only). +We deliberately do **not** put this under ``tests/regression/``. The +existing ``tests/regression/`` suites are ``sam`` vs +``aws cloudformation`` parity tests — they assert SAM CLI's output +matches AWS CLI's *current* output. Goldens assert the opposite: SAM +CLI's output matches its own *pinned past* output. Different semantics; +sharing a directory would invite future contributors to confuse the +two. Per case: @@ -347,7 +357,7 @@ Per case: dict-vs-string differences (the #9029 bug class). The comparator lives in -`tests/regression/golden_templates/structural_compare.py` and is reused +`tests/integration/golden_templates/structural_compare.py` and is reused for both build and package outputs. #### Why structural, not byte-exact @@ -359,7 +369,7 @@ output) or constant churn from environmental noise. Structural comparison answers "did the user-visible output shape change?" — which is what catches the four bugs we just fixed. -If the unit tier passes but the regression tier fails for a case, the +If the unit tier passes but the integration tier fails for a case, the in-process harness has drifted from the CLI surface — investigation, not a re-pin. @@ -369,9 +379,9 @@ Each PR is independently reviewable; LOC estimates are upper-bound net diff. - **PR 0** (this design doc). - **PR 1** — Harness skeleton, three sentinel cases, semver gate workflow. -- **PR 2** — Regression-tier subprocess runner under - `tests/regression/golden_templates/`. Picked up by the existing - `other-and-e2e` matrix entry — no workflow edit required. +- **PR 2** — Integration-tier subprocess runner under + `tests/integration/golden_templates/`. Adds a new `golden-templates` + entry to the existing `integration-tests.yml` matrix. - **PR 3** — SAM resources axis: Function variants (~10 cases). - **PR 4** — SAM resources axis: Api / HttpApi / StateMachine / others (~10 cases). @@ -404,7 +414,7 @@ Security Considerations ----------------------- The harness intentionally invokes only in-process SAM-CLI code; no -network calls, no credentials, no AWS API access. The regression tier +network calls, no credentials, no AWS API access. The integration tier uses the existing integration-tests workflow's IAM role and S3 bucket; no new permissions required. @@ -423,7 +433,7 @@ Subprocess-only harness (no in-process) Every golden runs the real `sam` CLI as a subprocess. Rejected: process startup × N templates is too slow for a per-PR gate; flakier on Windows; deterministic package output without `moto` is hard. Subprocess invocation -is kept, but only at the daily regression tier where speed and +is kept, but only at the daily integration tier where speed and determinism matter less. Snapshot plugin (`syrupy`, `pytest-snapshot`) From fd5e38c2ecf3fcaf3a95d7db2a0040e7fb08a244 Mon Sep 17 00:00:00 2001 From: Harold Sun Date: Tue, 2 Jun 2026 17:22:10 +0000 Subject: [PATCH 8/9] docs(designs): reframe motivation around review-surface for output shape MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The previous motivation leaned on a retroactive "we'd have caught the four bugs" claim. That claim is partially false: pre-#8637, SAM CLI didn't process AWS::LanguageExtensions locally, so a pre-existing LE corpus pinning behavior nobody had committed to would not have been written in the first place. The right moment to add the corpus is the feature PR itself, not before. Replace the motivation with the actual gap: existing tests verify internals (unit) and narrow output assertions (integration), but neither puts the rendered template in front of a human reviewer at PR time. Subtle interactions between pipeline stages — LE expansion vs. SAM transform vs. global-transform export vs. artifact-path merge — are invisible at review time. #8637 is exhibit A: the four post-release bugs are each a YAML-shape diff a reviewer would have caught on sight, if the diff had been visible. Also reframe the semver pitch around the actual customer impact: the most impactful behavior changes don't merely surprise tooling, they break customer CI/CD pipelines outright. #9004 is the canonical example — sam build raised InvalidTemplateException on any template using !FindInMap over a parameter without Default, blocking every pipeline that consumed it. The 1.159 → 1.160 bump was minor per semver, signalling "additive, backwards-compatible" — but local LE processing turning on by default broke working pipelines, which is the definition of a backwards-incompatible change. The version number didn't tell consumers to audit their pipelines, so they didn't. --- designs/golden_templates_and_semver_check.md | 107 +++++++++++++++---- 1 file changed, 87 insertions(+), 20 deletions(-) diff --git a/designs/golden_templates_and_semver_check.md b/designs/golden_templates_and_semver_check.md index d5c8f74c586..630d85547f9 100644 --- a/designs/golden_templates_and_semver_check.md +++ b/designs/golden_templates_and_semver_check.md @@ -4,26 +4,93 @@ Title: Golden Templates and Semver Check What is the problem? -------------------- -PR #8637 (sam-cli 1.160.0) shipped in-tree CloudFormation Language Extensions -(LE) support. Within weeks, four production regressions surfaced — every one -a behavior shift in the *expanded-template output* that no automated test -caught before release: - -- #9004 — `Fn::FindInMap` / `Join` / `Select` / `Base64` raised on unresolved - parameter `Ref`s in PARTIAL mode (fixed in #9010). -- #9005 — `PACKAGEABLE_RESOURCE_ARTIFACT_PROPERTIES` had drifted from canonical - resource lists; `AWS::Serverless::Application` and seven sibling types - failed packaging (fixed in #9009). -- #9027 — `AWS::Include` buried inside `Fn::ToJsonString` was invisible to the - global-transform walker because LE expansion ran first (fixed in #9030). -- #9029 — LE-aware build merge overwrote `Code: { ZipFile: !Sub ... }` with - the LE-resolved value, baking pseudo-param defaults into the built template - (fixed in #9031). - -PR #9033 made local LE processing opt-in to give customers a clean migration -path. The action item from that retrospective: a golden-template corpus that -pins the expanded output of representative templates, plus a CI gate that -requires a major-version bump whenever a pin changes. +SAM CLI processes user templates and writes the result to disk for +CloudFormation to consume. The output template is the contract with users +and their tooling — yet the existing test suite has no review surface for +that output's *shape*. Unit tests verify individual functions in +isolation; integration tests assert exit codes and a few narrow +properties (`Resources.X.Type == ...`); neither puts the rendered +template in front of a human reviewer. + +This shows up most painfully in feature PRs that introduce new +output-shaping behavior. The reviewer can read the code and run the +tests, but cannot easily see *what the feature produces for representative +inputs* — the diff in the PR is `samcli/lib/...` Python code, not the +YAML that customers will deploy. Subtle interactions between pipeline +stages — LE expansion vs. SAM transform vs. global-transform export +vs. artifact-path merge — are invisible at review time. + +PR #8637 (sam-cli 1.160.0, in-tree CFN Language Extensions support) is +exhibit A. It introduced large new output-shaping behavior across the +build and package paths. Within weeks of release four bugs surfaced, +each a behavior shift visible in the post-build or post-package +template that no automated test caught before release: + +- #9004 — `Fn::FindInMap` / `Join` / `Select` / `Base64` raised on + unresolved parameter `Ref`s in PARTIAL mode (fixed in #9010). +- #9005 — `PACKAGEABLE_RESOURCE_ARTIFACT_PROPERTIES` had drifted from + canonical resource lists; `AWS::Serverless::Application` and seven + sibling types failed packaging (fixed in #9009). +- #9027 — `AWS::Include` buried inside `Fn::ToJsonString` was invisible + to the global-transform walker because LE expansion ran first (fixed + in #9030). +- #9029 — LE-aware build merge overwrote `Code: { ZipFile: !Sub ... }` + with the LE-resolved value, baking pseudo-param defaults into the + built template (fixed in #9031). + +PR #9033 made local LE processing opt-in to give customers a clean +migration path. + +These bugs didn't slip past #8637 because the test suite was small — +the LE module shipped with comprehensive unit tests and integration +tests. They slipped past because nobody was asked, at PR-review time, +to look at the actual templates `sam build` and `sam package` produce +for templates exercising each new LE intrinsic. Each bug is a YAML-shape +diff that a reviewer would have caught on sight — if the diff had been +visible. + +The right moment for that review surface is the feature PR itself. +#8637 should have shipped with one pinned input/output pair per LE +intrinsic plus the cross-cutting interactions (LE × nested stacks, LE +× AWS::Include, LE × Globals). Each iteration on the PR would have +produced output diffs alongside code diffs; the four bugs would have +appeared as visible shape changes during development, before the PR +merged. That same pattern then carries forward: every future feature +PR pins what it produces, every patch release is auditable for +behavior changes. + +A pinned-output corpus solves only half the problem, though. Behavior +changes — intentional or not — must also be visible to *consumers* of +SAM CLI, not just to reviewers. The most impactful changes don't just +alter output shape: they cause `sam build` or `sam package` to fail +outright, breaking customer CI/CD pipelines on what was assumed to be +a routine upgrade. #9004 is the canonical example — `sam build` +started raising `InvalidTemplateException` on any template that used +`!FindInMap` over a parameter without a `Default`, blocking every CI +that consumed such a template until the customer either pinned to +1.159.x or upgraded past the fix. The 1.159 → 1.160 bump was minor +per semver, signalling "additive, backwards-compatible" — but local +LE processing turning on by default broke working pipelines, which +is the definition of a backwards-incompatible change. The version +number didn't tell consumers to audit their pipelines, so they +didn't. + +This design proposes: + +1. **A golden-template corpus** under `tests/golden/templates/`. + For each representative input, pin the post-build and post-package + YAML byte-for-byte. The pins live in the repo; PR diffs include + both the code change and the resulting output change. Reviewers see + what users will see. + +2. **A CI semver gate** that requires a major-version bump in + `samcli/__init__.py` whenever an existing pinned output is modified + or deleted. New pins (corpus growth) are ungated. The gate makes + user-visible behavior changes explicit at version-number granularity + so downstream tooling can react deliberately. + +The two compose: goldens make output changes *visible* to reviewers; +the semver check makes them *commitments* to consumers. What will be changed? --------------------- From 7ac37e0dfda97c8c747fff0302cc3c242e2a9c20 Mon Sep 17 00:00:00 2001 From: Harold Sun Date: Tue, 2 Jun 2026 17:42:44 +0000 Subject: [PATCH 9/9] docs(designs): add merge_group trigger to semver-gate workflow MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per review on PR #9057: the workflow as proposed only declared on.pull_request, but this repo uses GitHub's merge queue (see build.yml's existing merge_group trigger). When a PR enters the queue, GitHub dispatches a merge_group event against an ephemeral ref, not pull_request. A required check that only listens on pull_request never reports a status on the merge_group event, so the merge queue treats it as missing/pending and blocks the merge — the same skipped-but-required failure mode the path-filter pitfall was about, just under a different trigger. Mirror build.yml: listen on both pull_request and merge_group with the same branch filters. The check_semver_bump.py script's self-gating (exit 0 when no expected.*.yaml changed) only helps when the workflow runs at all; this fix makes it run. Add a "resolve base ref" step because the two event types expose the target branch differently: pull_request sets github.base_ref to the bare branch name; merge_group leaves base_ref empty and exposes the full ref at github.event.merge_group.base_ref (e.g. "refs/heads/develop"), which we strip to match the pull_request shape before passing to check_semver_bump.py. Document both flavors of the pitfall (path-filter and single-trigger) in the design's "skipped but required" callout so future maintainers see why we chose this trigger configuration. --- designs/golden_templates_and_semver_check.md | 42 +++++++++++++++----- 1 file changed, 31 insertions(+), 11 deletions(-) diff --git a/designs/golden_templates_and_semver_check.md b/designs/golden_templates_and_semver_check.md index 630d85547f9..2d9cbc3d920 100644 --- a/designs/golden_templates_and_semver_check.md +++ b/designs/golden_templates_and_semver_check.md @@ -334,24 +334,44 @@ can never diverge. ### The semver gate Tiny standalone GitHub workflow at -`.github/workflows/golden-semver-gate.yml`. Runs on every PR; the script -itself returns exit 0 when no corpus pins changed, so it's cheap to run -unconditionally and posts a definitive status on every PR (which lets -branch protection mark it as a required check without blocking -unrelated PRs): +`.github/workflows/golden-semver-gate.yml`. Runs on every PR *and* on +every merge_group event; the script itself returns exit 0 when no +corpus pins changed, so it's cheap to run unconditionally and posts a +definitive status on every event (which lets branch protection mark +it as a required check without blocking unrelated PRs or merge-queue +runs): ```yaml on: pull_request: branches: [develop, "feat/*", "feat-*"] + merge_group: + types: [checks_requested] + branches: [develop, "feat/*", "feat-*"] ``` -The workflow does *not* use `on.pull_request.paths:` filtering. Path -filtering at the trigger level skips the workflow entirely on PRs that -touch no matching path — no status posts, and a required check that -never reports gets treated by branch protection as missing/pending. -Self-gating in the script (return 0 when there are no relevant changes) -is the simpler fix and keeps the check eligible to be required. +The workflow avoids two flavors of the "skipped but required check" +pitfall: + +1. **No `on.pull_request.paths:` filtering.** Path filtering at the + trigger level skips the workflow entirely on PRs that touch no + matching path — no status posts, and a required check that never + reports gets treated by branch protection as missing/pending. + Self-gating in the script (return 0 when there are no relevant + changes) is the simpler fix and keeps the check eligible to be + required. + +2. **Both `pull_request` and `merge_group` triggers.** The repo uses + GitHub's merge queue; when a PR enters the queue, GitHub + dispatches a `merge_group` event against an ephemeral ref, not + `pull_request`. A required check that only listens on + `pull_request` never reports a status on the `merge_group` event, + so the merge queue treats it as missing/pending — the same failure + mode under a different trigger. Mirror `build.yml`'s pattern by + listening on both events with the same branch filters. The + workflow then resolves the target branch from `github.base_ref` on + PR events and from `github.event.merge_group.base_ref` (stripping + the `refs/heads/` prefix) on merge-queue events. `check_semver_bump.py` rules: