Scenario contract enforcement: build-time guards + single-sourced CM name

## Problem

The seitask workstream has surfaced a recurring failure pattern: contract drift between the seitask binary's internal helpers (`WorkflowVarsName`, scheme registration, the downward-API env contract) and the scenario YAML / RBAC layer, which has to mirror them manually. Each of the last four fixes (sei-protocol/sei-k8s-controller#334, #337, #339, plus the in-flight build-time tests on #339) addressed a different facet of the same shape: an internal helper has a convention, the scenario author mirrors it by hand in YAML, no test catches the drift, and the bug surfaces only at first cluster fire, ~10 minutes into the run.

Platform-engineer (cross-review on #339):
> "The scenario YAML is the integration contract between three things (the runtime binary, the chaos-mesh CR shape, the wrapper's envsubst inputs) and none of them validate it. Each bug surfaced at first cluster fire."

## Impact

- **Slow feedback loop.** Each contract bug costs ~10–30 min of manual-fire + investigation + fix-PR + image rebuild + SCENARIO_REF bump + re-fire. We've done this loop four times in the last hour to get the harness past keygen.
- **Compounds with scenario count.** Adding a second/third scenario will repeat the contract surface from scratch. Without enforcement, each new scenario brings its own #337-class bugs.
- **Build-time enforcement is cheap.** Two narrow tests added in sei-protocol/sei-k8s-controller#339 already catch the two highest-frequency classes (scheme registration + CM-name drift) at `go test`. There's more we could enforce; this issue tracks the broader pattern.

## Proposed approach

Three reinforcing layers, ordered from least to most effort:

### Layer 1 (already partially in #339): unit tests for internal contracts
- ✅ Scheme round-trip test for every typed CR that provision-snd / keygen / upload-report construct
- ✅ CM-name validation for scenario YAMLs that opt in
- ⏳ RBAC vs kubebuilder-marker reconciliation — defer; un-defer when we hit a third RBAC-class bug
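
A minimal sketch of the CM-name check in Go, assuming `WorkflowVarsName` simply prefixes `workflow-vars-` to the workflow's `metadata.name` (the same shape as the wrapper's `workflow-vars-${WORKFLOW_NAME}` in Layer 2 option (a)). The `Scenario` type and `CheckConfigMapRefs` are hypothetical; a real test would unmarshal each opted-in scenario YAML and call the binary's actual helper instead of this stand-in.

```go
package main

import "fmt"

// WorkflowVarsName is a hypothetical stand-in for the seitask helper of the
// same name. ASSUMPTION: the real helper prefixes "workflow-vars-" to the
// workflow's metadata.name, mirroring the wrapper's
// workflow-vars-${WORKFLOW_NAME} convention.
func WorkflowVarsName(workflowName string) string {
	return "workflow-vars-" + workflowName
}

// Scenario is a hypothetical, pre-parsed view of a scenario YAML: the
// workflow's metadata.name plus every configMapRef.name found in its steps.
type Scenario struct {
	MetadataName  string
	ConfigMapRefs []string
}

// CheckConfigMapRefs returns one error per configMapRef.name that does not
// match the name the binary will actually create.
func CheckConfigMapRefs(s Scenario) []error {
	want := WorkflowVarsName(s.MetadataName)
	var errs []error
	for i, got := range s.ConfigMapRefs {
		if got != want {
			errs = append(errs, fmt.Errorf("configMapRef[%d]: got %q, want %q", i, got, want))
		}
	}
	return errs
}
```

Wired into `go test` over every opted-in scenario file, a rename on either side fails the build instead of surfacing at first cluster fire.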

### Layer 2: single-source the CM name across YAML + binary
Two candidate shapes (both reviewers raised independently):

**(a)** Wrapper exports `SEI_WORKFLOW_VARS_CM=workflow-vars-${WORKFLOW_NAME}` env var; scenarios reference `$SEI_WORKFLOW_VARS_CM` via envsubst allow-list. Single string-builder lives in the wrapper bash. No new templating dependency.
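
Option (a) itself is a few lines of wrapper bash; the Go sketch below only models the allow-listed substitution so its behavior is easy to pin down in a test. It assumes the wrapper calls envsubst with an explicit SHELL-FORMAT allow-list (e.g. `envsubst '$SEI_WORKFLOW_VARS_CM'`), which substitutes only the listed variables and leaves every other `$NAME`, `${NAME}`, and `$(NAME)` reference verbatim. `SubstAllowed` is a hypothetical name.

```go
package main

import "regexp"

// shellVar matches $NAME and ${NAME}, the two forms envsubst recognises.
// $(NAME) is deliberately not matched, so runtime references pass through.
var shellVar = regexp.MustCompile(`\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)`)

// SubstAllowed is a hypothetical model of `envsubst '<allow-list>'`: only
// allow-listed variables are replaced (an allow-listed but unset variable
// becomes ""), and every other reference is left byte-for-byte unchanged.
func SubstAllowed(text string, allow map[string]bool, vars map[string]string) string {
	return shellVar.ReplaceAllStringFunc(text, func(m string) string {
		sub := shellVar.FindStringSubmatch(m)
		name := sub[1] // ${NAME} form
		if name == "" {
			name = sub[2] // $NAME form
		}
		if !allow[name] {
			return m
		}
		return vars[name]
	})
}
```

Because `$(NAME)` is never matched, runtime references consumed later in the workflow survive the wrapper's substitution pass untouched.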

**(b)** Render-time template helper `{{ workflowVarsCM }}` exposed in a scenario template engine. Aligns with how the runner subcommand already templates SeiNodeTask CRs and how provision-snd templates SND specs. Requires a scenario rendering engine the wrapper invokes (vs current envsubst).
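
A minimal sketch of option (b) using Go's standard-library `text/template`. Whether the runner and provision-snd templating are built on `text/template` is an assumption, as are the `RenderScenario` name and the `workflow-vars-` prefix.

```go
package main

import (
	"strings"
	"text/template"
)

// RenderScenario renders a scenario template, exposing a hypothetical
// workflowVarsCM helper bound to the workflow being rendered.
// ASSUMPTION: the CM name is "workflow-vars-" + workflow name.
func RenderScenario(src, workflowName string) (string, error) {
	funcs := template.FuncMap{
		// Single source of truth for the CM name at render time.
		"workflowVarsCM": func() string { return "workflow-vars-" + workflowName },
	}
	// Funcs must be registered before Parse so the helper name resolves.
	tmpl, err := template.New("scenario").Funcs(funcs).Parse(src)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, nil); err != nil {
		return "", err
	}
	return b.String(), nil
}
```

A misspelled helper such as `{{ workflowVarsCm }}` fails at `Parse` with a "function not defined" error, so the typo surfaces at render time rather than at first cluster fire.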

Platform-engineer recommends (a) as the MVP; kubernetes-specialist recommends (b) longer-term. Both eliminate the manual-mirror failure mode.

### Layer 3 (longer-term): `seitask scenario validate` subcommand
A schema-validator subcommand that:
- Parses scenario YAML
- Checks every `configMapRef.name` matches `WorkflowVarsName(metadataName)`
- Checks every `--var=KEY=...` flag matches a documented input on the target subcommand
- Checks every `$(VAR)` reference has a producer step earlier in the Serial
- Runs pre-commit (Husky/lefthook), in CI, and pre-apply in the wrapper

Catches more bugs than Layer 1 unit tests because it has access to the full scenario semantics (DAG ordering, var producer/consumer matching), not just the YAML structure.

Source: platform-engineer cross-review on sei-protocol/sei-k8s-controller#339.

## Relevant experts

- **platform-engineer** — owns the wrapper bash + envsubst contract; (a) lives entirely in their territory
- **kubernetes-specialist** — owns the operator-pattern alignment for (b); also owns the RBAC-marker reconciliation in Layer 1
- **product-engineer** — should weigh in on which Layer 2 shape fits the longer-term scenario authoring DX

## Acceptance criteria

This issue resolves when:
- [x] Layer 1 partially done (scheme + CM-name tests in #339)
- [ ] Layer 1 RBAC reconciliation test added (when justified)
- [ ] Layer 2: pick (a) or (b) and migrate release-test + future scenarios to it
- [ ] Layer 3: `seitask scenario validate` subcommand exists, runs in sei-k8s-controller CI on every PR that touches `scenarios/`, runs pre-apply in the wrapper

## Out of scope

- Workflow engine swap (sei-protocol/sei-k8s-controller#332 tracks longer-term Argo evaluation)
- Status-check shared template library (sei-protocol/sei-k8s-controller#330)
- Chaos-mesh fail-fast (sei-protocol/sei-k8s-controller#340)

## References

- sei-protocol/sei-k8s-controller#334 — first contract bug class (downward-API UID assumption)
- sei-protocol/sei-k8s-controller#337 — second (CM name mismatch)
- sei-protocol/sei-k8s-controller#339 — third (scheme + RBAC) + Layer 1 partial implementation
- Memory: \`feedback_prototype_first.md\` survey-checkpoint pattern (this issue's recurrence count, 4, justifies hardening)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
