Problem
The seitask workstream has surfaced a recurring failure pattern: contract drift between the seitask binary's internal helpers (`WorkflowVarsName`, scheme registration, downward-API env contract) and the scenario YAML / RBAC layer that has to mirror them manually. Each of the last four PRs (#334, #337, #339, plus the in-flight #339 build-time tests) addressed a different facet of the same shape: an internal helper has a convention, the scenario author has to mirror it manually in YAML, no test catches the drift, the bug surfaces only at first cluster fire ~10 minutes into the run.
Platform-engineer (cross-review on #339):
"The scenario YAML is the integration contract between three things (the runtime binary, the chaos-mesh CR shape, the wrapper's envsubst inputs) and none of them validate it. Each bug surfaced at first cluster fire."
Impact
Slow feedback loop. Each contract bug costs ~10–30 min of manual-fire + investigation + fix-PR + image rebuild + SCENARIO_REF bump + re-fire. We've done this loop four times in the last hour to get the harness past keygen.
Build-time enforcement is cheap. Two narrow tests added in #339 (fix(seitask): register sei.io scheme + grant workflownodes RBAC) already catch the two highest-frequency classes (scheme registration + CM-name drift) at `go test`. There's more we could enforce; this issue tracks the broader pattern.
Proposed approach
Three reinforcing layers, ranked by effort (cheapest first, costlier ones deferred):
Layer 1 (already partially in #339): unit tests for internal contracts
✅ Scheme round-trip test for every typed CR that provision-snd / keygen / upload-report construct
✅ CM-name validation for scenario YAMLs that opt in
⏳ RBAC vs kubebuilder-marker reconciliation — defer; un-defer when we hit a third RBAC-class bug
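A minimal sketch of what the opt-in CM-name check could look like at `go test` time. It assumes `WorkflowVarsName` returns `workflow-vars-<workflow name>` (inferred from the `workflow-vars-${WORKFLOW_NAME}` convention mentioned in this issue, not taken from the binary), and it uses a line-based regex instead of a real YAML parser to stay dependency-free. `findConfigMapRefDrift` and the sample scenario fragment are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// WorkflowVarsName is a stand-in for the binary's helper.
// ASSUMPTION: the real helper produces "workflow-vars-<workflow name>".
func WorkflowVarsName(workflowName string) string {
	return "workflow-vars-" + workflowName
}

// configMapRefName matches a `configMapRef:` line followed by its `name:` line.
// It does not handle quoted names or inline mappings; a real check would
// decode the YAML instead.
var configMapRefName = regexp.MustCompile(`configMapRef:\s*\n\s*name:\s*(\S+)`)

// findConfigMapRefDrift returns every configMapRef name in scenarioYAML that
// differs from WorkflowVarsName(workflowName).
func findConfigMapRefDrift(scenarioYAML, workflowName string) []string {
	want := WorkflowVarsName(workflowName)
	var drifted []string
	for _, m := range configMapRefName.FindAllStringSubmatch(scenarioYAML, -1) {
		if m[1] != want {
			drifted = append(drifted, m[1])
		}
	}
	return drifted
}

// sampleScenario is a hypothetical fragment, not a real scenario file.
// The second reference drops a hyphen, which is the kind of drift to catch.
const sampleScenario = `
metadata:
  name: release-test
steps:
  - name: keygen
    envFrom:
      - configMapRef:
          name: workflow-vars-release-test
  - name: upload-report
    envFrom:
      - configMapRef:
          name: workflow-vars-releasetest
`

func main() {
	want := WorkflowVarsName("release-test")
	for _, name := range findConfigMapRefDrift(sampleScenario, "release-test") {
		fmt.Printf("configMapRef.name %q does not match %q\n", name, want)
	}
}
```

Wrapped in a table-driven test over the scenario files that opt in, a check of this shape fails at build time rather than at first cluster fire.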
Layer 2: single-source the CM name across YAML + binary
Two candidate shapes (both reviewers raised independently):
(a) Wrapper exports `SEI_WORKFLOW_VARS_CM=workflow-vars-${WORKFLOW_NAME}` env var; scenarios reference `$SEI_WORKFLOW_VARS_CM` via envsubst allow-list. Single string-builder lives in the wrapper bash. No new templating dependency.
(b) Render-time template helper `{{ workflowVarsCM }}` exposed in a scenario template engine. Aligns with how the runner subcommand already templates SeiNodeTask CRs and how provision-snd templates SND specs. Requires a scenario rendering engine the wrapper invokes (vs current envsubst).
Platform-engineer recommends (a) as the MVP; kubernetes-specialist recommends (b) longer-term. Both eliminate the manual-mirror failure mode.
Layer 3 (longer-term): `seitask scenario validate` subcommand
A schema-validator subcommand that:
- Checks every `configMapRef.name` matches `WorkflowVarsName(metadataName)`
- Checks every `--var=KEY=...` flag matches a documented input on the target subcommand
- Checks every `$(VAR)` reference has a producer step earlier in the Serial
- Runs pre-commit (Husky/lefthook), in CI, and pre-apply in the wrapper
This catches more bugs than the Layer 1 unit tests because it has access to the full scenario semantics (DAG ordering, var producer/consumer matching), not just the YAML structure.
Source: platform-engineer cross-review on #339.
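The producer/consumer check is the part that needs ordering information, so here is a rough sketch of it in isolation. The `Step` type, its fields, and the variable names are hypothetical simplifications; a real validator would build this view from the parsed scenario, and the `$$(VAR)` escape form is not handled.

```go
package main

import (
	"fmt"
	"regexp"
)

// Step is a hypothetical, simplified view of one entry in a Serial.
type Step struct {
	Name     string
	Produces []string // variables this step writes for later steps
	Args     []string // container args, which may contain $(VAR) references
}

// varRef matches $(VAR) references and captures the variable name.
var varRef = regexp.MustCompile(`\$\(([A-Za-z_][A-Za-z0-9_]*)\)`)

// checkVarProducers walks the steps in order and returns one error for each
// $(VAR) reference that no earlier step produces.
func checkVarProducers(steps []Step) []error {
	produced := map[string]bool{}
	var errs []error
	for _, step := range steps {
		for _, arg := range step.Args {
			for _, m := range varRef.FindAllStringSubmatch(arg, -1) {
				if !produced[m[1]] {
					errs = append(errs, fmt.Errorf(
						"step %q references $(%s) before any step produces it",
						step.Name, m[1]))
				}
			}
		}
		// Outputs are registered after the args are checked, so a step
		// cannot satisfy its own reference.
		for _, v := range step.Produces {
			produced[v] = true
		}
	}
	return errs
}

func main() {
	// Hypothetical scenario: the last step consumes a variable nobody produces.
	steps := []Step{
		{Name: "keygen", Produces: []string{"VALIDATOR_KEY"}},
		{Name: "provision-snd", Args: []string{"--var=KEY=$(VALIDATOR_KEY)"}},
		{Name: "upload-report", Args: []string{"--var=TARGET=$(REPORT_BUCKET)"}},
	}
	for _, err := range checkVarProducers(steps) {
		fmt.Println(err)
	}
}
```

A single in-order pass is enough for a Serial; scenarios with parallel branches would need the validator to reason about the DAG rather than list position.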
Relevant experts
Acceptance criteria
This issue resolves when:
- Layer 1 RBAC reconciliation test added (when justified)
- Layer 2: pick (a) or (b) and migrate release-test + future scenarios to it
- Layer 3: `seitask scenario validate` subcommand exists, runs in sei-k8s-controller CI on every PR that touches `scenarios/`, runs pre-apply in the wrapper
Out of scope
References
🤖 Generated with Claude Code