Skip to content

fix(runner): stamp Workflow ownerRef on applied SeiNodeTask CRs#350

Merged
bdchatham merged 1 commit into
mainfrom
fix/runner-stamp-workflow-ownerref
May 21, 2026
Merged

fix(runner): stamp Workflow ownerRef on applied SeiNodeTask CRs#350
bdchatham merged 1 commit into
mainfrom
fix/runner-stamp-workflow-ownerref

Conversation

@bdchatham
Copy link
Copy Markdown
Collaborator

@bdchatham bdchatham commented May 21, 2026

Summary

Closes the last gap in the workflow ownership chain. Before this change, the seitask runner subcommand applied SeiNodeTask CRs via SSA but never stamped an ownerRef pointing at the parent Workflow — so per-run SNTs survived workflow deletion and accumulated indefinitely across nightly runs.

keygen, provision-snd, and taskruntime.EnsureWorkflowVarsCM already stamp the Workflow ownerRef on what they create. This PR brings runner in line with that pattern:

  • Add OwnerRef *metav1.OwnerReference to runner.DefaultRenderer.
  • In cmd/seitask/runner.go, call taskruntime.LoadWorkflowIdentity(ctx, cliClient) at subcommand startup; pass &wf.OwnerRef() into the renderer.
  • In RenderBytes, replace (not merge) metadata.ownerReferences so a template-smuggled bogus ref can't leak through — mirrors provisionsnd.stampMetadata exactly.

Result: kubectl delete workflow major-upgrade-<run-id> (or the gc-cronjob age sweep at 24h) cascades to every per-run resource — SND, workflow-vars CM, admin Secret, SeiNodeTasks, Task pods.

Test plan

  • New TestRenderBytes_StampsOwnerRef covers both the stamping path (single Workflow ref replaces any template-declared refs) and the nil-ownerRef path (template-declared refs preserved for backward compat).
  • go test ./... passes.
  • Verify on the next nightly fire that newly-applied SNTs carry a Workflow ownerRef (kubectl get snt -o yaml | grep -A3 ownerReferences).

Related

  • Pairs with sei-protocol/platform#634 — once this lands and the seitask image rebuilds, the platform gc-cronjob will sweep just Workflows and cascade handles the rest.

🤖 Generated with Claude Code

The runner subcommand was the last seitask command not stamping the
parent Workflow's ownerRef on what it creates. provision-snd, keygen,
and EnsureWorkflowVarsCM already do; runner-applied SeiNodeTask CRs
silently leaked across runs because no ownerRef linked them to the
Workflow lifecycle.

Add an OwnerRef field to DefaultRenderer; populate it from
taskruntime.LoadWorkflowIdentity at subcommand startup; replace (not
merge) ownerReferences on the rendered manifest before SSA. Mirrors
provisionsnd.stampMetadata exactly so a template-smuggled bogus ref
can't leak through.

With this in place, deleting the Workflow CR cascades the per-run
SeiNodeTask CRs along with the existing SND + workflow-vars CM + admin
Secret. The platform-side gc-cronjob can sweep just Workflows; cascade
reaps the rest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cursor
Copy link
Copy Markdown

cursor Bot commented May 21, 2026

PR Summary

Medium Risk
Medium risk because it changes Kubernetes metadata (ownerReferences) for all runner-applied tasks, affecting lifecycle/GC behavior and potentially overriding template-provided refs when enabled.

Overview
seitask runner now loads the parent Workflow identity at startup and stamps its ownerReference onto rendered/applied SeiNodeTask manifests, enabling cascading deletion when the workflow is removed.

runner.DefaultRenderer and RenderBytes were extended to accept an optional owner ref; when provided, rendering replaces (not merges) metadata.ownerReferences to prevent template-smuggled refs. Tests were updated and a new case added to verify both stamping and the nil/no-stamp path.

Reviewed by Cursor Bugbot for commit 906e520. Bugbot is set up for automated code reviews on this repo. Configure here.

@bdchatham bdchatham merged commit cd33795 into main May 21, 2026
5 checks passed
bdchatham added a commit that referenced this pull request May 22, 2026
… on runner Task containers (#351)

#350 made taskruntime.LoadWorkflowIdentity mandatory at runner startup
so applied SeiNodeTask CRs get a Workflow ownerRef. LoadWorkflowIdentity
reads SEI_WORKFLOW_NAME + SEI_NAMESPACE from env (downward API).

The keygen/provision-snd/upload-report Task containers already project
these via downward API, but the 12 `runner` Task containers in
scenarios/major-upgrade.yaml did NOT — pre-#350 the runner didn't need
them. Result: every runner step in major-upgrade exits 2 with
`seitask: infra: downward-API env not projected: [SEI_WORKFLOW_NAME
SEI_NAMESPACE]` before doing any work. The gov pipeline silently
failed end-to-end on the first post-#350 fire even though chaos-mesh
reported the Workflow Accomplished (because chaos-mesh Accomplished
doesn't mean container exit 0).

Add the env block to all 12 runner Task containers. scenarios/release-test
and scenarios/load-test don't invoke `runner`, so they're unaffected.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant