Planner does not re-fire `apply-statefulset` when StatefulSet is deleted post-bootstrap

## Problem

When a SeiNode reaches `phase: Running`, the controller's planner is finished. If the rendered StatefulSet is later deleted (manually for ops reasons, or as part of an aftercare sweep), the SeiNode reconciler neither detects the missing derived resource nor re-fires the `apply-statefulset` task. The SeiNode stays in `phase: Running` indefinitely with no live pod.

Reproduced live today against the `state-size-analyzer` SND in `pacific-1`: the StatefulSet was deleted to force a sidecar-image re-render from the platform-default `SEI_SIDECAR_IMAGE` env var. The controller never recreated it. The workaround was to delete the SeiNode itself so that the SND's `reconcileSeiNodes` would recreate it fresh and rerun the bootstrap plan from scratch. Because the data PVC is SeiNode-owned, this also wiped the PVC and forced a full state-sync redo.
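The cascade behind the workaround can be modeled as a walk over ownerReferences. This is a minimal sketch, assuming the SeiNode owns both the StatefulSet and the data PVC (the issue states only that the PVC is SeiNode-owned; StatefulSet ownership and the PVC name `data` are assumptions) and that the StatefulSet owns its pod, as Kubernetes StatefulSets do. It simplifies by treating every object as single-owner; the real garbage collector removes a dependent only once all of its owners are gone.

```go
package main

import (
	"fmt"
	"sort"
)

// ownedBy maps an owner to the objects whose ownerReferences point at it.
// ASSUMPTION: this graph is illustrative, not read from a live cluster.
var ownedBy = map[string][]string{
	"SeiNode/state-size-analyzer":     {"StatefulSet/state-size-analyzer", "PersistentVolumeClaim/data"},
	"StatefulSet/state-size-analyzer": {"Pod/state-size-analyzer-0"},
}

// cascade returns, sorted, root plus every object that cascading deletion
// would remove under the simplified single-owner model described above.
func cascade(root string, graph map[string][]string) []string {
	seen := map[string]bool{}
	queue := []string{root}
	for len(queue) > 0 {
		obj := queue[0]
		queue = queue[1:]
		if seen[obj] {
			continue
		}
		seen[obj] = true
		queue = append(queue, graph[obj]...)
	}
	out := make([]string, 0, len(seen))
	for obj := range seen {
		out = append(out, obj)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Deleting only the StatefulSet leaves the PVC in place.
	fmt.Println(cascade("StatefulSet/state-size-analyzer", ownedBy))
	// Deleting the SeiNode (the current workaround) also takes the PVC with it.
	fmt.Println(cascade("SeiNode/state-size-analyzer", ownedBy))
}
```

Under this model, recreating just the StatefulSet is safe for the data, while deleting the SeiNode is not.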

## Impact

Any ops procedure that deletes the StatefulSet becomes a one-way door: the SND won't bring seid back automatically. The most immediate consumer is the **state-size-analysis CronJob** (a queued follow-up to platform `state-size-analyzer.yaml`), which is designed to scale to `replicas: 0`, run an analyzer Job against the released PVC, and scale back up. That pattern's correctness depends on the controller re-creating the StatefulSet on the scale-up path. Today operators have to know the "delete the SeiNode and accept PVC loss" workaround, which is a footgun: anyone who doesn't realize the cascade loses state silently.

## Relevant experts

- `kubernetes-specialist` — controller planner + reconcile logic

## Proposed approach

In the SeiNode reconciler, after the bootstrap plan completes (`phase: Running`), continue to assert on every reconcile pass that the derived resources exist. If the StatefulSet named `SeiNode.Name` is missing, fire a new plan containing only the post-bootstrap apply tasks (`apply-statefulset`, `apply-service`, and `apply-rbac-proxy-config` if TLS is enabled). Do **not** re-fire `discover-peers` / `configure-state-sync` / `config-validate`: those already ran, and the existing PVC carries their result. The new plan should be a targeted "rebuild-derived-resources" flow, not a full bootstrap.
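A minimal sketch of the per-pass decision, written as a pure function so each acceptance criterion can be unit-tested without a cluster. Every name here (`Phase`, `NodeObservation`, `rebuildDerivedResourcesTasks`) is hypothetical. The real reconciler would fill `StatefulSetExists` from a Get on the StatefulSet named `SeiNode.Name`, treating `apierrors.IsNotFound` as missing. The `PlanInProgress` guard is an assumption about how to avoid colliding with an in-flight plan such as a chain upgrade.

```go
package main

import "fmt"

// Phase mirrors SeiNode.Status.Phase. Every type and function in this sketch
// is hypothetical; none of it is taken from the controller's actual code.
type Phase string

const PhaseRunning Phase = "Running"

// NodeObservation is what one reconcile pass is assumed to observe.
type NodeObservation struct {
	Phase             Phase
	StatefulSetExists bool // false when the Get for SeiNode.Name returned NotFound
	TLSEnabled        bool
	PlanInProgress    bool // assumed guard so an in-flight plan (e.g. chain upgrade) is not preempted
}

// rebuildDerivedResourcesTasks returns the tasks for a targeted
// "rebuild-derived-resources" plan, or nil when no plan should be fired.
// Bootstrap-only tasks (discover-peers, configure-state-sync, config-validate)
// are never included: the existing PVC already carries their result.
func rebuildDerivedResourcesTasks(obs NodeObservation) []string {
	if obs.Phase != PhaseRunning || obs.PlanInProgress || obs.StatefulSetExists {
		return nil
	}
	tasks := []string{"apply-statefulset", "apply-service"}
	if obs.TLSEnabled {
		tasks = append(tasks, "apply-rbac-proxy-config")
	}
	return tasks
}

func main() {
	// StatefulSet deleted on a Running, TLS-enabled node: rebuild all derived resources.
	fmt.Println(rebuildDerivedResourcesTasks(NodeObservation{Phase: PhaseRunning, TLSEnabled: true}))
	// StatefulSet present: nothing to do.
	fmt.Println(rebuildDerivedResourcesTasks(NodeObservation{Phase: PhaseRunning, StatefulSetExists: true}))
}
```

This prints `[apply-statefulset apply-service apply-rbac-proxy-config]` and then `[]`. Because the rebuilt StatefulSet would be rendered from the current `SeiNode.Spec` by `apply-statefulset`, the platform-default sidecar image would be picked up without touching the PVC.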

## Acceptance criteria

- Deleting the rendered StatefulSet on a Running SeiNode causes the controller to recreate it within one reconcile cycle
- The new StatefulSet is rendered from current `SeiNode.Spec` (so any sidecar-image change since initial bootstrap is picked up via the platform default)
- The data PVC is preserved across the recreate, so no state-sync redo is needed
- Existing reconcile paths (chain-upgrade plan, peer re-discovery) remain unaffected

## Out of scope

The related race condition where "`ensure-data-pvc` fails terminally when a stale orphaned PVC from a just-deleted SeiNode is present". That's a separate issue: that code path needs either auto-adoption logic or retry-with-backoff so K8s GC has time to catch up on the orphan. Filing separately if/when it comes up again.

## References

- Live reproduction today on the `pacific-1/state-size-analyzer` SND
- Triggering context: cycling the pod to pick up the platform-default `SEI_SIDECAR_IMAGE` after platform PR #590 removed an inline sidecar-image pin from the SND spec
