feat(cluster-policies): require workloads to spread across nodes #1661
devantler wants to merge 2 commits
The spread-pods ClusterPolicy (synced from upstream kyverno/policies and patched in the cluster-policies base) previously mutated workloads with a soft topology spread constraint (whenUnsatisfiable: ScheduleAnyway). Harden it to DoNotSchedule so spreading every Deployment/StatefulSet across nodes (topologyKey: kubernetes.io/hostname, maxSkew: 1) is strictly required rather than best-effort.

Also add matchLabelKeys: [pod-template-hash] so per-node skew is computed per ReplicaSet revision; without it, DoNotSchedule + maxSkew: 1 deadlocks rolling updates because the new revision's surge pod counts against the old revision. Requires k8s >=1.27 (GA in 1.34); the cluster runs k8s ~1.34 via Talos v1.12.4. StatefulSets have no pod-template-hash, so the key is ignored there and falls back to the label selector.

Change is confined to the kustomize patch; the synced upstream sample is untouched so it keeps receiving updates.

Validated with 'ksail workload validate' (local + prod, 256 files each) and kubectl kustomize builds.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
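To make the fields named in the commit message concrete, here is a minimal sketch of what a mutated Deployment's pod template might carry after this change. It is not copied from the repository: the workload name `example-app` is hypothetical, and the `labelSelector` on `app.kubernetes.io/name` is an assumption based on the note that the policy uses that label to group replicas.

```yaml
# Illustrative pod-template fragment after mutation; not taken from the repository.
spec:
  template:
    metadata:
      labels:
        app.kubernetes.io/name: example-app        # hypothetical workload name
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                               # allow at most one pod of difference between nodes
          topologyKey: kubernetes.io/hostname      # each node is its own topology domain
          whenUnsatisfiable: DoNotSchedule         # hard requirement (previously ScheduleAnyway)
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: example-app  # assumed grouping label
          matchLabelKeys:
            - pod-template-hash                    # count skew per ReplicaSet revision
```

On a StatefulSet pod the `pod-template-hash` label is absent, so the scheduler would ignore that key and use only the `labelSelector`.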
Pull request overview
Hardens the local Kustomize patch over the upstream-synced spread-pods ClusterPolicy so workloads are required (not merely encouraged) to spread across nodes, while keeping rollouts unblocked via per-revision skew scoping.
Changes:
- Switch injected `topologySpreadConstraints[0].whenUnsatisfiable` from `ScheduleAnyway` to `DoNotSchedule`.
- Add `matchLabelKeys: [pod-template-hash]` so rolling updates don't deadlock under the strict skew rule.
- Update the section header comment to reflect the new enforcement mode.
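The two functional changes listed above could be expressed as a JSON 6902 patch in `kustomization.yaml`. The fragment below is a hypothetical sketch, not the patch from this PR. The rule index `0`, the `patchStrategicMerge` location, and the `+(topologySpreadConstraints)` key are assumptions about how the synced policy is laid out and should be checked against the actual file. The first operation makes the constraint a hard requirement. The second scopes skew counting to the current ReplicaSet revision.

```yaml
# Hypothetical kustomization.yaml fragment; paths are assumptions about the policy layout.
patches:
  - target:
      group: kyverno.io
      version: v1
      kind: ClusterPolicy
      name: spread-pods
    patch: |-
      - op: replace
        path: /spec/rules/0/mutate/patchStrategicMerge/spec/template/spec/+(topologySpreadConstraints)/0/whenUnsatisfiable
        value: DoNotSchedule
      - op: add
        path: /spec/rules/0/mutate/patchStrategicMerge/spec/template/spec/+(topologySpreadConstraints)/0/matchLabelKeys
        value:
          - pod-template-hash
```

Keeping the change in a patch like this, rather than editing the synced file, is what lets the upstream sync keep overwriting the sample without losing the local hardening.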
🧪 System Test failure analysis: pre-existing environmental flake, not caused by this change

TL;DR: The System Test failure is unrelated to this PR's topology-spread change. It was tripped by the strict [...]

What actually failed the build (job log): The [...]

Why it isn't this change: [...]

Action: re-running the System Test. No manifest change is warranted (per the repo convention of not pushing code for infra-only failures). If [...]

🤖 Generated with Claude Code
What
The `spread-pods` ClusterPolicy is synced verbatim from upstream `kyverno/policies` by `.github/workflows/sync-cluster-policies.yaml` and adapted for this cluster via a Kustomize patch in `k8s/bases/infrastructure/cluster-policies/kustomization.yaml`. This change hardens that patch so spreading workloads across nodes is strictly required rather than best-effort:
- `whenUnsatisfiable: ScheduleAnyway` → `DoNotSchedule`: every `Deployment`/`StatefulSet` (outside `kube-system`, carrying an `app.kubernetes.io/name` label) must spread across nodes (`topologyKey: kubernetes.io/hostname`, `maxSkew: 1`).
- `matchLabelKeys: [pod-template-hash]` is added to the injected constraint.
Why
- `DoNotSchedule` + `maxSkew: 1` on its own deadlocks rolling updates: during a rollout the new revision's surge pod is counted against the old revision's pods, so it can't be placed. `matchLabelKeys: [pod-template-hash]` scopes the skew calculation per ReplicaSet revision, keeping rollouts flowing while enforcement stays strict.
- `matchLabelKeys` requires k8s >=1.27 (GA in 1.34); the cluster runs k8s ~1.34 via Talos `v1.12.4`, so it is on by default.
- StatefulSets have no `pod-template-hash` label; Kubernetes ignores absent keys and falls back to the label selector. That is fine, since StatefulSet updates roll one pod at a time.
Reviewer notes
- The synced upstream sample (`samples/other/spread-pods-across-topology/`) is untouched, so it keeps receiving upstream updates for free.
- Pods now stay `Pending` if they genuinely cannot spread (e.g. a worker is drained/down on the 3-worker local cluster and a multi-replica app needs a distinct node). Single-replica apps are unaffected; rollouts are protected by `matchLabelKeys`.
- Workloads without an `app.kubernetes.io/name` label are skipped (the policy needs it to group replicas). This behavior is unchanged.
- The `insert-pod-antiaffinity` policy is left soft (preferred); topology spread is now the hard enforcer.
Validation
- `ksail workload validate`: local + prod, 256 files each, exit 0
- `kubectl kustomize k8s/clusters/local/` and `k8s/clusters/prod/` build cleanly

🤖 Generated with Claude Code