feat(cluster-policies): require workloads to spread across nodes by devantler · Pull Request #1661 · devantler-tech/platform

devantler · 2026-05-29T14:39:06Z

What

The spread-pods ClusterPolicy is synced verbatim from upstream kyverno/policies by .github/workflows/sync-cluster-policies.yaml and adapted for this cluster via a Kustomize patch in k8s/bases/infrastructure/cluster-policies/kustomization.yaml. This change hardens that patch so spreading workloads across nodes is strictly required rather than best-effort:

whenUnsatisfiable: ScheduleAnyway → DoNotSchedule — every Deployment/StatefulSet (outside kube-system, carrying an app.kubernetes.io/name label) must spread across nodes (topologyKey: kubernetes.io/hostname, maxSkew: 1).
Added matchLabelKeys: [pod-template-hash].

Why

DoNotSchedule + maxSkew: 1 on its own deadlocks rolling updates: during a rollout the new revision's surge pod is counted against the old revision's pods, so it can't be placed. matchLabelKeys: [pod-template-hash] scopes the skew calculation per ReplicaSet revision, keeping rollouts flowing while enforcement stays strict.

Requires Kubernetes ≥ 1.27 (GA in 1.34). This cluster runs k8s ~1.34 via Talos v1.12.4, so it is on by default.
StatefulSets have no pod-template-hash label; Kubernetes ignores absent keys and falls back to the label selector — fine, since StatefulSet updates roll one pod at a time.

Reviewer notes

The change is confined to the Kustomize patch; the upstream-synced sample (samples/other/spread-pods-across-topology/) is untouched, so it keeps receiving upstream updates for free.
Behavioral impact: pods can now go Pending if they genuinely cannot spread (e.g. a worker is drained/down on the 3-worker local cluster and a multi-replica app needs a distinct node). Single-replica apps are unaffected; rollouts are protected by matchLabelKeys.
Workloads without an app.kubernetes.io/name label are skipped (the policy needs it to group replicas) — unchanged behavior.
Companion insert-pod-antiaffinity policy left soft (preferred) — topology spread is now the hard enforcer.

Validation

ksail workload validate — local + prod, 256 files each, exit 0
kubectl kustomize k8s/clusters/local/ and k8s/clusters/prod/ build cleanly

🤖 Generated with Claude Code

The spread-pods ClusterPolicy (synced from upstream kyverno/policies and patched in the cluster-policies base) previously mutated workloads with a soft topology spread constraint (whenUnsatisfiable: ScheduleAnyway). Harden it to DoNotSchedule so spreading every Deployment/StatefulSet across nodes (topologyKey: kubernetes.io/hostname, maxSkew: 1) is strictly required rather than best-effort. Also add matchLabelKeys: [pod-template-hash] so per-node skew is computed per ReplicaSet revision; without it, DoNotSchedule + maxSkew:1 deadlocks rolling updates because the new revision's surge pod counts against the old revision. Requires k8s >=1.27 (GA in 1.34); the cluster runs k8s ~1.34 via Talos v1.12.4. StatefulSets have no pod-template-hash, so the key is ignored there and falls back to the label selector. Change is confined to the kustomize patch; the synced upstream sample is untouched so it keeps receiving updates. Validated with 'ksail workload validate' (local + prod, 256 files each) and kubectl kustomize builds. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot

Pull request overview

Hardens the local Kustomize patch over the upstream-synced spread-pods ClusterPolicy so workloads are required (not merely encouraged) to spread across nodes, while keeping rollouts unblocked via per-revision skew scoping.

Changes:

Switch injected topologySpreadConstraints[0].whenUnsatisfiable from ScheduleAnyway to DoNotSchedule.
Add matchLabelKeys: [pod-template-hash] so rolling updates don't deadlock under the strict skew rule.
Update the section header comment to reflect the new enforcement mode.

devantler · 2026-05-29T16:06:38Z

🧪 System Test failure — analysis: pre-existing environmental flake, not caused by this change

TL;DR: The System Test failure is unrelated to this PR's topology-spread change. It was tripped by the strict FAIL_ON_WARNING "new event warnings" gate catching a single app cold-start readiness warning, and the same System Test is currently red across several unrelated merge-queue PRs in the same window.

What actually failed the build (job log):

##[error]1 distinct Warning event(s) still firing at steady state (by reason, ×=occurrences): Unhealthy ×2.
New Warning events since marker (1):
  [actual-budget] Pod/actual-budget-...  Unhealthy (x2): Readiness probe failed:
  Get "http://10.244.2.105:5006/": connect: connection refused

The Check for new event warnings step (FAIL_ON_WARNING: true) failed because actual-budget — a KEDA scale-to-zero app — was still cold-starting (process not yet listening on :5006) when the post-deploy warning sample was taken. "connection refused" is an app-readiness timing issue, not a scheduling one.

Why it isn't this change:

This PR only flips the spread-pods mutation from ScheduleAnyway → DoNotSchedule and adds matchLabelKeys: [pod-template-hash]. The only failure mode that introduces is Pending pods / topology FailedScheduling events — there are none in the run. The policy applied cleanly (ClusterPolicy/spread-pods created) and actual-budget itself scheduled fine (it has a pod IP and is Running; it just wasn't Ready yet).
The Kyverno-webhook timeout / Cilium-CNI / Hubble-CA errors elsewhere in the log are in the report-only history section, not the gate that failed — they're this cluster's known Cilium bootstrap instability (cf. fix(cilium,spire): set resource requests to promote critical agents out of BestEffort QoS #1649).
The same ci.yaml System Test is concurrently failing on unrelated merge-queue PRs (feat(openbao): enable file audit device, shorten DB role rotation to 7 d #1626, fix(apps): add startupProbe to homepage, headlamp, actual-budget #1636, fix(openbao): switch readinessProbe to HTTP /sys/health to unblock bootstrap #1656, feat(rbac): scope OIDC kubectl/Headlamp access to read-only #1657, fix(longhorn): run CSI attacher/provisioner/resizer with 2 replicas for HA #1658) in the same time window → environmental, not change-specific.

Action: re-running the System Test. No manifest change is warranted (per the repo convention of not pushing code for infra-only failures). If actual-budget's cold-start keeps tripping the steady-state warning gate, that's a separate pre-existing item (its readiness/startup timing vs. the gate's sample window) and can be tracked on its own.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 29, 2026 14:39

github-project-automation Bot added this to 🌊 Project Board May 29, 2026

github-project-automation Bot moved this to 🫴 Ready in 🌊 Project Board May 29, 2026

Copilot started reviewing on behalf of devantler May 29, 2026 14:39 View session

devantler temporarily deployed to ci May 29, 2026 14:39 — with GitHub Actions Inactive

devantler marked this pull request as ready for review May 29, 2026 14:43

Copilot AI reviewed May 29, 2026

View reviewed changes

Merge branch 'main' into claude/friendly-poincare-67f404

72cb299

botantler Bot approved these changes May 29, 2026

View reviewed changes

botantler Bot enabled auto-merge May 29, 2026 15:32

devantler had a problem deploying to ci May 29, 2026 15:32 — with GitHub Actions Failure

devantler temporarily deployed to ci May 29, 2026 18:58 — with GitHub Actions Inactive

botantler Bot added this pull request to the merge queue May 29, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(cluster-policies): require workloads to spread across nodes#1661

feat(cluster-policies): require workloads to spread across nodes#1661
devantler wants to merge 2 commits into
mainfrom
claude/friendly-poincare-67f404

devantler commented May 29, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

devantler commented May 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

devantler commented May 29, 2026

What

Why

Reviewer notes

Validation

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

devantler commented May 29, 2026

🧪 System Test failure — analysis: pre-existing environmental flake, not caused by this change

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants