From 90dac53ec8814b5fe902fe95aa715412dab34259 Mon Sep 17 00:00:00 2001
From: Nikolai Emil Damm
Date: Fri, 29 May 2026 15:47:50 +0200
Subject: [PATCH] fix(cilium): keep spire-server off the Flux-controller node (soft anti-affinity)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

spire-server is a single replica and the cluster's identity root: if
its node fails, every spire-agent loses its upstream (spire-server
ClusterIP -> i/o timeout) and Cilium mutual auth degrades cluster-wide.

On 2026-05-28 spire-server shared prod-worker-2 with
kustomize-controller; when that node's Cilium ClusterIP datapath
degraded after an OOMKill, workload identity AND GitOps reconciliation
went down together. Reconciliation was exactly what was needed to apply
the fix, so the cluster could not self-heal.

Add a soft (preferred) podAntiAffinity so spire-server prefers a worker
without app.kubernetes.io/part-of=flux pods, decorrelating the identity
SPOF from the GitOps controllers. Soft so the single replica always
schedules even when every node hosts a Flux pod.

Verified the Cilium 1.19.4 chart renders
authentication.mutual.spire.install.server.affinity into the
StatefulSet. SPIRE is disabled in the Docker overlay, so this is
prod-only and inert for local/CI.

Co-Authored-By: Claude Opus 4.8
---
 .../controllers/cilium/helm-release.yaml | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/k8s/bases/infrastructure/controllers/cilium/helm-release.yaml b/k8s/bases/infrastructure/controllers/cilium/helm-release.yaml
index 3c57fc71d..70f883900 100644
--- a/k8s/bases/infrastructure/controllers/cilium/helm-release.yaml
+++ b/k8s/bases/infrastructure/controllers/cilium/helm-release.yaml
@@ -153,6 +153,25 @@ spec:
                   memory: 128Mi
             # TODO: Remove workaround when SPIRE no longer fails to start (https://github.com/cilium/cilium/issues/40533)
             server:
+              # spire-server is a single replica and the cluster's identity
+              # root: if its node fails, every spire-agent loses its upstream
+              # (dial spire-server ClusterIP -> i/o timeout) and Cilium mutual
+              # auth degrades cluster-wide. Prefer to keep it off whatever
+              # worker runs the Flux controllers, so a single node loss can't
+              # take out BOTH workload identity AND GitOps reconciliation at
+              # once: the combination that turned the 2026-05-28 incident into
+              # a deadlock (reconciliation was needed to apply the fix, but was
+              # down on the same failed node). Soft (preferred) so the single
+              # replica always schedules even when every node hosts a Flux pod.
+              affinity:
+                podAntiAffinity:
+                  preferredDuringSchedulingIgnoredDuringExecution:
+                    - weight: 100
+                      podAffinityTerm:
+                        topologyKey: kubernetes.io/hostname
+                        labelSelector:
+                          matchLabels:
+                            app.kubernetes.io/part-of: flux
               resources:
                 requests:
                   cpu: 50m
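
The choice of a soft rule over a hard one can be illustrated with a toy
model. The sketch below is not kube-scheduler's scoring code: it assumes a
simplified score of minus the weight per matching pod on a node, ignores
every other scheduler plugin and score normalization, and uses hypothetical
names (Node, pick_soft, pick_hard). It only shows that a preferred term
still returns a node when every worker hosts a Flux pod, while a required
term would leave the single replica unschedulable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Label selector and weight taken from the patch above.
FLUX_LABEL = ("app.kubernetes.io/part-of", "flux")
WEIGHT = 100


@dataclass
class Node:
    """Hypothetical stand-in for a worker node and the pods it runs."""
    name: str
    # One label dict per pod already scheduled on this node.
    pods: List[Dict[str, str]] = field(default_factory=list)


def flux_pods(node: Node) -> int:
    """Count pods on the node that match the anti-affinity selector."""
    key, value = FLUX_LABEL
    return sum(1 for labels in node.pods if labels.get(key) == value)


def pick_soft(nodes: List[Node]) -> Optional[str]:
    """Preferred anti-affinity: penalize matching nodes, never exclude them."""
    if not nodes:
        return None
    # Assumed simplified score: subtract the weight once per matching pod.
    best = max(nodes, key=lambda n: -WEIGHT * flux_pods(n))
    return best.name


def pick_hard(nodes: List[Node]) -> Optional[str]:
    """Required anti-affinity: matching nodes are filtered out entirely."""
    allowed = [n for n in nodes if flux_pods(n) == 0]
    return allowed[0].name if allowed else None
```

With one Flux-free worker, both functions pick it. With Flux pods on every
worker, pick_soft returns the node with the fewest of them and pick_hard
returns None, which is the case the commit message guards against.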