From 90dac53ec8814b5fe902fe95aa715412dab34259 Mon Sep 17 00:00:00 2001
From: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
Date: Fri, 29 May 2026 15:47:50 +0200
Subject: [PATCH] fix(cilium): keep spire-server off the Flux-controller node
 (soft anti-affinity)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

spire-server is a single replica and the cluster's identity root: if its
node fails, every spire-agent loses its upstream (dials to the
spire-server ClusterIP fail with an i/o timeout) and Cilium mutual auth
degrades cluster-wide.

On 2026-05-28 spire-server shared prod-worker-2 with kustomize-controller.
When that node's Cilium ClusterIP datapath degraded after an OOMKill,
workload identity AND GitOps reconciliation went down together.
Reconciliation was exactly what was needed to apply the fix, so the
cluster could not self-heal.

Add a soft (preferred) podAntiAffinity so spire-server prefers a worker
without app.kubernetes.io/part-of=flux pods, decorrelating the identity
SPOF from the GitOps controllers. The rule is soft so that the single
replica still schedules even when every node hosts a Flux pod. Verified
that the Cilium 1.19.4 chart renders
authentication.mutual.spire.install.server.affinity into the
StatefulSet. SPIRE is disabled in the Docker overlay, so this change is
prod-only and inert for local/CI.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 .../controllers/cilium/helm-release.yaml      | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/k8s/bases/infrastructure/controllers/cilium/helm-release.yaml b/k8s/bases/infrastructure/controllers/cilium/helm-release.yaml
index 3c57fc71d..70f883900 100644
--- a/k8s/bases/infrastructure/controllers/cilium/helm-release.yaml
+++ b/k8s/bases/infrastructure/controllers/cilium/helm-release.yaml
@@ -153,6 +153,25 @@ spec:
                   memory: 128Mi
             # TODO: Remove workaround when SPIRE no longer fails to start (https://github.com/cilium/cilium/issues/40533)
             server:
+              # spire-server is a single replica and the cluster's identity
+              # root: if its node fails, every spire-agent loses its upstream
+              # (dial spire-server ClusterIP -> i/o timeout) and Cilium mutual
+              # auth degrades cluster-wide. Prefer a worker without Flux pods
+              # so one node loss can't take out BOTH workload identity AND
+              # GitOps reconciliation, which deadlocked the 2026-05-28 incident
+              # (the fix needed reconciliation, down on the same failed node).
+              # Soft, so the replica still schedules if every node hosts a Flux
+              # pod. namespaceSelector: {} matches Flux pods in any namespace.
+              affinity:
+                podAntiAffinity:
+                  preferredDuringSchedulingIgnoredDuringExecution:
+                    - weight: 100
+                      podAffinityTerm:
+                        topologyKey: kubernetes.io/hostname
+                        namespaceSelector: {}
+                        labelSelector:
+                          matchLabels:
+                            app.kubernetes.io/part-of: flux
               resources:
                 requests:
                   cpu: 50m