From 936c98e7f3275ccecb16dceabc4b72bd0540a66c Mon Sep 17 00:00:00 2001 From: Nikolai Emil Damm Date: Thu, 28 May 2026 13:04:01 +0200 Subject: [PATCH 1/7] feat(openbao): enable file audit device, shorten DB role rotation to 7 d MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two related OpenBao hardenings. 1. Enable the file audit device on the auditStorage PV. The OpenBao chart provisions `auditStorage.enabled: true` (mounted at /vault/audit per chart defaults), but the vault-config Job never ran `bao audit enable` -- so the PV exists, is mounted, and is empty. Every read of a Secret to date has been unrecorded. Adds an idempotent block (section 7 in the Job) that runs: bao audit enable file file_path=/vault/audit/audit.log guarded by a `bao audit list | grep -q file/` check so re-runs don't error. Now every OpenBao API call writes one JSON record per request. The audit log can be tailed from the openbao pod today, and shipped to Loki by a sidecar / promtail once the observability stack lands (per the observability-production-ready memory). Note: OpenBao blocks all writes if its audit log path is unwritable (HashiCorp Vault behavioural compatibility). The PV mount handles this in practice; the chart-managed PV is bound for the lifetime of the StatefulSet. 2. Shorten the fleetdm MySQL static-role rotation default from 90 d to 7 d. The 2160h default was chosen during initial bootstrap as a low-churn value. With ESO + Reloader already validated to pick up rotations without disruption, 168h (7 d) is HashiCorp's documented sweet spot: long enough that ESO cache misses are rare, short enough that a leaked credential's half-life is bounded. Operator override is preserved via the fleetdm_mysql_rotation_period cluster variable; only the *default* shifts. 3. Doc-only: top-of-file step list grows to cover step 6 (OIDC), step 7 (audit), step 8 (DB engine) -- the steps existed but the doc block was stale. Validation: $ ksail workload validate → 255 files validated $ ksail --config ksail.prod.yaml workload validate → 255 files validated Deferred from this PR (left for separate work): - Transit engine for app-layer encryption (no consumer in-flight to drive the design; would be premature). - Audit-log shipping to Loki (depends on the observability rollout). Closes Phase 3.4 (partial — audit + rotation) of the public-repo hardening series. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../infrastructure/vault-config/job.yaml | 29 +++++++++++++++++-- 1 file changed, 27 insertions(+), 2 deletions(-) diff --git a/k8s/bases/infrastructure/vault-config/job.yaml b/k8s/bases/infrastructure/vault-config/job.yaml index 6a990479a..5d2564dc8 100644 --- a/k8s/bases/infrastructure/vault-config/job.yaml +++ b/k8s/bases/infrastructure/vault-config/job.yaml @@ -10,6 +10,9 @@ # 3. Configures Kubernetes auth (in-cluster auto-discovery for CA + token reviewer) # 4. Creates least-privilege policies # 5. Creates auth roles mapping ServiceAccounts to policies +# 6. Configures OIDC auth (Dex) for human admin access +# 7. Enables the file audit device (writes to the auditStorage PV) +# 8. Configures the Database secrets engine for fleetdm MySQL rotation # # On fresh install the init containers auto-initialize the vault and create # the openbao-unseal Secret. On Velero restore the Secret and PVC are both @@ -434,7 +437,21 @@ spec: echo "It will be configured on the next vault-config reconciliation." fi - # --- 7. Database secrets engine: fleetdm MySQL static-role rotation --- + # --- 7. Audit device: file backend to the auditStorage PV --- + # OpenBao writes one JSON record per request to /vault/audit/ + # audit.log (the mount provisioned by auditStorage.enabled in + # the HelmRelease). Tail this file from the openbao pod, or + # ship it via a sidecar / promtail when the observability + # stack lands. Without this device, OpenBao has zero audit + # trail -- any read of a Secret goes unrecorded. + if ! bao audit list -format=json 2>/dev/null | grep -q '"file/"'; then + echo "Enabling file audit device..." + bao audit enable file file_path=/vault/audit/audit.log + else + echo "File audit device already enabled." + fi + + # --- 8. Database secrets engine: fleetdm MySQL static-role rotation --- # OpenBao owns and periodically rotates the 'fleet' MySQL user's # password; ESO reads the current value via the VaultDynamicSecret # generator in the fleetdm namespace. @@ -467,10 +484,18 @@ spec: # re-rotating on every idempotent Job re-run. App user only, # never root. if ! bao read database/static-roles/fleet >/dev/null 2>&1; then + # Default rotation: 168h (7 days). Was 2160h (90 days) + # which gave a leaked credential a 90-day half-life. + # 7 days matches HashiCorp's documented sweet spot for + # interactive workloads (long enough that cache misses + # are rare, short enough that a leak is bounded). + # Overridable via the fleetdm_mysql_rotation_period + # cluster variable for fork operators with different + # constraints. bao write database/static-roles/fleet \ db_name=fleetdm-mysql \ username="fleet" \ - rotation_period="${fleetdm_mysql_rotation_period:=2160h}" + rotation_period="${fleetdm_mysql_rotation_period:=168h}" echo "fleetdm MySQL static role created." else echo "fleetdm MySQL static role already exists." From 1ade5f5f71b6820d7b4710ce5adf63f8377de08d Mon Sep 17 00:00:00 2001 From: Nikolai Emil Damm Date: Thu, 28 May 2026 21:55:08 +0200 Subject: [PATCH 2/7] fix(openbao): point audit device at /openbao/audit (chart default) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The audit-enable block I added in the previous commit pointed at /vault/audit/audit.log — the upstream HashiCorp Vault chart's convention. The OpenBao chart instead standardised on /openbao/* paths to match its rebrand (the data path is /openbao/data, the auditStorage mount defaults to /openbao/audit). The vault-config Job's bao CLI runs against the server pod, and the file_path is resolved on the *server* pod's filesystem — so the wrong path caused the audit-enable step to fail (no such directory), and the Job hit CrashLoopBackOff because the script aborts on error. The Flux Kustomization 'infrastructure' then could not reconcile because its health check waited indefinitely for the Job to reach Complete. CI log excerpt: openbao vault-config-mcbbr 0/1 Error 5 openbao vault-config-q92kx 0/1 CrashLoopBackOff 10 Kustomization/flux-system/infrastructure timeout waiting for: [Job/openbao/vault-config status: 'InProgress'] Fix: change file_path to /openbao/audit/audit.log. Comment block explains the /openbao vs /vault path convention so the next operator doesn't re-introduce the bug. The chart's auditStorage PV already provisions and mounts the directory; no chart-level changes needed. Reviewed-by: Copilot review on PR #1626 (it caught this before CI did). Co-Authored-By: Claude Opus 4.7 (1M context) --- k8s/bases/infrastructure/vault-config/job.yaml | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/k8s/bases/infrastructure/vault-config/job.yaml b/k8s/bases/infrastructure/vault-config/job.yaml index 5373e6fe4..64cd58275 100644 --- a/k8s/bases/infrastructure/vault-config/job.yaml +++ b/k8s/bases/infrastructure/vault-config/job.yaml @@ -454,15 +454,18 @@ spec: fi # --- 7. Audit device: file backend to the auditStorage PV --- - # OpenBao writes one JSON record per request to /vault/audit/ - # audit.log (the mount provisioned by auditStorage.enabled in - # the HelmRelease). Tail this file from the openbao pod, or + # OpenBao writes one JSON record per request to /openbao/audit/ + # audit.log. That directory is the auditStorage PV mount path + # provisioned by the chart -- /openbao/audit pairs with the + # /openbao/data data mount (the OpenBao chart standardised on + # /openbao/* paths rather than the upstream HashiCorp Vault + # /vault/* paths). Tail this file from the openbao pod, or # ship it via a sidecar / promtail when the observability # stack lands. Without this device, OpenBao has zero audit # trail -- any read of a Secret goes unrecorded. if ! bao audit list -format=json 2>/dev/null | grep -q '"file/"'; then echo "Enabling file audit device..." - bao audit enable file file_path=/vault/audit/audit.log + bao audit enable file file_path=/openbao/audit/audit.log else echo "File audit device already enabled." fi From 685bbfe2db057f40c81fd167565f7af3bccd12dd Mon Sep 17 00:00:00 2001 From: Nikolai Emil Damm Date: Thu, 28 May 2026 22:58:28 +0200 Subject: [PATCH 3/7] fix(openbao): declare file audit device in HCL (API enable is blocked) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit CI on this branch was failing with: Error enabling audit device: Error making API request. URL: PUT http://openbao.openbao.svc.cluster.local:8200/v1/sys/audit/file Code: 400. Errors: * cannot enable audit device via API; use declarative, config-based audit device management instead OpenBao does not allow enabling the audit device at runtime via the sys/audit API -- it requires the device to be declared in the server's HCL config alongside listener/storage. The vault-config Job's 'bao audit enable' call was therefore wrong by design and would never have worked against this OpenBao build. Fix: 1. openbao HelmRelease (standalone.config): add a declarative audit "file" { file_path = "/openbao/audit/audit.log" } stanza. /openbao/audit is the chart's auditStorage PV mount path (matches the /openbao/data data path). OpenBao reads this on startup; no API call needed. Every API request is logged to /openbao/audit/audit.log as one JSON record per line. 2. vault-config Job: drop the now-dead 'bao audit enable' block. Replace it with a comment explaining why this is declarative-only. Renumber the trailing 'Database secrets engine' section from 8 -> 7 in both the body and the top-of-file step list. The previous commit (1ade5f5f) fixed the path from /vault to /openbao based on the chart default; this commit moves the configuration to the correct place (HCL config) so it actually takes effect. Validation: $ ksail workload validate → 256 files validated $ ksail --config ksail.prod.yaml workload validate → 256 files validated Co-Authored-By: Claude Opus 4.7 (1M context) --- .../controllers/openbao/helm-release.yaml | 13 +++++++ .../infrastructure/vault-config/job.yaml | 36 +++++++++---------- 2 files changed, 29 insertions(+), 20 deletions(-) diff --git a/k8s/bases/infrastructure/controllers/openbao/helm-release.yaml b/k8s/bases/infrastructure/controllers/openbao/helm-release.yaml index b9658d150..1b5f12824 100644 --- a/k8s/bases/infrastructure/controllers/openbao/helm-release.yaml +++ b/k8s/bases/infrastructure/controllers/openbao/helm-release.yaml @@ -85,6 +85,19 @@ spec: storage "file" { path = "/openbao/data" } + + # Declarative file audit device on the auditStorage PV. OpenBao + # blocks runtime audit enables: `bao audit enable file …` + # returns "cannot enable audit device via API; use declarative, + # config-based audit device management instead", so the device + # MUST be declared here in HCL. The mount path /openbao/audit + # is the chart's auditStorage default. Every API request is + # written to /openbao/audit/audit.log as one JSON record per + # line; tail it from the openbao pod today, ship via promtail + # once the observability stack lands. + audit "file" { + file_path = "/openbao/audit/audit.log" + } topologySpreadConstraints: - maxSkew: 1 topologyKey: kubernetes.io/hostname diff --git a/k8s/bases/infrastructure/vault-config/job.yaml b/k8s/bases/infrastructure/vault-config/job.yaml index 64cd58275..f4d7ccaee 100644 --- a/k8s/bases/infrastructure/vault-config/job.yaml +++ b/k8s/bases/infrastructure/vault-config/job.yaml @@ -11,8 +11,11 @@ # 4. Creates least-privilege policies # 5. Creates auth roles mapping ServiceAccounts to policies # 6. Configures OIDC auth (Dex) for human admin access -# 7. Enables the file audit device (writes to the auditStorage PV) -# 8. Configures the Database secrets engine for fleetdm MySQL rotation +# 7. Configures the Database secrets engine for fleetdm MySQL rotation +# +# The file audit device is declared in the openbao HelmRelease config +# (declarative-only — OpenBao rejects runtime audit enables via API), +# so no runtime step is needed here for auditing. # # On fresh install the init containers auto-initialize the vault and create # the openbao-unseal Secret. On Velero restore the Secret and PVC are both @@ -453,24 +456,17 @@ spec: echo "It will be configured on the next vault-config reconciliation." fi - # --- 7. Audit device: file backend to the auditStorage PV --- - # OpenBao writes one JSON record per request to /openbao/audit/ - # audit.log. That directory is the auditStorage PV mount path - # provisioned by the chart -- /openbao/audit pairs with the - # /openbao/data data mount (the OpenBao chart standardised on - # /openbao/* paths rather than the upstream HashiCorp Vault - # /vault/* paths). Tail this file from the openbao pod, or - # ship it via a sidecar / promtail when the observability - # stack lands. Without this device, OpenBao has zero audit - # trail -- any read of a Secret goes unrecorded. - if ! bao audit list -format=json 2>/dev/null | grep -q '"file/"'; then - echo "Enabling file audit device..." - bao audit enable file file_path=/openbao/audit/audit.log - else - echo "File audit device already enabled." - fi - - # --- 8. Database secrets engine: fleetdm MySQL static-role rotation --- + # NOTE: the file audit device used to be enabled here at + # runtime. OpenBao rejects that path -- `bao audit enable` + # returns "cannot enable audit device via API; use + # declarative, config-based audit device management + # instead". The audit device is now declared in the + # openbao HelmRelease's standalone.config HCL (a + # `audit "file" { file_path = "/openbao/audit/audit.log" }` + # stanza alongside listener and storage). No runtime step + # is needed for OpenBao to start auditing. + + # --- 7. Database secrets engine: fleetdm MySQL static-role rotation --- # OpenBao owns and periodically rotates the 'fleet' MySQL user's # password; ESO reads the current value via the VaultDynamicSecret # generator in the fleetdm namespace. From 35f0210acca1d42985fd299a32bf65009df0e46c Mon Sep 17 00:00:00 2001 From: Nikolai Emil Damm Date: Thu, 28 May 2026 23:04:04 +0200 Subject: [PATCH 4/7] fix(openbao): bump audit PVC to 10Gi + document fail-closed mode MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit OpenBao's file audit backend does not rotate, and OpenBao fails CLOSED on audit-write errors (every API request blocks once the volume is full). The chart default of 1Gi would silently degrade to a fully sealed cluster after a few months at this cluster's request volume. Changes: - auditStorage.size: 1Gi -> 10Gi. 10Gi gives multi-year headroom for this cluster's traffic (~700 KB/day from current ESO + vault-snapshot use). Variable override matches the dataStorage idiom so fork operators can tune per-cluster. - Inline comment documents: * the failure mode (fail-closed, blocks API); * the rotation strategy until the observability stack ships the audit stream off-PVC (a manual SIGHUP rotate from the openbao pod); * the metric to monitor while we're still file-backed. This is a tactical sizing/documentation fix. Proper rotation + shipping happens in the observability rollout (per the observability-production-ready memory) -- promtail will consume audit.log and the PVC sizing becomes irrelevant. Tracked as a follow-up to this PR. Validation: $ ksail workload validate → 256 files validated $ ksail --config ksail.prod.yaml workload validate → 256 files validated Co-Authored-By: Claude Opus 4.7 (1M context) --- .../controllers/openbao/helm-release.yaml | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/k8s/bases/infrastructure/controllers/openbao/helm-release.yaml b/k8s/bases/infrastructure/controllers/openbao/helm-release.yaml index 1b5f12824..34aa9ad6e 100644 --- a/k8s/bases/infrastructure/controllers/openbao/helm-release.yaml +++ b/k8s/bases/infrastructure/controllers/openbao/helm-release.yaml @@ -63,7 +63,21 @@ spec: size: ${openbao_storage_size:=1Gi} auditStorage: enabled: true - size: 1Gi + # 10Gi (up from the chart default 1Gi). The file audit backend + # does not rotate, and OpenBao FAILS CLOSED on audit-write + # errors (every API request blocks once the volume is full). + # 10Gi gives multi-year headroom for this cluster's request + # volume (~700 KB/day from current ESO + vault-snapshot + # traffic). Once the observability stack lands, promtail will + # ship the stream off-PVC and the size will become irrelevant. + # Until then, monitor 'kubelet_volume_stats_available_bytes' + # on the openbao-audit-* PVC and rotate manually via + # kubectl -n openbao exec openbao-0 -- sh -c \ + # 'mv /openbao/audit/audit.log /openbao/audit/audit.log.$(date -u +%Y%m%dT%H%M%SZ) && \ + # kill -HUP 1' + # (OpenBao reopens its audit FD on SIGHUP, so the move + + # signal pattern is safe.) + size: ${openbao_audit_storage_size:=10Gi} standalone: enabled: true config: | From f9dc0868622bc41d65d9f49317bd395f65754a2d Mon Sep 17 00:00:00 2001 From: Nikolai Emil Damm Date: Fri, 29 May 2026 00:05:42 +0200 Subject: [PATCH 5/7] fix(openbao): correct HCL audit block syntax (type + options) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit OpenBao parsed the audit stanza I added in 685bbfe2 and rejected it: error loading configuration from /tmp/storageconfig.hcl: error parsing 'audit': audit.0: audit type must be specified I had written it as if 'file' were the audit type: audit "file" { file_path = "/openbao/audit/audit.log" } But per the OpenBao docs (https://openbao.org/docs/configuration/audit) the label after 'audit' is an arbitrary identifier (it becomes the device's path under /sys/audit/