20 changes: 20 additions & 0 deletions k8s/bases/infrastructure/controllers/openbao/helm-release.yaml
@@ -27,6 +27,26 @@ spec:
server:
enabled: true
replicas: ${openbao_replicas:=1}
# The chart's default readinessProbe is `exec: bao status`, which
# returns exit code 2 on a sealed server -- so the Pod stays NotReady
# until something unseals it. On a fresh cluster that "something" is
# the vault-config Job in the downstream `infrastructure` Flux layer,
# which can't run until this HelmRelease (in `infrastructure-controllers`)
# is Ready. The install therefore deadlocks: Helm waits up to `timeout`,
# then `install.remediation.retries: -1` uninstalls and reinstalls the
# release, and the cycle repeats. Cold-cluster bootstrap historically took
# 20-40 min waiting for that race to resolve (see the vault-config Job
# comment and the PR #1636 system-test failure).
#
# Switching to the HTTP health probe with `sealedcode=204` and
# `uninitcode=204` makes a sealed-but-running server Ready immediately.
# The chart template renders the httpGet branch as soon as `path` is
# set (server-statefulset.yaml: `{{- if .Values.server.readinessProbe.path }}`)
# and `openbao.scheme` returns "http" when `global.tlsDisable: true`
# (chart default; matches the `tls_disable = 1` listener below).
# The HashiCorp Vault chart uses the same pattern for the same reason.
readinessProbe:
path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204"
# Mount the unseal key Secret (created by the vault-config Job on first
# init, restored by Velero on cluster rebuild) so the postStart hook
# can auto-unseal the server after every pod restart.
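
For reference, a minimal sketch of the readiness probe this override is expected to produce on the server container, assuming the chart renders the httpGet branch as the comment in the diff describes. The port (8200, the conventional API listener port) is an assumption not shown in this diff, and timing fields such as `periodSeconds` are omitted because their values come from chart defaults that are not visible here.

```yaml
# Sketch of the rendered container probe (not copied from the chart output).
readinessProbe:
  httpGet:
    # Sealed and uninitialized servers answer 204 instead of 503/501;
    # standby nodes answer with the active-node status code.
    path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204"
    port: 8200     # assumed default listener port
    scheme: HTTP   # follows from global.tlsDisable: true
```

The kubelet treats any HTTP status from 200 to 399 as a successful probe, so the 204 returned by a sealed or uninitialized server is enough to mark the Pod Ready and let the HelmRelease finish installing.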