Commit 491b833

feat: daemonset maxsurge to prevent unavailability on config changes (#819)
* feat: add maxSurge to DaemonSet rolling update strategy
* chore: add TODO to use PreferSameNode once k8s 1.35 is minimum
* test: assert DaemonSet rolling update strategy in smoke test
* chore: update changelog
* chore: lint fixes
1 parent cc6c2e9 commit 491b833

4 files changed

Lines changed: 26 additions & 1 deletion

CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -4,6 +4,13 @@ All notable changes to this project will be documented in this file.
 
 ## [Unreleased]
 
+### Changed
+
+- Set `maxSurge=1` and `maxUnavailable=0` on the OPA DaemonSet rolling update strategy to eliminate
+  availability gaps during rolling updates ([#819]).
+
+[#819]: https://github.com/stackabletech/opa-operator/pull/819
+
 ## [26.3.0] - 2026-03-16
 
 ## [26.3.0-rc1] - 2026-03-16

rust/operator-binary/src/controller.rs

Lines changed: 8 additions & 1 deletion
@@ -34,7 +34,7 @@ use stackable_operator::{
     k8s_openapi::{
         DeepMerge,
         api::{
-            apps::v1::{DaemonSet, DaemonSetSpec},
+            apps::v1::{DaemonSet, DaemonSetSpec, DaemonSetUpdateStrategy, RollingUpdateDaemonSet},
             core::v1::{
                 ConfigMap, EmptyDirVolumeSource, EnvVar, EnvVarSource, HTTPGetAction,
                 ObjectFieldSelector, Probe, SecretVolumeSource, ServiceAccount,
@@ -1153,6 +1153,13 @@ fn build_server_rolegroup_daemonset(
             ..LabelSelector::default()
         },
         template: pod_template,
+        update_strategy: Some(DaemonSetUpdateStrategy {
+            type_: Some("RollingUpdate".to_string()),
+            rolling_update: Some(RollingUpdateDaemonSet {
+                max_surge: Some(IntOrString::Int(1)),
+                max_unavailable: Some(IntOrString::Int(0)),
+            }),
+        }),
         ..DaemonSetSpec::default()
     };
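Note on the strategy above: the following is a simplified, self-contained model (not operator code) of why `max_surge: 1` with `max_unavailable: 0` avoids a gap on a node. It assumes one OPA pod per node and treats pod readiness as instantaneous. `RolloutParams` and `min_ready_during_replacement` are hypothetical names introduced only for this illustration.

```rust
/// Hypothetical stand-in for the two rolling update settings in the diff.
#[derive(Clone, Copy, Debug)]
pub struct RolloutParams {
    pub max_surge: u32,
    pub max_unavailable: u32,
}

/// Returns the lowest number of ready pods seen on one node while its pod is
/// replaced, under a simplified model with exactly one pod per node.
pub fn min_ready_during_replacement(p: RolloutParams) -> Result<u32, &'static str> {
    // Each step is (ready old pods, ready new pods) on the node.
    let steps: [(u32, u32); 3] = if p.max_surge >= 1 {
        // Surge: the new pod becomes ready next to the old one, then the old one is removed.
        [(1, 0), (1, 1), (0, 1)]
    } else if p.max_unavailable >= 1 {
        // No surge: the old pod is removed first, then the new pod starts.
        [(1, 0), (0, 0), (0, 1)]
    } else {
        // Kubernetes rejects a DaemonSet where both values are 0.
        return Err("maxSurge and maxUnavailable cannot both be 0");
    };
    Ok(steps.iter().map(|(old, new)| old + new).min().unwrap_or(0))
}
```

In this model the settings from the commit (`max_surge: 1`, `max_unavailable: 0`) never drop below one ready pod per node, whereas the Kubernetes defaults for DaemonSets (`maxSurge: 0`, `maxUnavailable: 1`) pass through a step with zero ready pods.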

rust/operator-binary/src/service.rs

Lines changed: 6 additions & 0 deletions
@@ -63,6 +63,12 @@ pub(crate) fn build_server_role_service(
         type_: Some(opa.spec.cluster_config.listener_class.k8s_service_type()),
         ports: Some(data_service_ports(opa.spec.cluster_config.tls_enabled())),
         selector: Some(service_selector_labels.into()),
+        // This ensures that products (e.g. Trino) on a node always talk to the OPA pod on the
+        // same node, avoiding cross-node latency. The downside is that if the local OPA pod is
+        // unavailable, requests fail instead of falling back to another node.
+        // TODO: Once our minimum supported Kubernetes version is 1.35, use
+        // `trafficDistribution: PreferSameNode` instead, which prefers the local node but
+        // gracefully falls back to other nodes if the local pod is unavailable.
         internal_traffic_policy: Some("Local".to_string()),
         ..ServiceSpec::default()
     };
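Note on the comment above: the difference between the current `internalTrafficPolicy: Local` and the `trafficDistribution: PreferSameNode` option named in the TODO can be sketched with a small routing model. This is an illustration under stated assumptions, not how kube-proxy is implemented. `Routing`, `pick_endpoint`, and the node names are hypothetical, and the fallback simply takes the first ready node.

```rust
/// Hypothetical stand-in for the two Service routing modes named in the comment.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Routing {
    /// Models `internalTrafficPolicy: Local`: only pods on the client's node are eligible.
    LocalOnly,
    /// Models `trafficDistribution: PreferSameNode`: same node first, otherwise any ready pod.
    PreferSameNode,
}

/// Picks the node whose OPA pod serves a request coming from `client_node`.
/// `ready_nodes` lists the nodes that currently have a ready OPA pod.
pub fn pick_endpoint<'a>(
    routing: Routing,
    client_node: &str,
    ready_nodes: &[&'a str],
) -> Option<&'a str> {
    // Look for a ready pod on the client's own node.
    let local = ready_nodes.iter().copied().find(|n| *n == client_node);
    match routing {
        // No local pod means the request has nowhere to go.
        Routing::LocalOnly => local,
        // Fall back to the first ready pod elsewhere (an arbitrary choice for this sketch).
        Routing::PreferSameNode => local.or_else(|| ready_nodes.first().copied()),
    }
}
```

With no ready pod on the client's node, `LocalOnly` yields `None` (the failure mode the comment describes), while `PreferSameNode` returns a pod on another node.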

tests/templates/kuttl/smoke/10-assert.yaml.j2

Lines changed: 5 additions & 0 deletions
@@ -9,6 +9,11 @@ kind: DaemonSet
 metadata:
   name: test-opa-server-default
 spec:
+  updateStrategy:
+    type: RollingUpdate
+    rollingUpdate:
+      maxSurge: 1
+      maxUnavailable: 0
   template:
     spec:
       containers:
