

@sbernauer
Member

@sbernauer sbernauer commented Nov 18, 2025

Description

Fixes #111

Many parts copied from https://github.com/stackabletech/commons-operator/tree/spike/sts-restarter-webhook

This bumps operator-rs to 0.101.1, but there are no objectOverrides to implement here.

Co-authored-by: Natalie Klestrup Röijezon nat@nullable.se

Definition of Done Checklist

  • Not all of these items are applicable to all PRs; the author should update this template so that only the relevant boxes remain
  • Please make sure all these things are done and tick the boxes

Author

  • Changes are OpenShift compatible
  • CRD changes approved
  • CRD documentation for all fields, following the style guide.
  • Helm chart can be installed and the deployed operator works
  • Integration tests passed (for non trivial changes)
  • Changes need to be "offline" compatible
  • Links to generated (nightly) docs added
  • Release note snippet added

Reviewer

  • Code contains useful comments
  • Code contains useful logging statements
  • (Integration-)Test cases added
  • Documentation added or updated. Follows the style guide.
  • Changelog updated
  • Cargo.toml only contains references to git tags (not specific commits or branches)

Acceptance

  • Feature Tracker has been updated
  • Proper release label has been added
  • Links to generated (nightly) docs added
  • Release note snippet added
  • Add type/deprecation label & add to the deprecation schedule
  • Add type/experimental label & add to the experimental features tracker

Release notes

Fixed

  • BREAKING: Prevent an unnecessary restart of Pod 0 of the StatefulSet when the StatefulSet is initially created.
    This was caused by commons-operator needing to update the StatefulSet immediately after creation, at which point Pod 0 had already been created.
    The problem is fixed by using a mutating webhook, which can edit the StatefulSet during its creation, i.e. before Pod 0 is created.
    For this, commons-operator now needs the RBAC permissions to create and patch mutatingwebhookconfigurations, which the Helm chart adds automatically.
    The webhook can be disabled using --disable-restarter-mutating-webhook or by setting the DISABLE_RESTARTER_MUTATING_WEBHOOK environment variable, but doing so brings back the unnecessary restart.
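To make the ordering problem in the release note concrete, here is a minimal, std-only Rust sketch. It is a toy model, not commons-operator code: the `PodTemplate` type, the `revision()` hash, and the `configmap.restarter.stackable.tech/<name>` annotation key and its value are hypothetical stand-ins. It assumes, as in Kubernetes, that the StatefulSet controller replaces Pods whose template revision no longer matches the current template.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a StatefulSet pod template: only its annotations.
#[derive(Default, Hash)]
struct PodTemplate {
    annotations: BTreeMap<String, String>,
}

/// Toy "revision": any change to the template yields a different value,
/// and Pods created from an older revision get replaced (restarted).
fn revision(template: &PodTemplate) -> u64 {
    let mut hasher = DefaultHasher::new();
    template.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical restarter step: record the version of each referenced
/// ConfigMap in a pod template annotation (idempotent for equal inputs).
fn add_restarter_annotations(template: &mut PodTemplate, configmaps: &[(&str, &str)]) {
    for (name, version) in configmaps {
        template.annotations.insert(
            format!("configmap.restarter.stackable.tech/{name}"),
            version.to_string(),
        );
    }
}

/// Returns true if Pod 0 would be restarted right after the initial rollout.
fn pod0_restarted(use_webhook: bool) -> bool {
    let configmaps = [("my-config", "uid-1/42")]; // made-up name and version
    let mut template = PodTemplate::default();

    if use_webhook {
        // Mutating webhook: annotations are added during admission,
        // before the StatefulSet is stored and before Pod 0 exists.
        add_restarter_annotations(&mut template, &configmaps);
    }

    // The StatefulSet controller creates Pod 0 from the stored template.
    let pod0_revision = revision(&template);

    // The restarter reconciles afterwards and writes its annotations.
    add_restarter_annotations(&mut template, &configmaps);

    // A changed template means a new revision, so Pod 0 is replaced.
    revision(&template) != pod0_revision
}

fn main() {
    println!("without webhook: Pod 0 restarted = {}", pod0_restarted(false));
    println!("with webhook:    Pod 0 restarted = {}", pod0_restarted(true));
}
```

The point of the design is that the webhook writes the same annotations the restarter would write later anyway, so the restarter's subsequent reconcile leaves the template unchanged instead of producing a new revision.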

@sbernauer sbernauer self-assigned this Dec 29, 2025
@sbernauer sbernauer moved this to Development: Waiting for Review in Stackable Engineering Dec 29, 2025
@sbernauer sbernauer added type/bug release-note/action-required Denotes a PR that introduces potentially breaking changes that require user action. labels Dec 29, 2025
@dervoeti dervoeti self-requested a review January 12, 2026 09:21
@dervoeti dervoeti moved this from Development: Waiting for Review to Development: In Review in Stackable Engineering Jan 12, 2026
dervoeti
dervoeti previously approved these changes Jan 12, 2026
Member

@dervoeti dervoeti left a comment


Nice, LGTM!

My tests worked as expected. One thing I found was that with airflow-operator the behavior is still the same (Pod 0 gets restarted), I believe this is because airflow-operator creates the ConfigMap after the StatefulSet: https://github.com/stackabletech/airflow-operator/blob/39d98ecd67a7bf29753551e01217faefafb53220/rust/operator-binary/src/airflow_controller.rs#L604-L628

So this could be a follow-up PR for airflow-operator. With superset-operator it worked fine.
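The airflow-operator observation can be modeled in the same spirit. In this hypothetical std-only sketch (the ConfigMap name, version string, and annotation key are made up, not taken from either operator), the webhook can only record ConfigMaps that already exist when the StatefulSet is admitted. A ConfigMap created afterwards is only picked up by the restarter's later reconcile, which changes the pod template and therefore restarts Pod 0.

```rust
use std::collections::BTreeMap;

/// Hypothetical: annotations computed for the referenced ConfigMaps that
/// currently exist in the cluster; missing ConfigMaps are simply skipped.
fn restarter_annotations(
    cluster: &BTreeMap<&str, &str>, // ConfigMap name -> made-up version
    referenced: &[&str],
) -> BTreeMap<String, String> {
    referenced
        .iter()
        .filter_map(|name| {
            cluster.get(name).map(|version| {
                (format!("configmap.restarter.stackable.tech/{name}"), version.to_string())
            })
        })
        .collect()
}

/// Returns true if Pod 0 would be restarted after the initial rollout.
fn pod0_restarted(configmap_created_first: bool) -> bool {
    let referenced = ["airflow-config"];
    let mut cluster = BTreeMap::new();

    if configmap_created_first {
        cluster.insert("airflow-config", "uid-1/7");
    }

    // Admission of the StatefulSet: the webhook sees only existing ConfigMaps,
    // and Pod 0 is created from this template.
    let at_creation = restarter_annotations(&cluster, &referenced);

    if !configmap_created_first {
        // The operator creates the ConfigMap only after the StatefulSet.
        cluster.insert("airflow-config", "uid-1/7");
    }

    // Later restarter reconcile: any difference is a new template revision.
    restarter_annotations(&cluster, &referenced) != at_creation
}

fn main() {
    println!("ConfigMap after StatefulSet: restarted = {}", pod0_restarted(false));
    println!("ConfigMap before StatefulSet: restarted = {}", pod0_restarted(true));
}
```

Under these assumptions, creating the ConfigMap before the StatefulSet is what lets the webhook produce the final template up front.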

@sbernauer
Member Author

I believe this is because airflow-operator creates the ConfigMap after the StatefulSet

Correct, we will change that :)

Co-authored-by: Lukas Krug <lukas.krug@stackable.tech>
@sbernauer sbernauer requested a review from dervoeti January 12, 2026 15:21
@sbernauer sbernauer added this pull request to the merge queue Jan 13, 2026
Merged via the queue into main with commit ee043d0 Jan 13, 2026
12 checks passed
@sbernauer sbernauer deleted the fix/mutating-webhook branch January 13, 2026 08:18
@sbernauer
Member Author

@sbernauer sbernauer moved this from Development: In Review to Development: Done in Stackable Engineering Jan 13, 2026
@lfrancke lfrancke moved this from Development: Done to Acceptance: In Progress in Stackable Engineering Jan 13, 2026
@lfrancke
Member

Can you clarify the release notes for me? What is "pod 0"? Why does a mutating webhook help?
I'm not sure if you expand on this in the "parent issue" release notes.
The current sentence doesn't help me understand what's going on at least.
And I'd like to understand what happens when I disable the webhook.

Again: If you have or are planning to address this in the parent then that's fine.

@sbernauer
Member Author

I updated the release notes and hope they make it clearer. Please have another look.


Labels

release/26.3.0 release-note/action-required Denotes a PR that introduces potentially breaking changes that require user action. type/bug

Projects

Status: Acceptance: In Progress

Development

Successfully merging this pull request may close these issues.

StatefulSet restarter always restarts replica 0 immediately after initial rollout

4 participants