Problem
The SeiNodeDeployment controller calls updateStatus on every reconcile, which generates a merge patch via client.MergeFromWithOptimisticLock. In steady state this patch carries no field-level change (the apiserver short-circuits the write), but no Prometheus signal distinguishes "patch contained changes" from "patch was a no-op." On-call has no way to answer "is this SND being over-stamped right now?" other than running kubectl get -w and eyeballing lastTransitionTime deltas.
Impact
When a future regression turns steady-state reconciles into actual status writes — slice-ordering drift, a new condition that isn't latched correctly, an out-of-band CRD schema change — the signal is invisible until apiserver audit-log volume or kube_seinodedeployment_status_condition metric churn raises alarms hours later. With a controller-emitted counter, the SRE answer is a 30-second PromQL query. This is the operational mirror of the build-time guard in #335.
Relevant experts
opentelemetry-expert — instrumentation in controller code
sre-engineer — PromQL queries + alerting
observability-platform-engineer — recording rule if needed
Proposed approach
Instrument updateStatus in internal/controller/nodedeployment/controller.go to compute the merge-patch body, count by whether it's empty, and emit a Prometheus counter.
The patch body is already computed internally when the client applies the client.MergeFrom patch; compute it ahead of the call with the patch's Data() method so we can inspect it. The counter goes through whatever Prometheus registry the controller already uses (controller-runtime's built-in metrics endpoint).
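A minimal sketch of the "is this patch a no-op?" check, using only the standard library so it can be run in isolation. It assumes the patch body is a JSON merge patch and that, because the patch is built with an optimistic lock, it may carry metadata.resourceVersion even when nothing else changed, so that key is ignored before testing for emptiness. The function name classifyStatusPatch, the "noop" label value, and the readyReplicas field in the sample are hypothetical; only result="changed" comes from the acceptance criteria.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// classifyStatusPatch returns "noop" when a JSON merge patch carries no
// field-level change and "changed" otherwise. The name and the "noop"
// label value are illustrative assumptions, not the controller's API.
func classifyStatusPatch(patch []byte) (string, error) {
	// A zero-byte body has nothing to apply.
	if len(bytes.TrimSpace(patch)) == 0 {
		return "noop", nil
	}

	var body map[string]any
	if err := json.Unmarshal(patch, &body); err != nil {
		return "", fmt.Errorf("decode merge patch: %w", err)
	}

	// Assumption: an optimistic-lock patch includes metadata.resourceVersion
	// as a precondition. That key alone does not mean a field changed.
	if meta, ok := body["metadata"].(map[string]any); ok {
		delete(meta, "resourceVersion")
		if len(meta) == 0 {
			delete(body, "metadata")
		}
	}

	if len(body) == 0 {
		return "noop", nil
	}
	return "changed", nil
}

func main() {
	samples := [][]byte{
		[]byte(`{}`),
		[]byte(`{"metadata":{"resourceVersion":"4711"}}`),
		[]byte(`{"metadata":{"resourceVersion":"4711"},"status":{"readyReplicas":3}}`),
	}
	for _, p := range samples {
		result, err := classifyStatusPatch(p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", p, result)
	}
}
```

In the controller, the returned string would presumably become the result label on seinodedeployment_status_patches_total, for example through a prometheus.CounterVec registered on controller-runtime's metrics.Registry and incremented with WithLabelValues(result).Inc() just before the status patch is sent.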
Equivalent counter on the SeiNode controller is a natural sibling — defer until this lands and the pattern is proven.
Acceptance criteria
Counter increments on every reconcile that calls updateStatus, labeled by patch outcome
PromQL rate(seinodedeployment_status_patches_total{result="changed"}[5m]) per SND returns a stable low rate in steady state and a measurable spike during legitimate state transitions
Metric is registered with the controller-runtime Prometheus registry (no separate scrape endpoint)