Emit controller-side counter distinguishing status patch vs no-op #336

## Problem

The SeiNodeDeployment controller calls `updateStatus` on every reconcile, which sends a merge patch built with `client.MergeFromWithOptions(..., client.MergeFromWithOptimisticLock{})`. In steady state this patch carries no field-level change: with the optimistic-lock option its body is just `metadata.resourceVersion` (it would be `{}` without the lock), and the apiserver short-circuits without writing. But there is no Prometheus signal that distinguishes "patch contained changes" from "patch was a no-op." On-call has no way to answer "is this SND being over-stamped right now?" except by running `kubectl get -w` and eyeballing `lastTransitionTime` deltas.
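To make the steady-state body concrete, here is a stdlib-only Go sketch. A small hand-rolled RFC 7386 merge-patch diff stands in for the one controller-runtime computes internally, and `patchBody` mirrors what the optimistic-lock option does (clear `resourceVersion` on the original, set it on the modified copy). The object shape and the `status.phase` values are invented for illustration; they are not the SND schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// mergePatch builds an RFC 7386 JSON merge patch turning original into
// modified: equal values are omitted, removed keys become null, nested
// objects are diffed recursively, and arrays are replaced wholesale.
func mergePatch(original, modified map[string]any) map[string]any {
	patch := map[string]any{}
	for k, mv := range modified {
		ov, present := original[k]
		if !present {
			patch[k] = mv
			continue
		}
		om, oIsObj := ov.(map[string]any)
		mm, mIsObj := mv.(map[string]any)
		if oIsObj && mIsObj {
			if sub := mergePatch(om, mm); len(sub) > 0 {
				patch[k] = sub
			}
		} else if !reflect.DeepEqual(ov, mv) {
			patch[k] = mv
		}
	}
	for k := range original {
		if _, present := modified[k]; !present {
			patch[k] = nil
		}
	}
	return patch
}

// withResourceVersion returns a copy of obj (inputs are not mutated) whose
// metadata.resourceVersion is set to rv, or removed when rv is empty.
func withResourceVersion(obj map[string]any, rv string) map[string]any {
	out := make(map[string]any, len(obj))
	for k, v := range obj {
		out[k] = v
	}
	meta := map[string]any{}
	if m, ok := obj["metadata"].(map[string]any); ok {
		for k, v := range m {
			meta[k] = v
		}
	}
	if rv == "" {
		delete(meta, "resourceVersion")
	} else {
		meta["resourceVersion"] = rv
	}
	out["metadata"] = meta
	return out
}

// patchBody mirrors the optimistic-lock option: the original's
// resourceVersion is cleared and the modified copy's is set to it, so the
// version is always part of the body.
func patchBody(original, modified map[string]any, optimisticLock bool) string {
	if optimisticLock {
		meta, _ := original["metadata"].(map[string]any)
		rv, _ := meta["resourceVersion"].(string)
		original = withResourceVersion(original, "")
		modified = withResourceVersion(modified, rv)
	}
	body, _ := json.Marshal(mergePatch(original, modified)) // map keys marshal sorted
	return string(body)
}

func mustDecode(s string) map[string]any {
	var m map[string]any
	if err := json.Unmarshal([]byte(s), &m); err != nil {
		panic(err)
	}
	return m
}

func main() {
	read := mustDecode(`{"metadata":{"name":"snd-a","resourceVersion":"42"},"status":{"phase":"Ready"}}`)
	moved := mustDecode(`{"metadata":{"name":"snd-a","resourceVersion":"42"},"status":{"phase":"Syncing"}}`)

	fmt.Println(patchBody(read, read, false)) // {}
	fmt.Println(patchBody(read, read, true))  // {"metadata":{"resourceVersion":"42"}}
	fmt.Println(patchBody(read, moved, true)) // {"metadata":{"resourceVersion":"42"},"status":{"phase":"Syncing"}}
}
```

Because merge patches replace arrays wholesale, a reordered slice (for example a conditions list) would show up in the body as the full array, which is the slice-ordering drift described under Impact.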

## Impact

When a future regression turns steady-state reconciles into actual status writes — slice-ordering drift, a new condition that isn't latched correctly, an out-of-band CRD schema change — the signal is invisible until apiserver audit-log volume or `kube_seinodedeployment_status_condition` metric churn raises alarms hours later. With a controller-emitted counter, the SRE answer is a 30-second PromQL query. This is the operational mirror of the build-time guard in #335.

## Relevant experts

- `opentelemetry-expert` — instrumentation in controller code
- `sre-engineer` — PromQL queries + alerting
- `observability-platform-engineer` — recording rule if needed

## Proposed approach

Instrument `updateStatus` in `internal/controller/nodedeployment/controller.go` to compute the merge-patch body, classify it as a no-op or a change, and increment a Prometheus counter:

```
seinodedeployment_status_patches_total{namespace, name, result="noop|changed"}
```

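The `client.Patch` returned by `client.MergeFromWithOptions` already computes this body through its `Data(obj)` method when the patch is sent; call `Data(obj)` ahead of the patch so the body can be inspected. Byte length alone will not separate the two cases: with the optimistic-lock option every body includes `metadata.resourceVersion`, so a no-op is a body whose only content is that field (or `{}` when no lock is used). The counter goes through whatever Prometheus registry the controller already uses (controller-runtime's built-in metrics endpoint).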

Equivalent counter on the SeiNode controller is a natural sibling — defer until this lands and the pattern is proven.

## Acceptance criteria

- [ ] Counter increments on every reconcile that calls `updateStatus`, labeled by patch outcome
- [ ] PromQL `rate(seinodedeployment_status_patches_total{result="changed"}[5m])` per SND returns a stable low rate in steady state and a measurable spike during legitimate state transitions
- [ ] Metric is registered with the controller-runtime Prometheus registry (no separate scrape endpoint)

## Out of scope

- **Envtest assertion** that steady-state patches are no-ops at the wire level. Filed as #335 — that catches in-tree regressions at build time; this counter catches out-of-band drift in production.
- **Alert rules** on the counter. Separate effort once the metric exists and we know what "abnormal" looks like in practice.
- **Equivalent counter for SeiNode controller.** Follow-up after the pattern is validated here.

## References

- PR #333 — conditions doctrine fold; the always-present condition pattern raises the operational cost of any steady-state status regression
- #335 — companion build-time guard
- Adversarial review on PR #333, reviewer note: *"There is no PromQL query that answers 'did this SND's status get patched in the last 5 min with no observable change?' — that's the missing signal that lets an on-call confirm or deny over-write suspicion in 30 seconds instead of staring at -w for 10 minutes."*

