Add contact_support on update status page #3226

Draft
david-crespo wants to merge 2 commits into main from support-banner

@david-crespo
Collaborator

New boolean contact_support field on update status added in oxidecomputer/omicron#10271.

I tried it inside the properties table as Contact support: Yes and it felt terrible.

Robot notes on the API logic behind contact_support

omicron#10271 adds a contact_support: bool field to the system/update/status API. It is the last piece of a minimal system health check tied to update status, intended as a stopgap until the fault management subsystem lands (RFD 612).

What it means

When contact_support is true, Nexus has detected one or more known conditions in the latest inventory collection (plus a few additional checks) that require Oxide support to resolve. The field collapses several sub-checks into a single boolean because none of the individual conditions are actionable by the customer — the only action is to call support. The detailed breakdown is logged server-side and lands in support bundles.

The intended usage maps to two cases:

  • Before an update: if contact_support is true, the customer should not start an update — resolve the issue with support first.
  • After an update: if contact_support is true, something went wrong; the customer should call support immediately.
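
The two cases above could be sketched on the client side roughly as follows. Only the `contact_support` field comes from the API change described here; the `UpdatePhase` type, the `supportGuidance` function, and the message wording are hypothetical names chosen for illustration, not what this PR implements.

```typescript
// Hypothetical phase marker: the API itself only exposes `contact_support`.
type UpdatePhase = "before_update" | "after_update";

// Minimal stand-in for the update status response (assumed shape).
interface UpdateStatusLike {
  contact_support: boolean;
}

// Returns guidance text for the banner, or null when there is nothing to show.
function supportGuidance(
  status: UpdateStatusLike,
  phase: UpdatePhase
): string | null {
  // false is not a guarantee of health, so show nothing rather than "all good".
  if (!status.contact_support) return null;
  return phase === "before_update"
    ? "Do not start an update. Contact Oxide support to resolve the issue first."
    : "Something went wrong with the update. Contact Oxide support immediately.";
}
```

Returning `null` for the `false` case keeps the UI from implying that the system is healthy when the field simply has nothing to report.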

Conditions that trigger contact_support: true

  • Unhealthy zpools — any zpool not in online state (e.g., degraded).
  • Enabled SMF services not online — services that should be running but are in maintenance, offline, or degraded.
  • Stuck sagas — sagas that have been running longer than ~15 minutes. (A sample of 10,000 done sagas on dogfood showed only 3 exceeded 15 minutes from creation to completion.)
  • Stale inventory collection — no recent inventory collection (~15 min threshold), meaning Nexus has lost visibility into rack state.
  • Stalled update — an update is supposed to be in progress but the planner hasn't taken a step in ~30 minutes.

The list is explicitly minimal and not exhaustive — contact_support: false does not guarantee the system is fully healthy.
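
As a rough illustration of how the conditions listed above collapse into one boolean, here is a TypeScript sketch. The real logic lives server-side in Nexus and is not shown in this PR; every field name, the `needsSupport` function, and the exact threshold comparisons are assumptions based only on the approximate values quoted above.

```typescript
const MINUTE_MS = 60_000;

// Hypothetical inputs; the names do not come from omicron.
interface HealthInputs {
  zpoolStates: string[]; // e.g. "online", "degraded"
  enabledServiceStates: string[]; // e.g. "online", "maintenance", "offline"
  runningSagaAgesMs: number[]; // age of each still-running saga
  inventoryAgeMs: number; // time since the latest inventory collection
  updateInProgress: boolean;
  msSinceLastPlannerStep: number;
}

// True if any one of the listed conditions holds.
function needsSupport(h: HealthInputs): boolean {
  const unhealthyZpool = h.zpoolStates.some((s) => s !== "online");
  const serviceNotOnline = h.enabledServiceStates.some((s) => s !== "online");
  const stuckSaga = h.runningSagaAgesMs.some((age) => age > 15 * MINUTE_MS);
  const staleInventory = h.inventoryAgeMs > 15 * MINUTE_MS;
  const stalledUpdate =
    h.updateInProgress && h.msSinceLastPlannerStep > 30 * MINUTE_MS;
  return (
    unhealthyZpool ||
    serviceNotOnline ||
    stuckSaga ||
    staleInventory ||
    stalledUpdate
  );
}
```

Because the result is a plain OR over the sub-checks, a `false` result only means that none of these specific conditions fired.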

Suppression during an active update

Health checks often fail transiently during an update, so the API suppresses contact_support: true while an update is genuinely in progress. The field only surfaces a true value when either (1) there is no update in progress, or (2) an in-progress update has stalled past the threshold (matching the 10–15 minute guidance in the Reconfigurator Ops Guide for when support considers an update stuck).

In practice this means the field always presents in one of two contexts: the system is idle (pre-update or post-update), or the update has stalled long enough that the result is no longer a transient artifact.
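
The suppression rule can be sketched as a small predicate. This is an illustration of the behavior described above, not the Nexus implementation; `SuppressionInputs`, its fields, and `reportedContactSupport` are hypothetical names.

```typescript
// Hypothetical inputs describing the raw check result and update state.
interface SuppressionInputs {
  rawNeedsSupport: boolean; // result of the health checks before suppression
  updateInProgress: boolean;
  updateStalled: boolean; // in-progress update has passed the stall threshold
}

// The value the API would report for contact_support under the described rule.
function reportedContactSupport(i: SuppressionInputs): boolean {
  // Suppress only while an update is actively making progress.
  const suppressed = i.updateInProgress && !i.updateStalled;
  return i.rawNeedsSupport && !suppressed;
}
```

With this shape, a failing check during a healthy in-progress update reports `false`, while the same failure on an idle system or a stalled update reports `true`.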

Issues to resolve

  • Explain the situation without overdoing it
  • Tooltip looks terrible in the message box; what if we link to docs instead?
  • Should probably link to a way to actually contact support, probably the support email that goes to Zendesk

@vercel

vercel Bot commented May 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| console | Ready | Preview | May 22, 2026 6:06pm |

@david-crespo changed the title from "Contact support on banner" to "Add contact_support on update status page" on May 22, 2026