refactor(connd): refactor of tasks; modem self-healing; new OES events#1109
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 20874ec307
    while let Ok(net_state) = net_state_rx.recv_async().await {
        let _ = build_and_send_report(&ctx.nm, &ctx.resolved, net_state, &ctx.zsender)
Publish active-connections once at reporter startup
This reporter only reacts to net-state broker events and never emits an initial report on its own. In `reporters::spawn`, `net_state::report` is started before this task, so the initial state event can be published before this subscription is ready; if the connection state then stays stable, `oes/active_connections` is never sent for that boot. This creates a startup race where downstream consumers can remain stale indefinitely.
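One possible way to close this race is to publish once from a directly queried state before entering the event loop. The sketch below is synchronous and uses `std::sync::mpsc` purely for illustration; `NetState`, `run_reporter`, `query_current`, and `publish` are hypothetical stand-ins, not the actual `connd` or `speare` APIs.

```rust
use std::sync::mpsc::Receiver;

/// Hypothetical stand-in for connd's network state type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum NetState {
    Offline,
    Online,
}

/// Publishes one report from a directly queried state, then one report per
/// broker event. Returns once every sender has been dropped.
pub fn run_reporter(
    query_current: impl FnOnce() -> NetState,
    events: Receiver<NetState>,
    mut publish: impl FnMut(NetState),
) {
    // Initial report: does not depend on having observed the first event,
    // so a missed startup event cannot leave consumers without data.
    publish(query_current());

    // Steady state: one report per net-state change.
    for state in events {
        publish(state);
    }
}
```

With this shape, a reporter that subscribes too late still emits one report per boot; the cost is a possible duplicate report when the initial event is not missed.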
    if let Err(e) = refresh_snapshot().await {
        error!("failed to refresh modem snapshot with err: {e}");
        error!("powercycling modem");

        let _ = powercycle_modem(ctx.mcu_util.as_ref(), &ctx.systemd)
Avoid powercycling on every snapshot refresh failure
Any error from `refresh_snapshot()` immediately triggers a modem power cycle here, but `take_snapshot()` fails for generic transient ModemManager/DBus issues (for example `list_modems`, `modem_info`, or `sim_info` errors), not just blacklist conditions. That means short-lived control-plane glitches now cause hard modem resets and connection drops, which is much more disruptive than retrying and can create repeated outage loops.
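A minimal sketch of one mitigation, assuming the snapshot error can be classified: power-cycle immediately only on a blacklist condition, and tolerate a bounded number of consecutive transient failures first. `SnapshotError`, `Action`, and `RefreshGuard` are hypothetical names introduced for illustration; the PR does not define them.

```rust
/// Hypothetical classification of snapshot refresh failures.
#[derive(Debug)]
pub enum SnapshotError {
    /// ModemManager has blacklisted the modem; only a reset helps.
    Blacklisted,
    /// Generic DBus / ModemManager glitch that may clear on its own.
    Transient,
}

/// What the caller should do after a refresh attempt.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    None,
    Retry,
    PowerCycle,
}

/// Tracks consecutive transient failures between successful refreshes.
pub struct RefreshGuard {
    consecutive_failures: u32,
    max_failures: u32,
}

impl RefreshGuard {
    pub fn new(max_failures: u32) -> Self {
        Self { consecutive_failures: 0, max_failures }
    }

    /// Decides the next action from the outcome of one refresh attempt.
    pub fn on_result(&mut self, result: &Result<(), SnapshotError>) -> Action {
        match result {
            Ok(()) => {
                // Any success resets the failure streak.
                self.consecutive_failures = 0;
                Action::None
            }
            Err(SnapshotError::Blacklisted) => {
                self.consecutive_failures = 0;
                Action::PowerCycle
            }
            Err(SnapshotError::Transient) => {
                self.consecutive_failures += 1;
                if self.consecutive_failures >= self.max_failures {
                    // Streak exhausted: escalate and start counting afresh.
                    self.consecutive_failures = 0;
                    Action::PowerCycle
                } else {
                    Action::Retry
                }
            }
        }
    }
}
```

The threshold is a tuning knob: a value of 1 reproduces the current behavior, while larger values trade slower recovery from a genuinely wedged modem for fewer unnecessary resets.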
    mm.set_current_bands(id, &ALLOWED_BANDS)
        .await
        .map_err(|e| eyre!("could not set modem bands: {e}"))?;
Restore preferred modem mode configuration during setup
The new setup path applies signal refresh and allowed bands, but it no longer applies allowed/preferred modes (`["3g", "4g"]` with preferred `"4g"`) that the previous startup logic configured. After modem replacement/power-cycle, `connd` now leaves mode selection at modem defaults, which can regress RAT behavior and throughput in the field.
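For reference, ModemManager's `SetCurrentModes` call takes an (allowed, preferred) pair of `MMModemMode` bitmasks. The sketch below shows how the mode names mentioned above could map onto that pair; the bit values follow the `MMModemMode` enum, while `mode_bit` and `current_modes` are hypothetical helpers and not necessarily how `connd` wraps the call.

```rust
/// Bit values from ModemManager's MMModemMode enum.
pub const MM_MODEM_MODE_3G: u32 = 1 << 2;
pub const MM_MODEM_MODE_4G: u32 = 1 << 3;

/// Maps a mode name to its MMModemMode bit (hypothetical helper; only the
/// two modes discussed in this review are handled).
fn mode_bit(name: &str) -> Option<u32> {
    match name {
        "3g" => Some(MM_MODEM_MODE_3G),
        "4g" => Some(MM_MODEM_MODE_4G),
        _ => None,
    }
}

/// Builds the (allowed, preferred) pair for SetCurrentModes.
/// Returns None for an unknown name or when the preferred mode is not
/// among the allowed ones.
pub fn current_modes(allowed: &[&str], preferred: &str) -> Option<(u32, u32)> {
    let mut mask = 0;
    for &name in allowed {
        mask |= mode_bit(name)?;
    }
    let pref = mode_bit(preferred)?;
    if mask & pref == 0 {
        return None;
    }
    Some((mask, pref))
}
```

Under these assumptions, `current_modes(&["3g", "4g"], "4g")` yields the pair that the previous startup logic is described as configuring.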
this PR refactors task usage in `connd` to rely on `speare` for easier task management, free restart / backoff logic and a free broker for named channels between tasks.

it also publishes new data on the OES, and introduces modem self-healing

## new
- modem self-healing (powercycle whenever it is blacklisted by modem manager)
- `fw_revision` field on `CellularStatus`
- `CellularStatus` now published on OES
- `ConndReport` and `ActiveConnections` both simplified to publish based on `net-state` event published internally on `speare` broker
- `NetStats` now on OES
- Datadog reporter now reporting usage for eth / wwan / wlan instead of only wlan
- logging of number of wifi profiles on startup to help debug potential issues

## tested on an orb
yes

## do not merge
[until this PR is merged](#1094) (soon, i just need to unclankerfy it)

