
Telegram operator bot: alerts + on-demand commands, over Tor (#121, #45)#341

Open
VijitSingh97 wants to merge 11 commits into develop from claude/elegant-ardinghelli-3fa5fd

Conversation

@VijitSingh97

Collaborator

Closes #121, closes #45. Supersedes the stale draft #143 (its three modules were salvaged and rebased here).

A Telegram operator bot for the stack: push alerts for anything worth knowing, plus read-only status commands on demand. Off by default; everything routes over Tor.

Alerts (debounced, one message per real transition)

🚀 Pithead online · ⛓️ node down/recovered · ⛏️ worker offline/back · 🎉 worker joined · 👋 worker left · ✅ sync finished · 💾 disk filling/critical · 🗄️ DB write failing · 🎰 no PPLNS share (XvB wins skipped) · 🎰 XvB registration rejected/failing · 🌐 node exposed on clearnet · 🆕 new release available

Each alert reuses a signal the dashboard already computes (the build_badges catalog / Metrics / NodeHealthMonitor), so nothing is collected twice. Every alert has its own telegram.events.* toggle, and all of them default to on.

Commands (read-only; only the configured chat_id is answered)

/status · /hashrate · /workers · /sync · /system · /pool · /xvb · /earnings · /help

The bot long-polls getUpdates, so it needs no inbound port. Replies are built from the same build_metrics the web UI renders, so they always match the dashboard.
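
As a rough illustration of the access gate and command routing described above, here is a minimal sketch. `route_update`, `next_offset` and the handler table are hypothetical names, not the PR's code; only the update fields (`update_id`, `message.chat.id`, `message.text`) come from the Telegram Bot API.

```python
from typing import Callable, Dict, Optional

# Hypothetical handler table: command name -> function returning reply text.
Handlers = Dict[str, Callable[[], str]]


def route_update(update: dict, allowed_chat_id: str, handlers: Handlers) -> Optional[str]:
    """Return the reply text for one getUpdates entry, or None to stay silent."""
    message = update.get("message") or {}
    chat_id = str((message.get("chat") or {}).get("id", ""))
    if chat_id != str(allowed_chat_id):
        return None  # any chat other than the configured one is ignored
    text = (message.get("text") or "").strip()
    if not text.startswith("/"):
        return None  # plain text is not a command
    # "/status@SomeBot extra words" -> "/status"
    command = text.split()[0].split("@")[0].lower()
    handler = handlers.get(command)
    if handler is None:
        return None
    return handler()


def next_offset(updates: list, current: int) -> int:
    """Offset for the next getUpdates call, so handled updates are not redelivered."""
    ids = [u["update_id"] for u in updates if "update_id" in u]
    return max(ids) + 1 if ids else current
```

Because the gate runs before any handler is looked up, a message from an unknown chat never reaches the code that reads metrics.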

Design

  • Over Tor. Notifier + command bot dial api.telegram.org through the bridge Tor SOCKS proxy (socks5h), like Healthchecks/XvB — never leaks the host IP.
  • Read-only. No command changes the stack (lifecycle stays on the CLI), so a leaked chat can at worst read status.
  • Worker offline = the DOWN status the dashboard shows (not list-absence); joined/left track fleet membership. All edges seed silently so a restart never replays.
  • Secret handling. bot_token lives in the owner-only .env, masked in the apply preview, never logged.
  • docs/architecture.md diagram now shows every dashboard egress path (Telegram/Healthchecks/XvB/GitHub), all 🟢 Tor.
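
The Tor routing and token handling above could look roughly like the sketch below. It only builds the arguments one would pass to `requests.post` (SOCKS support in `requests` needs the optional `requests[socks]` extra), so it runs without a network. The proxy host `tor`, port `9050`, and all function names are assumptions for illustration, not values taken from this PR.

```python
from urllib.parse import urlsplit


def tor_proxies(socks_host: str, socks_port: int) -> dict:
    """Proxy mapping in the shape the requests library accepts.

    socks5h (rather than socks5) asks the proxy to resolve hostnames, so the
    DNS lookup for api.telegram.org also goes through Tor.
    """
    url = f"socks5h://{socks_host}:{socks_port}"
    return {"http": url, "https": url}


def send_message_request(token: str, chat_id: str, text: str, proxies: dict) -> dict:
    """Keyword arguments for requests.post(**kwargs); building them needs no network."""
    return {
        "url": f"https://api.telegram.org/bot{token}/sendMessage",
        "json": {"chat_id": chat_id, "text": text},
        "proxies": proxies,
        "timeout": 30,
    }


def redact(url: str) -> str:
    """Log-safe form of a Bot API URL: the token path segment is masked."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    masked = ["bot***" if s.startswith("bot") else s for s in segments]
    return f"{parts.scheme}://{parts.netloc}{'/'.join(masked)}"


# Example with assumed proxy coordinates (hypothetical host and port).
kwargs = send_message_request("123:ABC", "42", "node down", tor_proxies("tor", 9050))
print(redact(kwargs["url"]))
```

Masking the URL before it reaches any log line is one way to honour the "never logged" rule, since the Bot API carries the token in the request path.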

Testing

make test green — dashboard 769, stack 452, selftest 101, fakes 12. Patch coverage 96%, full lint clean, test-inventory regenerated. Validated live on gouda: rebuilt via upgrade, bot polls over Tor, getMe/sendMessage succeed over Tor, stack healthy.

Follow-ups (kept as issues — need new infra)

#336 block/payout alerts · #337 container crash-loop alerts · #338 two-way control commands · #339 hashrate-low-for-tier alert.

🤖 Generated with Claude Code

VijitSingh97 and others added 8 commits June 4, 2026 02:17
Ship a thin, notifications-only Telegram pusher for v1.0: node down/recovered,
worker offline/back online, and sync finished. Off by default; no interactive
bot (that stays in #45).

Consumes signals the data loop already computes rather than re-collecting:
- node down/recovered from NodeHealthMonitor's debounced `down` edge (#31)
- sync finished from the sync-gate `miner_released` latch (#35)
- worker offline/back via a new flap-protected per-worker presence tracker

New dashboard modules:
- telegram_notifier.py: thin sendMessage transport; enabled only with token +
  chat_id; fail-silent on offline/Tor-only hosts; never logs the bot token.
- worker_presence.py: WorkerPresenceMonitor, the per-worker analogue of
  NodeHealthMonitor (debounce + recovery hysteresis + silent baseline + reset
  when the proxy is intentionally stopped).
- alert_service.py: folds per-cycle signals into debounced alerts; pure
  evaluate() + off-thread process(); wired into data_service.run().

Plumbing: config.json telegram.* -> pithead render_env -> per-event env vars ->
config.py. bot_token rendered to the owner-only .env and masked in the apply
preview. Injected into the dashboard container in docker-compose; added to the
advanced example config.

Docs: new docs/telegram.md setup guide (BotFather, chat id, per-event toggles,
"one chat, two bots" with #79, Tor-only caveat, troubleshooting); cross-refs in
the docs index, configuration reference, and CHANGELOG.

Tests: 49 new pytest cases (notifier/monitor/alert service), plus stack tests
for env propagation and bot-token secrecy. Full suite green; coverage 93%.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Re-apply the #121 draft (PR #143) onto current develop, resolving 103
commits of context drift. New modules land unchanged; conflicts were all
additive (healthchecks #79, clearnet #234, update-badge #224 rows sitting
beside the new telegram rows). Config renamed config.advanced.example.json
-> config.reference.json; kept the subnet-aware P2POOL_URL over the draft's
hardcoded one. Adds the telegram.commands scaffolding for the #45 half.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ged alerter

Add TelegramCommandBot: an in-process long-poll (getUpdates) loop that answers
read-only /status, /hashrate, /workers, /sync, /help from the configured chat.
Reuses build_metrics so replies match the dashboard exactly; access-gated to the
one chat_id; long-poll needs no inbound port and rides the same egress as the
alerts; fail-silent + never logs the token, same discipline as the notifier.
Wired as a third background task in main.py (no-op unless telegram.commands.enabled).

Config: telegram.commands.enabled -> TELEGRAM_COMMANDS_ENABLED (default false).
Also ruff-format + lint-fix the #121 modules the draft predated (import sort,
drop an unused import). 37 new unit tests; patch coverage 95%.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per feedback: the dashboard already computes each rig's DOWN state (status !=
online, shown in the uptime column) and keeps the row visible ~1h before it
falls off the proxy table (#182). Drive the worker-offline debounce off that
same status instead of inferring offline from absence in the online-name set —
so a Telegram 'offline' alert lines up with the rig showing DOWN on screen, and
a rig that vanishes from the table entirely is forgotten (the dashboard no
longer shows it) rather than aged into a false offline.

WorkerPresenceMonitor.update now takes the worker rows (name+status); offline
fires while DOWN for offline_after, recovered after recovery_after online,
forgotten when the proxy stops listing it. Drops the redundant retention timer
(WORKER_RETENTION_SEC now removed — it mirrored the lifecycle's own falloff).
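
The offline/recovered/forgotten behaviour described in this commit can be sketched as below. This is an illustrative reduction, not the real WorkerPresenceMonitor: the class name, the row keys `name` and `status`, and the default durations are assumptions, and joined/left events and the proxy-stopped reset are left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class _Rig:
    alerted_offline: bool = False       # has an "offline" alert already been sent?
    down_since: Optional[float] = None  # start of the current DOWN streak
    up_since: Optional[float] = None    # start of the current online streak


class PresenceSketch:
    def __init__(self, offline_after: float = 300.0, recovery_after: float = 120.0):
        self.offline_after = offline_after
        self.recovery_after = recovery_after
        self._rigs: Dict[str, _Rig] = {}

    def update(self, rows: Iterable[dict], now: float) -> List[Tuple[str, str]]:
        """Fold one cycle of worker rows into (event, worker_name) pairs."""
        events: List[Tuple[str, str]] = []
        seen = set()
        for row in rows:
            name = row["name"]
            seen.add(name)
            rig = self._rigs.setdefault(name, _Rig())
            if row["status"] == "online":
                rig.down_since = None
                if rig.up_since is None:
                    rig.up_since = now
                # Recovery hysteresis: stay online for recovery_after first.
                if rig.alerted_offline and now - rig.up_since >= self.recovery_after:
                    rig.alerted_offline = False
                    events.append(("recovered", name))
            else:
                rig.up_since = None
                if rig.down_since is None:
                    rig.down_since = now
                # Debounce: a short flap resets down_since and never alerts.
                if not rig.alerted_offline and now - rig.down_since >= self.offline_after:
                    rig.alerted_offline = True
                    events.append(("offline", name))
        # A rig the proxy no longer lists is forgotten, not aged into "offline".
        for name in list(self._rigs):
            if name not in seen:
                del self._rigs[name]
        return events
```

Passing `now` in explicitly keeps `update` deterministic, which makes the timing rules easy to cover with plain unit tests.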

worker_presence.py 100% covered; make test green; patch coverage 94%.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… /xvb commands

Alerts (all reuse what the dashboard already computes, per feedback):
- worker joined/left — woven into WorkerPresenceMonitor via a prime flag so a
  restart/failover-readmit doesn't replay the roster as joins; 'left' fires when
  a rig drops off the proxy table entirely (vs 'offline' = DOWN-but-still-listed).
- disk filling/critical — crosses the same DISK_WARN/CRITICAL_PERCENT thresholds
  as the dashboard's low-disk badge (#138); a full disk corrupts monerod's DB.
- DB write failing — StateManager.is_db_healthy flipping false (#131).
Disk usage is read once in the loop and reused in the snapshot.
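
A minimal sketch of a threshold-crossing disk alert in the spirit of the bullets above. The 85% and 95% thresholds and the names `disk_level` and `DiskEdge` are placeholders; the real values live in the dashboard's DISK_WARN/CRITICAL_PERCENT settings, which this PR text does not state.

```python
from typing import Optional


def disk_level(used_percent: float, warn: float = 85.0, critical: float = 95.0) -> str:
    """Classify disk usage against two thresholds (assumed values)."""
    if used_percent >= critical:
        return "critical"
    if used_percent >= warn:
        return "warn"
    return "ok"


class DiskEdge:
    """Report a level only when it changes; the first reading seeds silently."""

    def __init__(self) -> None:
        self._last: Optional[str] = None

    def observe(self, used_percent: float) -> Optional[str]:
        level = disk_level(used_percent)
        previous, self._last = self._last, level
        if previous is None or previous == level:
            return None  # baseline reading, or no transition
        return level
```

Reporting transitions rather than levels is what keeps a disk that sits at 90% for a week from producing a message every cycle.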

Commands (read-only, reuse build_metrics + the system snapshot):
- /system (disk/RAM/CPU/load/HugePages), /pool (sidechain + Monero network),
  /xvb (mode/tier/routed/raffle-eligibility).

Config: telegram.events gains worker_joined/worker_left/disk_space/db_unhealthy
(all default true); plumbed through pithead render, compose, config.reference.json.
make test green; patch coverage 97%.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…mand

Reviewed the dashboard's own noteworthy-state catalog (build_badges) and added the
high-value, cheap-to-reuse signals:
- xvb_no_share — donating to XvB with no PPLNS share in the window means raffle wins
  are skipped (#158); revenue make-or-break. Gated on XvB enabled.
- clearnet_exposed — a node syncing over clearnet exposes the host IP (#183); privacy
  signal on a Tor-first stack. Reverts to Tor automatically (#234).
Both computed in the data loop from existing figures (shares_in_pplns_window,
clearnet_sync_state) and passed as scalars, keeping evaluate() pure. Events default true.

/earnings command — estimated P2Pool XMR/day, reusing service/earnings.xmr_per_hs_day
applied to the displayed P2Pool 1h hashrate (no web-layer import).

Larger follow-ups raised as separate issues (block/payout-found, container crash-loop,
two-way control, XvB-registration alert, etc). make test green; patch coverage 97%.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…i, egress diagram

- Tor routing (#340): notifier + command bot now dial api.telegram.org through the
  bridge Tor SOCKS proxy (socks5h), same as Healthchecks/XvB — never leaks the host IP.
  Command bot swapped from aiohttp to requests-in-a-thread to reuse the SOCKS path (no
  new dep); getUpdates long-poll runs via asyncio.to_thread.
- New alerts (folding in the achievable parts of #339): 'Pithead online' one-shot
  heartbeat on start, XvB registration rejected/failing/recovered, new-release-available.
- Emoji enrichment across all alert messages (⛏️ workers, ⛓️ nodes, 💾 disk, 🗄️ db,
  🎰 XvB, 🧅/🌐 clearnet, 🚀 online).
- Network diagram (docs/architecture.md): now shows each dashboard egress path
  (Telegram, Healthchecks, XvB stats, GitHub) tagged with its route — all 🟢 Tor.
- config.json telegram.events gains xvb_registration/new_release/stack_online (default
  true); plumbed pithead->compose->config.py; docs updated.

Kept as issues (genuinely need new infra): #336 block/payout (no per-node/payout signal),
#337 crash-loop (read proxy has no inspect/health), #338 two-way control (auth model).

make test green; patch coverage 96%.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
VijitSingh97 and others added 2 commits July 2, 2026 19:28
… gaps

Testing to standard (docs/testing-strategy.md — test each behaviour once, at the
lowest honest tier):
- tier-1 pytest: assert AlertService.EVT_* == config.py TELEGRAM_EVENTS (adding an
  alert but forgetting its toggle, or vice versa, now fails a test).
- tier-1 shell (run.sh): assert every telegram.events key in config.reference.json
  renders into .env AND is declared in docker-compose.yml — guards the config surface
  the pytest can't see (14 events × 2, +28 assertions).
- Fill real branches: process() swallowing an evaluate() error (the never-break-the-
  loop guard), reply_for /hashrate + /sync, _safe_reply_for's error path, the benign
  XvB-registration transition. alert_service + telegram_commands now 99%.
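
The first check in this list, alert constants matching config toggles, might be written along these lines. `FakeAlertService` and `FAKE_TELEGRAM_EVENTS` are stand-ins invented for the sketch; the real test would import AlertService and config.TELEGRAM_EVENTS, whose exact shapes are not shown in this PR text.

```python
def event_names_from_constants(holder) -> set:
    """Collect the values of every EVT_* attribute on a class or module."""
    return {getattr(holder, n) for n in dir(holder) if n.startswith("EVT_")}


def parity_gaps(holder, toggles) -> dict:
    """Names present on one side only; both lists are empty when in sync."""
    declared = event_names_from_constants(holder)
    configured = set(toggles)
    return {
        "missing_toggle": sorted(declared - configured),
        "missing_alert": sorted(configured - declared),
    }


# Stand-ins for the real AlertService and config.TELEGRAM_EVENTS (assumed shapes).
class FakeAlertService:
    EVT_NODE_DOWN = "node_down"
    EVT_DISK_SPACE = "disk_space"


FAKE_TELEGRAM_EVENTS = {"node_down": True, "disk_space": True}


def test_events_match_toggles():
    gaps = parity_gaps(FakeAlertService, FAKE_TELEGRAM_EVENTS)
    assert gaps == {"missing_toggle": [], "missing_alert": []}, gaps
```

Returning the two difference sets, instead of a bare boolean, means a failing run names the alert or toggle that was forgotten.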

Docs reviewed against docs/STYLE.md (voice + code-accuracy) — no changes needed.
make test green; patch coverage 98%; test-inventory regenerated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Daily summary: a once-a-day status roll-up (nodes/mining/workers/hashrate/shares/
  disk) pushed at a configurable local time. telegram.daily_summary_time (default
  08:00) + a daily_summary event toggle (default on). Fires once/day at the target;
  a post-time restart waits for the next day rather than replaying; malformed time
  disables it. Uses the dashboard container's timezone. Built lazily (only when due)
  from the same build_metrics the dashboard renders.
- Config plumbed config.json -> pithead render -> compose -> config.py; describe_change
  + the run.sh event-consistency loop + a time-propagation test cover the surfaces.
- Wiring test (tier 1): asserts DataService.run() hands AlertService the full signal
  contract each cycle + calls maybe_daily_summary — closes the one automatable e2e gap
  (the alert LOGIC was fully unit-tested; the loop->alerter wiring wasn't asserted).
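
The once-a-day firing rule for the summary can be sketched as a small gate. `DailyGate` and `parse_hhmm` are hypothetical names and this is not the PR's maybe_daily_summary; it only illustrates the three stated behaviours: fire once per day at the target, skip today after a post-time start, and disable on a malformed time.

```python
from datetime import date, datetime, time
from typing import Optional


def parse_hhmm(value: str) -> Optional[time]:
    """Return a time for 'HH:MM', or None when malformed (which disables the gate)."""
    try:
        return datetime.strptime(value.strip(), "%H:%M").time()
    except (ValueError, AttributeError):
        return None


class DailyGate:
    def __init__(self, hhmm: str, started_at: datetime) -> None:
        self.target = parse_hhmm(hhmm)
        self._last_fired: Optional[date] = None
        # Starting at or after today's target counts today as already done,
        # so a restart waits for tomorrow instead of replaying the summary.
        if self.target is not None and started_at.time() >= self.target:
            self._last_fired = started_at.date()

    def due(self, now: datetime) -> bool:
        """True exactly once per calendar day, at or after the target time."""
        if self.target is None:
            return False
        if now.time() < self.target or self._last_fired == now.date():
            return False
        self._last_fired = now.date()
        return True
```

The datetimes here are naive local times, which matches the note that the summary uses the dashboard container's timezone.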

make test green; patch coverage 97%; docs (telegram.md/configuration.md/CHANGELOG) updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
#339)

- /hashrate consistency: total and per-worker now use one shared effective_hashrate()
  (10m avg, 1m fallback for a just-connected rig without 10m history) — so per-worker
  lines sum to the total and a fresh worker shows its real rate, not 0.00. Fixed in
  /hashrate, /workers, and the daily digest label; _aggregate_hashrate reuses the helper.
- Tor network panel (egress #170): the Telegram bot now appears as a dashboard egress
  path (Tor when enabled, else inactive) in both the egress list and the topology graph.
- hashrate_low alert (#339 remainder): edge alert when a fixed XvB tier can't be sustained
  / recovers, from metrics.low_hr_warning (built once per cycle, only when the bot is on).
  #340 (Tor routing) was already complete.
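
To show why a single shared helper makes the per-worker lines sum to the total, here is a minimal sketch of the 10m-with-1m-fallback rule. The field names `hashrate_10m` and `hashrate_1m` and the function names are assumptions; the PR's effective_hashrate() may read different keys.

```python
from typing import Iterable, Mapping


def effective_hashrate_sketch(worker: Mapping) -> float:
    """10-minute average when available, otherwise the 1-minute figure."""
    ten = worker.get("hashrate_10m") or 0.0
    if ten > 0:
        return float(ten)
    # A just-connected rig has no 10m history yet, so fall back to 1m.
    return float(worker.get("hashrate_1m") or 0.0)


def total_hashrate(workers: Iterable[Mapping]) -> float:
    """Total built from the same helper, so it equals the sum of the per-worker lines."""
    return sum(effective_hashrate_sketch(w) for w in workers)
```

With one helper feeding both views, a fresh rig reporting only a 1m rate contributes the same number to its own line and to the total.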

make test green; patch coverage 98%; docs + CHANGELOG + roadmap #333 updated; 6 issues
(#99/#104/#59/#84/#118/#116) got Telegram acceptance-criteria bullets.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>