Telegram operator bot: alerts + on-demand commands, over Tor (#121, #45) by VijitSingh97 · Pull Request #341 · p2pool-starter-stack/pithead

VijitSingh97 · 2026-07-03T00:18:40Z

Closes #121, closes #45. Supersedes the stale draft #143 (its three modules were salvaged and rebased here).

A Telegram operator bot for the stack: push alerts for anything worth knowing, plus read-only status commands on demand. Off by default; everything routes over Tor.

Alerts (debounced, one message per real transition)

🚀 Pithead online · ⛓️ node down/recovered · ⛏️ worker offline/back · 🎉 worker joined · 👋 worker left · ✅ sync finished · 💾 disk filling/critical · 🗄️ DB write failing · 🎰 no PPLNS share (XvB wins skipped) · 🎰 XvB registration rejected/failing · 🌐 node exposed on clearnet · 🆕 new release available

Each reuses a signal the dashboard already computes (the build_badges catalog / Metrics / NodeHealthMonitor) — no re-collection. Per-event telegram.events.* toggles, all default on.

Commands (read-only; only the configured `chat_id` is answered)

/status · /hashrate · /workers · /sync · /system · /pool · /xvb · /earnings · /help

Uses long-polling (getUpdates), so no inbound port is needed. Replies come from the same build_metrics the web UI renders, so they always match the dashboard.
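A getUpdates long-poll needs two pieces of bookkeeping, sketched minimally below. The sketch assumes the standard Bot API update shape (`update_id`, `message.chat.id`, `message.text`). `next_offset` and `extract_command` are hypothetical helpers, not the PR's TelegramCommandBot. The offset acknowledges updates that have been handled, and the chat check implements the one-`chat_id` access gate.

```python
from typing import Optional, Union

ChatId = Union[int, str]


def next_offset(updates: list, current: Optional[int]) -> Optional[int]:
    """Offset for the next getUpdates call: one past the highest update_id handled.

    Sending this offset confirms the earlier updates to Telegram, so they
    are not delivered again on the next poll.
    """
    if not updates:
        return current
    return max(u["update_id"] for u in updates) + 1


def extract_command(update: dict, allowed_chat_id: ChatId) -> Optional[tuple]:
    """Return (chat_id, command) for a slash command from the allowed chat, else None."""
    message = update.get("message") or {}
    chat_id = (message.get("chat") or {}).get("id")
    text = message.get("text") or ""
    # Access gate: compare as strings so an int or str chat_id in config both work.
    if chat_id is None or str(chat_id) != str(allowed_chat_id):
        return None
    if not text.startswith("/"):
        return None
    # "/status@SomeBot now" -> "/status"
    command = text.split()[0].split("@", 1)[0].lower()
    return chat_id, command
```

Messages from any other chat return None and get no reply at all. This keeps the bot silent toward strangers rather than telling them they are unauthorized.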

Design

Over Tor. Notifier + command bot dial api.telegram.org through the bridge Tor SOCKS proxy (socks5h), like Healthchecks/XvB — never leaks the host IP.
Read-only. No command changes the stack (lifecycle stays on the CLI), so a leaked chat can at worst read status.
Worker offline means the DOWN status the dashboard shows, not absence from the list; joined/left track fleet membership. Every edge detector seeds its baseline silently, so a restart never replays old alerts.
Secret handling. bot_token lives in the owner-only .env, masked in the apply preview, never logged.
docs/architecture.md diagram now shows every dashboard egress path (Telegram/Healthchecks/XvB/GitHub), all 🟢 Tor.
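The Tor routing and token handling described above can be sketched with the standard `proxies` mapping in `requests`. This is an illustrative sketch, not the PR's notifier. The proxy address `socks5h://tor:9050` is an assumption, and the stack's real bridge host and port may differ. SOCKS URLs also need the `requests[socks]` extra (PySocks). `send_message_kwargs` and `redact` are hypothetical names.

```python
import re

# Assumed address of the bridge Tor SOCKS proxy; the stack's real host/port may differ.
TOR_SOCKS = "socks5h://tor:9050"


def send_message_kwargs(token: str, chat_id: str, text: str, socks_url: str = TOR_SOCKS) -> dict:
    """Keyword arguments for requests.post(**kwargs) that send one message over Tor.

    socks5h (rather than socks5) makes the proxy resolve api.telegram.org,
    so not even a DNS lookup leaves the host directly.
    """
    return {
        "url": f"https://api.telegram.org/bot{token}/sendMessage",
        "json": {"chat_id": chat_id, "text": text},
        "proxies": {"http": socks_url, "https": socks_url},
        "timeout": 30,
    }


# Bot tokens look like "123456:ABC-..."; they appear in the URL path after "/bot".
_TOKEN_IN_URL = re.compile(r"/bot[^/\s]+/")


def redact(message: str) -> str:
    """Mask a bot token embedded in a Telegram API URL before the text reaches a log."""
    return _TOKEN_IN_URL.sub("/bot***/", message)
```

A caller would do `requests.post(**send_message_kwargs(token, chat_id, text))` and pass any exception text through `redact()` before logging. This matters because `requests` connection errors quote the request path, token included.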

Testing

make test green — dashboard 769, stack 452, selftest 101, fakes 12. Patch coverage 96%, full lint clean, test-inventory regenerated. Validated live on gouda: rebuilt via upgrade, bot polls over Tor, getMe/sendMessage succeed over Tor, stack healthy.

Follow-ups (kept as issues — need new infra)

#336 block/payout alerts · #337 container crash-loop alerts · #338 two-way control commands · #339 hashrate-low-for-tier alert.

🤖 Generated with Claude Code

Ship a thin, notifications-only Telegram pusher for v1.0: node down/recovered, worker offline/back online, and sync finished. Off by default; no interactive bot (that stays in #45).

Consumes signals the data loop already computes rather than re-collecting:
- node down/recovered from NodeHealthMonitor's debounced `down` edge (#31)
- sync finished from the sync-gate `miner_released` latch (#35)
- worker offline/back via a new flap-protected per-worker presence tracker

New dashboard modules:
- telegram_notifier.py: thin sendMessage transport; enabled only with token + chat_id; fail-silent on offline/Tor-only hosts; never logs the bot token.
- worker_presence.py: WorkerPresenceMonitor, the per-worker analogue of NodeHealthMonitor (debounce + recovery hysteresis + silent baseline + reset when the proxy is intentionally stopped).
- alert_service.py: folds per-cycle signals into debounced alerts; pure evaluate() + off-thread process(); wired into data_service.run().

Plumbing: config.json telegram.* -> pithead render_env -> per-event env vars -> config.py. bot_token is rendered to the owner-only .env and masked in the apply preview. Injected into the dashboard container in docker-compose; added to the advanced example config.

Docs: new docs/telegram.md setup guide (BotFather, chat id, per-event toggles, "one chat, two bots" with #79, Tor-only caveat, troubleshooting); cross-refs in the docs index, configuration reference, and CHANGELOG.

Tests: 49 new pytest cases (notifier/monitor/alert service), plus stack tests for env propagation and bot-token secrecy. Full suite green; coverage 93%.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
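Below is a rough sketch of the config.json -> env leg of that plumbing. This PR names only `TELEGRAM_COMMANDS_ENABLED`. `TELEGRAM_BOT_TOKEN`, `TELEGRAM_CHAT_ID`, and the `TELEGRAM_EVENT_<NAME>` pattern are hypothetical stand-ins for whatever `render_env` actually emits. The last two helpers cover the masked apply preview and the config.py side. On the config.py side, a toggle that was never rendered defaults to on.

```python
def _flag(value: object) -> str:
    return "true" if value else "false"


def render_telegram_env(config: dict) -> dict:
    """Flatten config.json's telegram block into env vars (hypothetical names)."""
    tg = config.get("telegram", {})
    env = {
        "TELEGRAM_BOT_TOKEN": str(tg.get("bot_token", "")),
        "TELEGRAM_CHAT_ID": str(tg.get("chat_id", "")),
        "TELEGRAM_COMMANDS_ENABLED": _flag(tg.get("commands", {}).get("enabled", False)),
    }
    # One env var per telegram.events.* toggle.
    for name, enabled in tg.get("events", {}).items():
        env[f"TELEGRAM_EVENT_{name.upper()}"] = _flag(enabled)
    return env


SECRET_KEYS = {"TELEGRAM_BOT_TOKEN"}


def preview_env(env: dict) -> dict:
    """What an apply preview could print: secrets masked, everything else verbatim."""
    return {k: ("***" if k in SECRET_KEYS and v else v) for k, v in env.items()}


def event_enabled(env: dict, name: str) -> bool:
    """config.py side: an event toggle that was never rendered defaults to on."""
    return env.get(f"TELEGRAM_EVENT_{name.upper()}", "true").strip().lower() == "true"
```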

# Conflicts: # CHANGELOG.md

Re-apply the #121 draft (PR #143) onto current develop, resolving 103 commits of context drift. New modules land unchanged; conflicts were all additive (healthchecks #79, clearnet #234, update-badge #224 rows sitting beside the new telegram rows). Config renamed config.advanced.example.json -> config.reference.json; kept the subnet-aware P2POOL_URL over the draft's hardcoded one. Adds the telegram.commands scaffolding for the #45 half.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ged alerter

Add TelegramCommandBot: an in-process long-poll (getUpdates) loop that answers read-only /status, /hashrate, /workers, /sync, /help from the configured chat. Reuses build_metrics so replies match the dashboard exactly; access-gated to the one chat_id; long-poll needs no inbound port and rides the same egress as the alerts; fail-silent + never logs the token, same discipline as the notifier. Wired as a third background task in main.py (no-op unless telegram.commands.enabled).

Config: telegram.commands.enabled -> TELEGRAM_COMMANDS_ENABLED (default false). Also ruff-format + lint-fix the #121 modules the draft predated (import sort, drop an unused import). 37 new unit tests; patch coverage 95%.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
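Command dispatch with a wrapper that never raises might look like the sketch below. The handler set is cut down to two commands. The metrics keys (`node`, `mining`, `hashrate_hs`) are hypothetical and do not reflect the real shape of build_metrics. `safe_reply_for` only mirrors the idea of the PR's `_safe_reply_for`; it is not that function.

```python
import logging
from typing import Callable, Dict, Mapping

log = logging.getLogger("telegram_commands_sketch")


def _status(m: Mapping) -> str:
    return f"Node: {m['node']} · Mining: {m['mining']}"


def _hashrate(m: Mapping) -> str:
    return f"Hashrate: {m['hashrate_hs'] / 1000:.2f} kH/s"


# Read-only handlers: each only formats metrics the dashboard already built.
HANDLERS: Dict[str, Callable[[Mapping], str]] = {"/status": _status, "/hashrate": _hashrate}


def reply_for(command: str, metrics: Mapping) -> str:
    if command in ("/help", "/start"):
        return "Commands: " + " ".join(sorted([*HANDLERS, "/help"]))
    handler = HANDLERS.get(command)
    if handler is None:
        return "Unknown command. Try /help."
    return handler(metrics)


def safe_reply_for(command: str, metrics: Mapping) -> str:
    """Never raise into the polling loop; log only the command and the error type."""
    try:
        return reply_for(command, metrics)
    except Exception as exc:
        log.warning("reply for %s failed: %s", command, type(exc).__name__)
        return "Couldn't build that reply right now."
```

The wrapper logs only the command name and the exception type. Nothing from the reply path, and never the token, can end up in a log line.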

Per feedback: the dashboard already computes each rig's DOWN state (status != online, shown in the uptime column) and keeps the row visible ~1h before it falls off the proxy table (#182). Drive the worker-offline debounce off that same status instead of inferring offline from absence in the online-name set. A Telegram 'offline' alert then lines up with the rig showing DOWN on screen, and a rig that vanishes from the table entirely is forgotten (the dashboard no longer shows it) rather than aged into a false offline.

WorkerPresenceMonitor.update now takes the worker rows (name+status). Offline fires once a rig has been DOWN for offline_after, recovered fires once it has been online for recovery_after, and a rig is forgotten when the proxy stops listing it. Drops the redundant retention timer (WORKER_RETENTION_SEC is removed; it mirrored the lifecycle's own falloff). worker_presence.py 100% covered; make test green; patch coverage 94%.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
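The status-driven debounce described above reduces to a small state machine per worker, sketched here in simplified form. The sketch assumes rows arrive as `(name, status)` pairs and that `"online"` is the only healthy status. `PresenceMonitor` is a hypothetical stand-in for WorkerPresenceMonitor. It leaves out joined/left and the proxy-stopped reset.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class _Track:
    confirmed_online: bool                 # state last alerted on (or silently seeded)
    pending_since: Optional[float] = None  # when the raw status first disagreed with it


class PresenceMonitor:
    """Hypothetical simplification of a per-worker DOWN/online debounce."""

    def __init__(self, offline_after: float, recovery_after: float) -> None:
        self.offline_after = offline_after
        self.recovery_after = recovery_after
        self._tracks: Dict[str, _Track] = {}

    def update(self, rows: List[Tuple[str, str]], now: float) -> List[Tuple[str, str]]:
        """rows are (name, status) as the dashboard shows them; returns (name, event) pairs."""
        events: List[Tuple[str, str]] = []
        seen = set()
        for name, status in rows:
            seen.add(name)
            online = status == "online"
            track = self._tracks.get(name)
            if track is None:
                # Silent baseline: a first sighting (e.g. after a restart) alerts nothing.
                self._tracks[name] = _Track(confirmed_online=online)
                continue
            if online == track.confirmed_online:
                track.pending_since = None  # flap reverted before it was confirmed
                continue
            if track.pending_since is None:
                track.pending_since = now
            hold = self.recovery_after if online else self.offline_after
            if now - track.pending_since >= hold:
                track.confirmed_online = online
                track.pending_since = None
                events.append((name, "back_online" if online else "offline"))
        # A rig the proxy no longer lists is forgotten, not aged into a false "offline".
        for gone in set(self._tracks) - seen:
            del self._tracks[gone]
        return events
```

Timestamps are passed in rather than read from a clock. This keeps the debounce windows deterministic under test.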

… /xvb commands

Alerts (all reuse what the dashboard already computes, per feedback):
- worker joined/left: woven into WorkerPresenceMonitor via a prime flag so a restart/failover-readmit doesn't replay the roster as joins; 'left' fires when a rig drops off the proxy table entirely (vs 'offline' = DOWN-but-still-listed).
- disk filling/critical: crosses the same DISK_WARN_PERCENT/DISK_CRITICAL_PERCENT thresholds as the dashboard's low-disk badge (#138); a full disk can corrupt monerod's DB.
- DB write failing: StateManager.is_db_healthy flipping false (#131).

Disk usage is read once in the loop and reused in the snapshot.

Commands (read-only, reuse build_metrics + the system snapshot):
- /system (disk/RAM/CPU/load/HugePages), /pool (sidechain + Monero network), /xvb (mode/tier/routed/raffle-eligibility).

Config: telegram.events gains worker_joined/worker_left/disk_space/db_unhealthy (all default true); plumbed through pithead render, compose, config.reference.json. make test green; patch coverage 97%.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
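The disk edge alert can be sketched as follows, assuming two-level thresholds like the dashboard's badge. The 85/95 values are placeholders; the dashboard's real DISK_WARN_PERCENT and DISK_CRITICAL_PERCENT may differ. `DiskAlert` is hypothetical. `shutil.disk_usage` is the standard-library call for taking the reading.

```python
import shutil
from typing import Optional

# Placeholder thresholds; the dashboard's real DISK_WARN_PERCENT / DISK_CRITICAL_PERCENT may differ.
DISK_WARN_PERCENT = 85.0
DISK_CRITICAL_PERCENT = 95.0


def used_percent(path: str) -> float:
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def disk_level(percent: float) -> str:
    if percent >= DISK_CRITICAL_PERCENT:
        return "critical"
    if percent >= DISK_WARN_PERCENT:
        return "filling"
    return "ok"


class DiskAlert:
    """Emits one message per level change; the first reading only seeds the baseline."""

    def __init__(self) -> None:
        self._last: Optional[str] = None

    def evaluate(self, percent: float) -> Optional[str]:
        level = disk_level(percent)
        previous, self._last = self._last, level
        if previous is None or level == previous:
            return None  # silent seed, and no repeats while the level holds
        return f"disk {previous} -> {level} ({percent:.0f}% used)"
```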

…mand

Reviewed the dashboard's own noteworthy-state catalog (build_badges) and added the high-value, cheap-to-reuse signals:
- xvb_no_share: donating to XvB with no PPLNS share in the window means raffle wins are skipped (#158); revenue make-or-break. Gated on XvB enabled.
- clearnet_exposed: a node syncing over clearnet exposes the host IP (#183); privacy signal on a Tor-first stack. The node reverts to Tor automatically (#234).

Both are computed in the data loop from existing figures (shares_in_pplns_window, clearnet_sync_state) and passed as scalars, keeping evaluate() pure. Events default true.

/earnings command: estimated P2Pool XMR/day, reusing service/earnings.xmr_per_hs_day applied to the displayed P2Pool 1h hashrate (no web-layer import).

Larger follow-ups raised as separate issues (block/payout-found, container crash-loop, two-way control, XvB-registration alert, etc.). make test green; patch coverage 97%.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
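For intuition about what /earnings reports: in expectation, a P2Pool miner earns roughly its share of the Monero network hashrate times the coins minted per day. The sketch below uses that textbook estimate with Monero's 120 s target block time. It is an assumption about the math, not a copy of service/earnings.xmr_per_hs_day. It ignores variance, and `block_reward_xmr` stands in for the reward per block including fees.

```python
MONERO_TARGET_BLOCK_TIME_S = 120
BLOCKS_PER_DAY = 86_400 // MONERO_TARGET_BLOCK_TIME_S  # 720


def xmr_per_hs_day_estimate(network_hashrate_hs: float, block_reward_xmr: float) -> float:
    """Expected XMR per day for each H/s of hashrate (hypothetical helper)."""
    if network_hashrate_hs <= 0:
        return 0.0
    return BLOCKS_PER_DAY * block_reward_xmr / network_hashrate_hs


def estimated_xmr_per_day(hashrate_1h_hs: float, network_hashrate_hs: float,
                          block_reward_xmr: float) -> float:
    # Scale the per-H/s rate by the displayed 1h hashrate, as the /earnings text describes.
    return hashrate_1h_hs * xmr_per_hs_day_estimate(network_hashrate_hs, block_reward_xmr)
```

With 10 kH/s against a 3 GH/s network and a 0.6 XMR reward, this gives about 0.00144 XMR/day.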

…i, egress diagram

- Tor routing (#340): notifier + command bot now dial api.telegram.org through the bridge Tor SOCKS proxy (socks5h), same as Healthchecks/XvB, so they never leak the host IP. Command bot swapped from aiohttp to requests-in-a-thread to reuse the SOCKS path (no new dep); the getUpdates long-poll runs via asyncio.to_thread.
- New alerts (folding in the achievable parts of #339): 'Pithead online' one-shot heartbeat on start, XvB registration rejected/failing/recovered, new-release-available.
- Emoji enrichment across all alert messages (⛏️ workers, ⛓️ nodes, 💾 disk, 🗄️ db, 🎰 XvB, 🧅/🌐 clearnet, 🚀 online).
- Network diagram (docs/architecture.md): now shows each dashboard egress path (Telegram, Healthchecks, XvB stats, GitHub) tagged with its route, all 🟢 Tor.
- config.json telegram.events gains xvb_registration/new_release/stack_online (default true); plumbed pithead->compose->config.py; docs updated.

Kept as issues (genuinely need new infra): #336 block/payout (no per-node/payout signal), #337 crash-loop (read proxy has no inspect/health), #338 two-way control (auth model). make test green; patch coverage 96%.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
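The requests-in-a-thread pattern can be sketched as below. `asyncio.to_thread` (Python 3.9+) moves the blocking long-poll call to a worker thread, so the event loop stays free while Telegram holds the request open. `fetch_updates` stands for a hypothetical blocking getUpdates call over the SOCKS proxy. The error handling shows a fail-silent back-off, not the PR's exact retry policy.

```python
import asyncio
from typing import Callable, List, Optional


async def poll_forever(
    fetch_updates: Callable[[Optional[int]], List[dict]],
    handle: Callable[[dict], None],
    stop: asyncio.Event,
    error_backoff_s: float = 5.0,
) -> None:
    offset: Optional[int] = None
    while not stop.is_set():
        try:
            # Blocking requests call, held open by Telegram for the long-poll timeout;
            # running it in a worker thread keeps the event loop responsive.
            updates = await asyncio.to_thread(fetch_updates, offset)
        except Exception:
            # Fail-silent: Tor or Telegram unreachable, so back off and retry.
            await asyncio.sleep(error_backoff_s)
            continue
        for update in updates:
            handle(update)
            offset = update["update_id"] + 1  # confirm it so it is not redelivered
```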

… gaps

Testing to standard (docs/testing-strategy.md: test each behaviour once, at the lowest honest tier):
- tier-1 pytest: assert AlertService.EVT_* == config.py TELEGRAM_EVENTS (adding an alert but forgetting its toggle, or vice versa, now fails a test).
- tier-1 shell (run.sh): assert every telegram.events key in config.reference.json renders into .env AND is declared in docker-compose.yml; this guards the config surface the pytest can't see (14 events × 2, +28 assertions).
- Fill real branches: process() swallowing an evaluate() error (the never-break-the-loop guard), reply_for /hashrate + /sync, _safe_reply_for's error path, the benign XvB-registration transition. alert_service + telegram_commands now 99%.

Docs reviewed against docs/STYLE.md (voice + code-accuracy); no changes needed. make test green; patch coverage 98%; test-inventory regenerated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
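The EVT_* versus TELEGRAM_EVENTS guard amounts to a set comparison in both directions. Here is a self-contained sketch using a hypothetical three-event stand-in for AlertService; the real class and the config.py tuple are larger. The sketch uses `dir()` rather than `vars()` so that constants inherited from a base class are counted too.

```python
from typing import Iterable, Set, Tuple


class AlertService:
    """Hypothetical three-event stand-in for the real AlertService."""

    EVT_DISK_SPACE = "disk_space"
    EVT_DB_UNHEALTHY = "db_unhealthy"
    EVT_NEW_RELEASE = "new_release"


# Hypothetical stand-in for config.py's TELEGRAM_EVENTS.
TELEGRAM_EVENTS = ("disk_space", "db_unhealthy", "new_release")


def declared_events(cls: type) -> Set[str]:
    # dir() includes inherited EVT_* constants; vars() would only see the class's own.
    return {getattr(cls, name) for name in dir(cls) if name.startswith("EVT_")}


def toggle_mismatch(cls: type, toggles: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """(alerts with no toggle, toggles with no alert); both empty when in sync."""
    events, configured = declared_events(cls), set(toggles)
    return events - configured, configured - events


def test_every_alert_has_a_toggle() -> None:
    assert toggle_mismatch(AlertService, TELEGRAM_EVENTS) == (set(), set())
```

Returning both differences gives a failing test message that names the exact missing toggle or orphaned key.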

- Daily summary: a once-a-day status roll-up (nodes/mining/workers/hashrate/shares/disk) pushed at a configurable local time. telegram.daily_summary_time (default 08:00) + a daily_summary event toggle (default on). Fires once/day at the target; a post-time restart waits for the next day rather than replaying; malformed time disables it. Uses the dashboard container's timezone. Built lazily (only when due) from the same build_metrics the dashboard renders.
- Config plumbed config.json -> pithead render -> compose -> config.py; describe_change + the run.sh event-consistency loop + a time-propagation test cover the surfaces.
- Wiring test (tier 1): asserts DataService.run() hands AlertService the full signal contract each cycle + calls maybe_daily_summary. This closes the one automatable e2e gap (the alert LOGIC was fully unit-tested; the loop->alerter wiring wasn't asserted).

make test green; patch coverage 97%; docs (telegram.md/configuration.md/CHANGELOG) updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
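The once-a-day rules above fit in a small scheduler: fire at the target, don't replay after a late restart, and disable on a malformed time. This sketch assumes naive local datetimes, which in the container means its configured timezone. `parse_hhmm` and `DailySchedule` are hypothetical names, not the PR's maybe_daily_summary.

```python
from datetime import date, datetime, time
from typing import Optional


def parse_hhmm(value: str) -> Optional[time]:
    """'08:00' -> time(8, 0); anything malformed -> None (summary disabled)."""
    try:
        hh, mm = value.strip().split(":")
        return time(int(hh), int(mm))
    except (ValueError, AttributeError):
        return None


class DailySchedule:
    def __init__(self, hhmm: str, now: datetime) -> None:
        self.target = parse_hhmm(hhmm)
        self._last_fired: Optional[date] = None
        if self.target is not None and now.time() >= self.target:
            # Started after today's slot: wait for tomorrow rather than replaying.
            self._last_fired = now.date()

    def due(self, now: datetime) -> bool:
        """True at most once per calendar day, at or after the target time."""
        if self.target is None:
            return False
        if now.time() < self.target or self._last_fired == now.date():
            return False
        self._last_fired = now.date()
        return True
```

The data loop would call `due(datetime.now())` each cycle and build the summary only when it returns True. This matches the "built lazily" behaviour.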

#339)

- /hashrate consistency: total and per-worker now use one shared effective_hashrate() (10m avg, 1m fallback for a just-connected rig without 10m history), so per-worker lines sum to the total and a fresh worker shows its real rate, not 0.00. Fixed in /hashrate, /workers, and the daily digest label; _aggregate_hashrate reuses the helper.
- Tor network panel (egress #170): the Telegram bot now appears as a dashboard egress path (Tor when enabled, else inactive) in both the egress list and the topology graph.
- hashrate_low alert (#339 remainder): edge alert when a fixed XvB tier can no longer be sustained, and again when it recovers, from metrics.low_hr_warning (built once per cycle, only when the bot is on).

#340 (Tor routing) was already complete. make test green; patch coverage 98%; docs + CHANGELOG + roadmap #333 updated; 6 issues (#99/#104/#59/#84/#118/#116) got Telegram acceptance-criteria bullets.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
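The shared-helper fix can be illustrated as follows. The field names `h10m` and `h1m` are hypothetical. Treating a 10m figure of 0.0 as "no history yet" is an assumption of this sketch. The point is that totals and per-worker lines go through the same function, so they always agree.

```python
from typing import List, Optional


def effective_hashrate(h10m: Optional[float], h1m: Optional[float]) -> float:
    """10-minute average, falling back to the 1-minute figure for a just-connected rig."""
    if h10m:  # None or 0.0 is treated as "no usable 10m history" (assumption)
        return float(h10m)
    return float(h1m or 0.0)


def aggregate_hashrate(workers: List[dict]) -> float:
    # Same helper as the per-worker lines, so the lines always sum to the total.
    return sum(effective_hashrate(w.get("h10m"), w.get("h1m")) for w in workers)
```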

VijitSingh97 and others added 8 commits June 4, 2026 02:17

Merge remote-tracking branch 'origin/main' into pr-143

a043402

# Conflicts: # CHANGELOG.md

VijitSingh97 mentioned this pull request Jul 3, 2026

Telegram alerting (notifications-only): node/worker down + recovered (#121) #143

Closed


VijitSingh97 and others added 2 commits July 2, 2026 19:28

VijitSingh97 wants to merge 11 commits into develop from claude/elegant-ardinghelli-3fa5fd

VijitSingh97 commented Jul 3, 2026
