Merged
20 commits
163ebf5
Add notifications-only Telegram alerter (#121)
VijitSingh97 Jun 4, 2026
a043402
Merge remote-tracking branch 'origin/main' into pr-143
VijitSingh97 Jun 4, 2026
2bf4b59
merge(#121): salvage Telegram notifications-only alerter onto develop
VijitSingh97 Jul 2, 2026
d5cf71f
feat(#45): on-demand Telegram status commands + ruff/format the salva…
VijitSingh97 Jul 2, 2026
b940ee6
refactor(#121): base worker-offline on the DOWN status, not list-absence
VijitSingh97 Jul 2, 2026
9ecd188
feat(#121/#45): worker join/leave + disk/DB alerts, and /system /pool…
VijitSingh97 Jul 2, 2026
fbade85
feat(#121/#45): XvB no-share + clearnet-exposed alerts, /earnings com…
VijitSingh97 Jul 2, 2026
ebc691d
feat(telegram): route over Tor, 'online' heartbeat, more alerts, emoj…
VijitSingh97 Jul 2, 2026
a956277
test(#121/#45): guard the event-set surfaces + fill Telegram coverage…
VijitSingh97 Jul 3, 2026
6f2cd46
feat(telegram): daily status digest + alert-wiring test
VijitSingh97 Jul 3, 2026
502f721
fix(telegram): consistent /hashrate, egress panel + hashrate-low aler…
VijitSingh97 Jul 3, 2026
dd05715
feat(telegram): daily summary as a 24h fleet retrospective
VijitSingh97 Jul 3, 2026
8b43bd9
docs(telegram): step-by-step for adding HealthchecksBot to the shared…
VijitSingh97 Jul 3, 2026
5d628ec
feat(telegram): daily incident log (#342)
VijitSingh97 Jul 3, 2026
233c8aa
feat(#99): hashrate-drop detector — chart markers + Telegram alert
VijitSingh97 Jul 3, 2026
8dc3b23
feat(#104): host-perf warning badges + alerts, warnings in /status
VijitSingh97 Jul 3, 2026
38c60c4
feat(telegram): /info command — version, updates, DB mode, privacy po…
VijitSingh97 Jul 3, 2026
7a99835
feat(telegram): enrich /pool, /xvb, /status with high-value mining data
VijitSingh97 Jul 3, 2026
b88d1ae
feat(telegram): add effort to /pool and a 24h earnings estimate
VijitSingh97 Jul 3, 2026
03a536a
refactor(telegram): clearer status_warnings prefix strip
VijitSingh97 Jul 3, 2026
49 changes: 49 additions & 0 deletions CHANGELOG.md
@@ -99,6 +99,55 @@ cd pithead && cp config.json.template config.json # set your Monero + Tari pay
so paste a Tor-reachable URL (hosted `hc-ping.com`, or a self-hosted onion/public instance). Fails
silently when offline / Tor down. The URL is the on/off switch and is stored as a secret in the
owner-only `.env`. See [`docs/monitoring.md`](docs/monitoring.md) (#79).
- **Telegram operator bot — push alerts + on-demand status** (#121, #45): the dashboard can push a
high-value set of operational alerts to Telegram — a **🚀 "Pithead online"** heartbeat on start,
**node down / recovered**, **worker offline / back online**, **new worker joined / left**, **sync
finished**, **data disk filling up**, **dashboard DB write failing**, **no PPLNS share while
donating to XvB** (raffle wins skipped), **XvB registration rejected / failing**, **hashrate too
low for the chosen XvB tier**, **a node exposed on clearnet** during initial sync, and **a new
release being available** — and answer status commands on demand: **`/status`**, **`/info`**
(version + update availability, Monero DB mode, P2Pool sidechain, and Tor-only/clearnet privacy
posture), **`/hashrate`**, **`/workers`**, **`/sync`**, **`/system`**, **`/pool`**, **`/xvb`**,
**`/earnings`**, and **`/help`**. It also pushes a **📅 once-a-day retrospective** at a configurable local time
(`telegram.daily_summary_time`, default **08:00**) — the last 24h across the fleet: an incident
roll-up (what went wrong during the day, or an all-clear), 24h hashrate with the P2Pool/XvB split,
shares found in the day, an estimated daily-earnings figure, and a per-machine 24h breakdown. The Telegram bot appears in the dashboard's
**network-egress panel** (#170) as a Tor-routed path alongside Healthchecks/XvB/update-check. All
traffic is **routed over Tor** (the same bridge SOCKS as Healthchecks/XvB), so the bot never
exposes the host IP to Telegram. Off by default; enable it with a `telegram` block in `config.json` (`enabled`,
`bot_token`, `chat_id`, per-event `events` toggles, and a `commands.enabled` switch for the
interactive half). Every alert is **debounced** so a momentary blip won't ping you and you get one
message per real transition — and each is built by *reusing* what the dashboard already computes:
worker offline/joined/left keys off the same per-rig **DOWN** status the UI shows, and the disk /
DB alerts cross the same thresholds as the dashboard's own low-disk and DB-health badges. Commands
**long-poll** (`getUpdates`) so they need no inbound port and ride the same Tor egress as the
alerts, are **read-only** (they never change the stack), and only the configured `chat_id` is
answered — every other update is ignored. The `bot_token` is treated as a secret (owner-only
`.env`, never logged), and both sends and polling **fail silently** on a Tor-only / offline host.
Messages are prefixed with the dashboard hostname so multiple stacks can share one chat. Full
walkthrough — creating a bot, finding your chat id, the command list, and the "one chat, two bots"
pattern for sharing a chat with the Healthchecks.io monitor (#79) — in
[`docs/telegram.md`](docs/telegram.md).
- **Host & performance warning badges + alerts** (#104): the top bar now surfaces the persistent
host conditions `setup` warns about, derived from **live** metrics (so they self-correct): **⚠
HugePages off** (RandomX capped until reserved), **⚠ Low RAM** (under 16 GB — Tari can OOM during
sync), and **⚠ No AVX2** (slow RandomX). The first two also push a Telegram alert (`hugepages`,
`low_ram`) the first time they're seen — unlike the transient edge alerts, a stable bad state
fires on first detection, and HugePages clears with a recovery ping once a reboot applies them.
AVX2 is **badge-only** by design: a fixed hardware fact with nothing to act on at runtime doesn't
warrant a push. The bot's **`/status`** reply now ends with any active warning/error badges (the
same catalog the top bar draws) or an explicit "✅ No warnings."
- **Hashrate-drop detector — chart markers + `hashrate_loss` alert** (#99): the dashboard now flags a
**sustained, significant fall** in total fleet hashrate — a rig gone dark, a network cut, a stalled
miner — separately from the existing "too low for your XvB tier" warning. It tracks a slow moving
average as the "normal" level (frozen while degraded so an outage can't quietly redefine normal),
and fires once the total stays below **`dashboard.hashrate_drop_threshold`** percent of that
baseline for **`dashboard.hashrate_drop_minutes`** (defaults: **50%** for **10 min**), with a
matching recovery edge. Each edge drops a **diamond marker on the hashrate chart** (amber for the
drop, green for the recovery; hover for the size) that is **persisted**, so an overnight drop is
still visible in the morning, and — when Telegram is on — pushes a **`hashrate_loss`** alert and
counts toward the daily incident roll-up. Both knobs are documented in
[`docs/configuration.md`](docs/configuration.md); the alert in [`docs/telegram.md`](docs/telegram.md).
- **Optional clearnet initial sync (#183).** A default-off, per-component opt-in
(`monero.clearnet_initial_sync` / `tari.clearnet_initial_sync`) that lets a node do its **one-time
initial block download over clearnet** — much faster than over bandwidth-capped Tor circuits, which
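The hashrate-drop entry in this changelog hunk describes three moving parts: a slow moving average as the baseline, a percent threshold, and a hold time before the alert fires. A minimal sketch of that logic follows, assuming an exponential moving average and a hypothetical `DropDetector` class; neither is taken from the PR, and the real detector may smooth, freeze, or debounce recovery differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DropDetector:
    """Hypothetical sketch of a sustained-drop detector; not the PR's implementation."""

    threshold_pct: float = 50.0  # mirrors dashboard.hashrate_drop_threshold
    hold_sec: float = 600.0      # mirrors dashboard.hashrate_drop_minutes (10 min)
    alpha: float = 0.05          # EMA smoothing factor (an assumption)
    baseline: Optional[float] = None
    below_since: Optional[float] = None
    degraded: bool = False

    def update(self, now: float, total: float) -> Optional[str]:
        """Feed one sample; return "drop", "recovered", or None."""
        if self.baseline is None:
            self.baseline = total
            return None
        low = total < self.baseline * self.threshold_pct / 100.0
        if low:
            if self.below_since is None:
                self.below_since = now
            if not self.degraded and now - self.below_since >= self.hold_sec:
                self.degraded = True
                return "drop"
            return None  # baseline is left untouched while the total is low
        self.below_since = None
        if self.degraded:
            self.degraded = False
            return "recovered"
        # Only healthy samples move the baseline, so an outage cannot redefine "normal".
        self.baseline += self.alpha * (total - self.baseline)
        return None
```

In this sketch the baseline stops updating as soon as a sample is below the threshold, which is slightly earlier than "frozen while degraded"; that keeps the ten-minute countdown from dragging the baseline down. Recovery fires on the first healthy sample, which is a simplification.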
6 changes: 6 additions & 0 deletions README.md
@@ -37,6 +37,12 @@ a Tor daemon. The `pithead` script renders config, provisions Tor, and drives do
address in the miner config; the stack routes the hashrate.
- 📊 **Live dashboard.** Hashrate, the P2Pool/XvB split, the PPLNS window, and per-worker updates,
served over HTTPS on your LAN.
- 📟 **Telegram operator bot.** Opt-in alerts for a downed node, a worker that dropped off, sync
finishing, low disk, a clearnet leak, or a sustained hashrate drop — plus a daily digest and
read-only commands (`/status`, `/hashrate`, `/workers`, `/earnings`). Routed over Tor. See the
[Telegram guide](docs/telegram.md).
- 🔔 **Dead-man's switch.** An optional [Healthchecks.io](https://healthchecks.io/) ping tells you
when the whole box goes dark — the one failure a monitor running *on* that box can never report.
- 🚀 **Interactive setup.** `pithead setup` checks dependencies, writes config, provisions Tor, and
(on Linux) tunes HugePages for RandomX. It prompts before any GRUB change, then offers to start.
- 🔒 **Hardened defaults.** Non-root containers, SHA256-verified binaries, pinned image digests,
20 changes: 20 additions & 0 deletions build/dashboard/mining_dashboard/collector/system.py
@@ -6,6 +6,26 @@
BYTES_IN_GB = 1024**3

_last_cpu_times = None
_avx2_supported = None # cached: the CPU flag can't change while the process runs


def get_cpu_avx2():
    """Whether the CPU advertises AVX2 (#104). RandomX runs far slower without it, so setup warns on
    it; surface the same persistent fact as a live badge. Reads /proc/cpuinfo (host CPU flags are
    visible inside the container); cached, since the flag is fixed for the life of the process.
    Returns True/False, or None when it can't be determined (non-Linux / unreadable)."""
    global _avx2_supported
    if _avx2_supported is not None:
        return _avx2_supported
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    _avx2_supported = "avx2" in line.split()
                    return _avx2_supported
    except OSError:
        pass
    return None  # unknown: don't cache, and callers treat None as "can't judge" (no badge/alert)


def get_disk_usage():
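To make it easier to see what the flag check in `get_cpu_avx2` matches, here is a small sketch that applies the same test to a string instead of `/proc/cpuinfo`. The helper name `has_avx2_flag` and the sample texts are hypothetical and exist only for illustration.

```python
from typing import Optional


def has_avx2_flag(cpuinfo_text: str) -> Optional[bool]:
    """Hypothetical helper mirroring the loop in get_cpu_avx2, but reading from a string."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # split() tokenises on whitespace, so "avx2" must match a whole flag;
            # "avx" alone does not count.
            return "avx2" in line.split()
    return None  # no "flags" line at all: unknown, so the caller shows no badge


# Made-up cpuinfo excerpts, trimmed to the lines that matter here.
SAMPLE_WITH_AVX2 = "processor\t: 0\nflags\t\t: fpu sse sse2 avx avx2 bmi2\n"
SAMPLE_WITHOUT_AVX2 = "processor\t: 0\nflags\t\t: fpu sse sse2 avx\n"
SAMPLE_NO_FLAGS_LINE = "processor\t: 0\nFeatures\t: fp asimd aes\n"
```

The third sample stands for a platform whose cpuinfo has no `flags` line; under the function's contract that yields `None` rather than `False`, so no "No AVX2" badge would be drawn from it.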
86 changes: 82 additions & 4 deletions build/dashboard/mining_dashboard/config/config.py
@@ -16,6 +16,10 @@
DISK_WARN_PERCENT = 85
DISK_CRITICAL_PERCENT = 95

# Minimum host RAM (GB) below which the dashboard flags a low-RAM badge/alert (#104). Mirrors the
# setup/doctor pre-flight threshold; a code-level default (not a config.json knob).
LOW_RAM_GB = int(float(os.environ.get("LOW_RAM_GB", 16)))

# --- Data Source File Paths ---
# File paths for JSON metrics generated by local collectors
STRATUM_STATS_PATH = f"{BASE_STATS_DIR}/local/stratum"
@@ -219,6 +223,84 @@
# instance; a LAN-only self-hosted address is unreachable through Tor).
HEALTHCHECKS_PING_URL = os.environ.get("HEALTHCHECKS_PING_URL", "").strip()

# --- Operator alerts: Telegram (Issue #121) ---
# Notifications-only Telegram pusher: a thin notifier that pushes a small, high-value set of
# operational edges (node down/recovered, worker offline/back, sync finished) to one chat.
# Disabled by default — with TELEGRAM_ENABLED unset/false the stack runs with no Telegram
# config and never sends or errors. The interactive bot / command interface is a separate
# feature (#45); this is the notifications-only split.
#
# `bot_token` is a secret: the pithead CLI renders it into the owner-only .env (like the node
# RPC password), and the notifier never writes it to a log line. On a Tor-only / no-clearnet
# host the Telegram API is unreachable and sends fail silently (consistent with #59).
TELEGRAM_ENABLED = os.environ.get("TELEGRAM_ENABLED", "false").strip().lower() == "true"
TELEGRAM_BOT_TOKEN = os.environ.get("TELEGRAM_BOT_TOKEN", "").strip()
TELEGRAM_CHAT_ID = os.environ.get("TELEGRAM_CHAT_ID", "").strip()

# Interactive command interface (#45), separate opt-in from the alerts above. When on, the
# dashboard long-polls Telegram (getUpdates — outbound only, no inbound port) and answers
# read-only status commands from the configured chat_id. Off by default; the alerter works
# without it. See telegram_commands.py / docs/telegram.md.
TELEGRAM_COMMANDS_ENABLED = (
    os.environ.get("TELEGRAM_COMMANDS_ENABLED", "false").strip().lower() == "true"
)


def _telegram_event_enabled(name, default=True):
    """Read one per-event toggle from TELEGRAM_EVENT_<NAME> (rendered from config.json's
    telegram.events by pithead). Any toggle left unset defaults to on, so enabling Telegram
    turns on the full set and an operator only has to opt *out* of the noisy ones."""
    raw = os.environ.get(f"TELEGRAM_EVENT_{name.upper()}")
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() == "true"


# Per-event delivery toggles. Keys here are the canonical event names used throughout the
# alerter (AlertService.EVT_*) and must match the config.json telegram.events block.
TELEGRAM_EVENTS = {
    "node_down": _telegram_event_enabled("node_down"),
    "node_recovered": _telegram_event_enabled("node_recovered"),
    "worker_offline": _telegram_event_enabled("worker_offline"),
    "worker_recovered": _telegram_event_enabled("worker_recovered"),
    "worker_joined": _telegram_event_enabled("worker_joined"),
    "worker_left": _telegram_event_enabled("worker_left"),
    "sync_finished": _telegram_event_enabled("sync_finished"),
    "disk_space": _telegram_event_enabled("disk_space"),
    "db_unhealthy": _telegram_event_enabled("db_unhealthy"),
    "xvb_no_share": _telegram_event_enabled("xvb_no_share"),
    "clearnet_exposed": _telegram_event_enabled("clearnet_exposed"),
    "xvb_registration": _telegram_event_enabled("xvb_registration"),
    "new_release": _telegram_event_enabled("new_release"),
    "stack_online": _telegram_event_enabled("stack_online"),
    "daily_summary": _telegram_event_enabled("daily_summary"),
    "hashrate_low": _telegram_event_enabled("hashrate_low"),
    "hashrate_loss": _telegram_event_enabled("hashrate_loss"),
    "hugepages": _telegram_event_enabled("hugepages"),
    "low_ram": _telegram_event_enabled("low_ram"),
}
# Note: daily_summary is a scheduled push, not an edge; it lives in the events dict only so it
# gets a per-event on/off toggle like the rest; its time is TELEGRAM_DAILY_SUMMARY_TIME below.

# Local time (HH:MM, 24-hour) to push the once-daily status digest, when the daily_summary event is
# on. Uses the dashboard container's timezone (dashboard.timezone), so "08:00" means 8am wherever
# the box is. Rendered from config.json telegram.daily_summary_time.
TELEGRAM_DAILY_SUMMARY_TIME = os.environ.get("TELEGRAM_DAILY_SUMMARY_TIME", "08:00").strip()

# Hashrate-degradation detector (Issue #99). Flags a sustained drop in total hashrate below
# HASHRATE_DROP_THRESHOLD_PCT of its trailing baseline for HASHRATE_DROP_MINUTES minutes — surfaced
# as a chart event marker (always on) and, when telegram.events.hashrate_loss is on, an alert.
# Rendered from config.json dashboard.hashrate_drop_threshold / dashboard.hashrate_drop_minutes.
HASHRATE_DROP_THRESHOLD_PCT = int(float(os.environ.get("HASHRATE_DROP_THRESHOLD_PCT", 50)))
HASHRATE_DROP_MINUTES = int(float(os.environ.get("HASHRATE_DROP_MINUTES", 10)))

# Worker offline/online debounce (Issue #121). A worker must be unseen this long before it's
# reported OFFLINE, and seen continuously this long before "back online" — so a brief miner
# reconnect doesn't spam the chat. Workers flap more than nodes (rig reboots, Wi-Fi blips),
# so the window is wider than the node debounce above.
WORKER_OFFLINE_AFTER_SEC = int(os.environ.get("WORKER_OFFLINE_AFTER_SEC", 300))
WORKER_RECOVERY_AFTER_SEC = int(os.environ.get("WORKER_RECOVERY_AFTER_SEC", 120))

# --- Monero Configuration ---
# Used to determine if the node is local (Docker) or remote
MONERO_NODE_HOST = os.environ.get("MONERO_NODE_HOST", "172.28.0.26")
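The worker debounce comment in this hunk describes two timers: a worker must be unseen for `WORKER_OFFLINE_AFTER_SEC` before it is reported offline, and seen continuously for `WORKER_RECOVERY_AFTER_SEC` before it is reported back. A minimal sketch of one way to implement that rule follows. `WorkerDebounce` and its `observe` method are hypothetical; the PR's `AlertService` is not shown in this diff and may work differently. Here `seen` stands for "the worker is not in the DOWN state on this poll".

```python
from dataclasses import dataclass
from typing import Optional

OFFLINE_AFTER_SEC = 300   # same default as WORKER_OFFLINE_AFTER_SEC
RECOVERY_AFTER_SEC = 120  # same default as WORKER_RECOVERY_AFTER_SEC


@dataclass
class WorkerDebounce:
    """Hypothetical per-worker state machine, for illustration only."""

    reported_offline: bool = False
    unseen_since: Optional[float] = None
    seen_since: Optional[float] = None

    def observe(self, now: float, seen: bool) -> Optional[str]:
        """Feed one poll result; return "worker_offline", "worker_recovered", or None."""
        if seen:
            self.unseen_since = None
            if self.seen_since is None:
                self.seen_since = now
            if self.reported_offline and now - self.seen_since >= RECOVERY_AFTER_SEC:
                self.reported_offline = False
                return "worker_recovered"
            return None
        self.seen_since = None  # any gap restarts the "seen continuously" clock
        if self.unseen_since is None:
            self.unseen_since = now
        if not self.reported_offline and now - self.unseen_since >= OFFLINE_AFTER_SEC:
            self.reported_offline = True
            return "worker_offline"
        return None
```

Each event is emitted once per transition: a short reconnect resets the offline timer without sending anything, and a worker that flaps during recovery has to start its 120-second window again.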
@@ -317,10 +399,6 @@

# --- Data Retention Policies ---
HISTORY_RETENTION_SEC = 30 * 24 * 3600 # 30 Days
# Retention for the known_workers persistence layer removed in #144. No live consumer in the current
# tree; kept for the deferred Telegram worker-presence monitor (#121), which reuses it as its
# retention default — consult that work before removing.
WORKER_RETENTION_SEC = 7 * 24 * 3600 # 7 Days
# How long an offline worker lingers in the live "Workers Alive" table before it falls off (#182).
# Operates on the live proxy-sourced list. A reconnect re-adds the worker. 1h keeps a
# just-disconnected rig visible (shown as DOWN) but clears ghosts.
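One detail of the per-event toggles in this file is easy to miss: an unset or empty variable means "on", but any non-empty value other than `true` (case-insensitive) means "off", including `yes` or `1`. The sketch below restates the rule from `_telegram_event_enabled` against a plain mapping so it can be tried without touching the process environment. The function name `event_enabled` and the sample values are hypothetical.

```python
from typing import Mapping


def event_enabled(env: Mapping[str, str], name: str, default: bool = True) -> bool:
    """Same rule as _telegram_event_enabled, but reading from a passed-in mapping."""
    raw = env.get(f"TELEGRAM_EVENT_{name.upper()}")
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() == "true"


# Illustrative environment, as pithead might render it from config.json (values made up).
sample_env = {
    "TELEGRAM_EVENT_WORKER_JOINED": "false",
    "TELEGRAM_EVENT_DISK_SPACE": " True ",
    "TELEGRAM_EVENT_LOW_RAM": "yes",  # not "true", so this reads as off
    "TELEGRAM_EVENT_HUGEPAGES": "",   # empty counts as unset, so the default applies
}
names = ("node_down", "worker_joined", "disk_space", "low_ram", "hugepages")
toggles = {n: event_enabled(sample_env, n) for n in names}
```

If the renderer only ever writes `true` or `false` this distinction never shows up, but it is worth knowing when setting the variables by hand.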
11 changes: 11 additions & 0 deletions build/dashboard/mining_dashboard/helper/utils.py
@@ -36,6 +36,17 @@ def parse_hashrate(val_str, unit_str=None):
return 0.0


def effective_hashrate(worker):
    """The single figure a worker contributes to the live headline total.

    Prefers the 10-minute average (the ``h15`` field; the name is legacy, it's the proxy's 10m rate),
    falling back to the 1-minute rate (``h60`` then ``h10``) when a rig hasn't accumulated 10
    minutes yet, so a freshly-connected worker reads its real live rate instead of 0. Defined once
    here so the aggregate total and every per-worker display use the *same* value and can't drift.
    """
    return worker.get("h15", 0) or worker.get("h60", 0) or worker.get("h10", 0) or 0


def format_hashrate(hashrate):
"""
Formats a raw hashrate value into a human-readable string with appropriate units.
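The fallback in `effective_hashrate` relies on Python's `or`, so a field that is present but zero (or `None`) falls through to the next window exactly like a missing one. A short sketch with made-up worker dicts shows the effect; the `name` key and all numbers are hypothetical, and the function body is copied from the diff so the example runs on its own.

```python
def effective_hashrate(worker):
    """Copied from the diff above so this example is self-contained."""
    return worker.get("h15", 0) or worker.get("h60", 0) or worker.get("h10", 0) or 0


# Made-up workers covering each branch of the fallback chain.
workers = [
    {"name": "rig-a", "h10": 980.0, "h60": 1010.0, "h15": 1000.0},  # warmed up: uses h15
    {"name": "rig-b", "h10": 450.0, "h60": 0, "h15": 0},            # just connected: uses h10
    {"name": "rig-c"},                                              # no rates yet: 0
    {"name": "rig-d", "h15": None, "h60": 200.0},                   # None also falls through
]
per_worker = [effective_hashrate(w) for w in workers]
fleet_total = sum(per_worker)
```

Summing the same per-worker figure is what keeps the headline total and the per-worker rows from disagreeing.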
10 changes: 9 additions & 1 deletion build/dashboard/mining_dashboard/main.py
@@ -14,6 +14,7 @@
from mining_dashboard.service.algo_service import AlgoService
from mining_dashboard.service.data_service import DataService
from mining_dashboard.service.storage_service import StateManager
from mining_dashboard.service.telegram_commands import TelegramCommandBot
from mining_dashboard.web.server import create_app

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
@@ -33,17 +34,24 @@ def build_app() -> web.Application:
    xvb_client = XvbClient(wallet_address=MONERO_WALLET_ADDRESS)
    data_service = DataService(state_manager, proxy_client, xvb_client)
    algo_service = AlgoService(state_manager, proxy_client, data_service)
    # On-demand Telegram command interface (#45). Reads the snapshot data_service already collects;
    # a no-op unless telegram.enabled + telegram.commands.enabled + bot_token + chat_id are set.
    telegram_bot = TelegramCommandBot(data_service)

    async def start_background_tasks(app):
        """Initializes background services upon web application startup."""
        app["data_task"] = asyncio.create_task(data_service.run())
        app["algo_task"] = asyncio.create_task(algo_service.run())
        app["telegram_task"] = asyncio.create_task(telegram_bot.run())

    async def cleanup_background_tasks(app):
        """Stops background tasks and closes resources on shutdown."""
        app["data_task"].cancel()
        app["algo_task"].cancel()
        await asyncio.gather(app["data_task"], app["algo_task"], return_exceptions=True)
        app["telegram_task"].cancel()
        await asyncio.gather(
            app["data_task"], app["algo_task"], app["telegram_task"], return_exceptions=True
        )
        if "state_manager" in app:
            app["state_manager"].close()

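The shutdown path in `main.py` follows the usual cancel-then-gather pattern. Going by the header's "9 additions, 1 deletion", the two-task `gather` line appears to be the one removed in favour of the three-task call, though the extracted diff has lost its `+`/`-` markers. The sketch below reproduces the pattern with plain `asyncio` and no aiohttp; `worker_loop` and `demo` are hypothetical stand-ins for the services' `run()` coroutines.

```python
import asyncio


async def worker_loop(name: str, log: list) -> None:
    """Stand-in for data_service.run() / telegram_bot.run(): loops until cancelled."""
    try:
        while True:
            await asyncio.sleep(3600)
    except asyncio.CancelledError:
        log.append(f"{name} stopped")
        raise  # re-raise so the task is marked cancelled


async def demo() -> tuple:
    log: list = []
    tasks = {n: asyncio.create_task(worker_loop(n, log)) for n in ("data", "algo", "telegram")}
    await asyncio.sleep(0)  # let each task start and reach its first await
    for task in tasks.values():
        task.cancel()
    # return_exceptions=True collects each CancelledError instead of raising the first one,
    # so shutdown waits for every task to finish unwinding.
    results = await asyncio.gather(*tasks.values(), return_exceptions=True)
    return log, results
```

Cancelling every task before the single `gather` means cleanup waits once for all three, and the state manager is only closed after no task can still be using it.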