Internet/LAN
│
├── :9090 → Controller (dashboard, discovery, management)
├── :8080 → Traefik ─┬─ / → Nautobot (inventory + API)
│ ├─ /grafana → Grafana (dashboards)
│ └─ /prometheus → Prometheus (metrics)
├── :8443 → Nautobot (direct fallback)
└── :3000 → Grafana (direct fallback)
Internal (mnm-network):
PostgreSQL 15 ← Nautobot database
Redis 7 ← Nautobot cache + Celery broker
Celery Worker ← Runs onboarding/sync jobs
Celery Scheduler ← Periodic tasks
SNMP Exporter ← Prometheus scrapes → polls devices via SNMP
gnmic ← gNMI streaming telemetry → Prometheus
Prometheus ← Metric storage, 365-day retention
Grafana ← Visualizes Prometheus data
Controller ← Management UI, discovery engine
MNM runs as 11 Docker containers on a shared mnm-network bridge.
| Container | Image | Purpose | Port |
|---|---|---|---|
| mnm-controller | mnm-controller:latest | Management UI, discovery engine | :9090 |
| mnm-traefik | traefik:v2.11 | Reverse proxy, Docker label routing | :8080 |
| mnm-nautobot | mnm-nautobot:latest | Network inventory, REST API | :8443 |
| mnm-nautobot-worker | mnm-nautobot:latest | Celery task worker (onboarding jobs) | — |
| mnm-nautobot-scheduler | mnm-nautobot:latest | Celery beat scheduler | — |
| mnm-postgres | postgres:15 | Nautobot database | — |
| mnm-redis | redis:7-alpine | Cache + Celery message broker | — |
| mnm-prometheus | prom/prometheus:v3.4.0 | Metric storage (365-day retention) | via Traefik |
| mnm-grafana | grafana/grafana-oss:12.0.1 | Dashboard visualization | :3000 |
| mnm-snmp-exporter | prom/snmp-exporter:v0.29.0 | SNMP polling for Prometheus | — |
| mnm-gnmic | ghcr.io/openconfig/gnmic:latest | gNMI streaming telemetry | — |
| Port | Service | Access |
|---|---|---|
| 9090 | Controller | Direct — management UI and API |
| 8080 | Traefik | Proxied — routes to Nautobot (/), Grafana (/grafana), Prometheus (/prometheus) |
| 8443 | Nautobot | Direct fallback |
| 3000 | Grafana | Direct fallback |
| Data | Location | Retention | Notes |
|---|---|---|---|
| SNMP metrics | Prometheus | 365 days (configurable via PROMETHEUS_RETENTION_DAYS, 10 GB cap) |
|
| Discovery records | Nautobot IPAM | Indefinite | Never deleted, first_seen/last_seen tracked |
| Device inventory | Nautobot | Indefinite | Devices, interfaces, cables, LLDP |
| Grafana dashboards | Grafana volume | Indefinite |
MNM collects sensitive network intelligence. If compromised, this data could be used to map and attack the monitored network. The following documents MNM's security posture, trust boundaries, and hardening recommendations.
- Controller UI: Password-gated via
MNM_ADMIN_PASSWORD. All management operations (discovery, sweep configuration, container control) require authentication. - Nautobot: Has its own authentication system using
MNM_ADMIN_USER/MNM_ADMIN_PASSWORD. API access requires a token, obtained by the controller at startup viadocker execinto the Nautobot container. - Grafana: Anonymous read-only access is enabled by default to support the appliance model. In sensitive environments, disable anonymous access by setting
GF_AUTH_ANONYMOUS_ENABLED=falsein docker-compose.yml and using Grafana's built-in auth. - Internal services: PostgreSQL, Redis, Prometheus, SNMP exporter, and gnmic are not exposed to the host. They communicate only within the
mnm-networkDocker bridge and are unreachable from outside the host. - Docker socket mount: The controller mounts
/var/run/docker.sockfor container management (health checks, restart, bootstrap). This grants root-equivalent access to the host. The controller container is the trust boundary — anyone who can execute code inside it effectively has root on the host. Future mitigation: consider docker-socket-proxy to restrict the API surface.
- PostgreSQL stores all Nautobot data including device inventories, IP addresses, topology, and Secrets (SSH/NETCONF credentials). The
mnm-postgres-datavolume should reside on encrypted storage. - Prometheus stores all collected metrics (SNMP, gNMI, discovery telemetry). The
mnm-prometheus-datavolume should reside on encrypted storage. - Controller config (
config.json) stores sweep configuration including IP ranges and scheduling. Themnm-controller-datavolume should reside on encrypted storage. - Recommendation: Deploy MNM on a host or VM with full-disk encryption enabled. This protects all volumes, logs, and temporary files at rest.
- Snapshots: Proxmox or hypervisor snapshots containing MNM data include credentials and network intelligence. Treat all snapshots and backups as sensitive.
- Internal container traffic: Communication between containers on the
mnm-networkDocker bridge is unencrypted. This is acceptable for same-host deployments where all containers share a single machine. - Traefik (HTTP): Traefik serves HTTP on port 8080 by default. To add TLS: mount certificate files into the Traefik container, add a TLS entrypoint in
config/traefik/traefik.yml, and update router rules to use the HTTPS entrypoint. Self-signed certificates are sufficient for management access; use CA-signed certificates if exposing to a broader network. - SNMP v2c: Community strings are sent in plaintext over the network. This is an inherent protocol limitation. Use SNMPv3 with authentication and encryption where devices support it. Restrict SNMP access to the management VLAN.
- SSH/NETCONF: Connections to network devices via NAPALM use SSH (encrypted by protocol). NETCONF runs over SSH. These are secure in transit.
- Network access: All device communication is strictly read-only (Inviolable Rule 1). SNMP GET/WALK only, TCP connect-then-close for probes, SSH/NETCONF via NAPALM read-only drivers. MNM never sends SET, configuration, or write operations.
- NAPALM/SSH credentials are stored in Nautobot Secrets, backed by PostgreSQL. They are not stored as plaintext files on disk (beyond the initial
.envused for bootstrap). - SNMP community strings appear in
.envand inconfig/prometheus/prometheus.yml(or generated scrape configs). Restrict file permissions on both. .envfile: Should bechmod 600and owned by the deploying user. Contains database passwords, admin credentials, SNMP communities, and API keys. Already listed in.gitignore— never commit this file.- Grafana admin password: Set via
GRAFANA_ADMIN_PASSWORDin.env. If unset, Grafana uses its default (admin), which should be changed immediately.
- Deploy MNM on a dedicated VM with full-disk encryption (LUKS on Linux, or hypervisor-level encryption).
- Restrict network access to exposed ports (8080, 8443, 9090, 3000) to authorized management stations only.
- Use host firewall rules or network ACLs to limit who can reach the MNM UI.
- Place MNM on a dedicated management VLAN with access to device management interfaces.
- Rotate credentials (NAPALM, SNMP, admin passwords) periodically.
- Back up MNM data (PostgreSQL dumps, Prometheus snapshots) to encrypted storage.
- Treat all snapshots and backups containing MNM data as sensitive — they include credentials and full network topology.
- Review Grafana anonymous access settings before deploying in environments where dashboard data is sensitive.
- Consider upgrading from SNMP v2c to v3 where device support allows.
Key architectural choices and rationale are documented in the Design Decisions Log in the project's internal architecture document. Notable decisions:
- Nautobot over NetBox: SSoT plugin framework and device onboarding plugin
- Traefik v2.11 over v3: Docker API incompatibility in v3 (see Phase 2 Lessons Learned)
- Seed-and-sweep over recursive crawl: Prevents runaway discovery (Rule 6)
- Vanilla JS over frameworks: Minimal dependencies, appliance model
- 365-day retention: Enables historical network intelligence queries
Endpoint collection uses SNMP for speed:
- ARP table: SNMP walk of
ipNetToMediaTable(~1-2 seconds per device) - MAC table: SNMP walk of
dot1dTpFdbTable/ Q-BRIDGE-MIB (~1-2 seconds per device) - DHCP bindings: NETCONF RPC (Junos only, ~5 seconds)
- NAPALM proxy: fallback only when SNMP fails (~15-30 seconds per device)
Discovery sweep optimizations:
- Pre-fetch all known IPs from Nautobot in one bulk call (eliminates N per-host API calls)
- Enrichment steps (SNMP, DNS, banners, ARP) run in parallel per host via
asyncio.gather - IPAM writes batched after all hosts complete
- Concurrency tunable via
MNM_SWEEP_CONCURRENCYenv var
Bootstrap skips entirely on subsequent runs when device types (>5000) and custom fields are already present.
The controller persists its state to a dedicated mnm_controller database on
the same mnm-postgres instance that hosts Nautobot. No additional containers
are introduced — the database is created idempotently by bootstrap.sh.
Why a separate database, not Nautobot's? Nautobot's IPAM model is the source of truth for IP records and device inventory. The controller needs relational queries over time-series data (per-MAC event histories, sweep runs, collection runs, IP observation snapshots) that don't fit Nautobot's schema and shouldn't pollute it.
Tables:
endpoints— MAC-keyed identity, one row per unique MAC ever seen. Tracks current IP, switch, port, VLAN, hostname, vendor, classification, first/last seen, and the data source (sweep,infrastructure, orboth).endpoint_events— append-only log of changes:appeared,moved_port,moved_switch,ip_changed,hostname_changed. Each row has old/new values plus a JSONBdetailsblob.device_polls— per-device, per-job-type poll tracking. Composite PK(device_name, job_type). Trackslast_success,last_attempt,last_error,last_duration,next_due,interval_sec,enabled. Seeded from Nautobot inventory on first startup.sweep_runs— per-sweep summary (CIDR, duration, totals).collection_runs— per-collection-run summary (devices queried, endpoints found/new/updated/moved, duration).ip_observations— append-only sweep snapshot for an IP (open ports, banners, SNMP, HTTP headers, TLS data, classification).kv_config— replaces the legacyconfig.jsonfile.discovery_excludes— operator-defined exclusion list (IP or device_name).
Migration: on first startup, if endpoints is empty and a
/data/endpoints.json (or /data/config.json) file exists, the controller
imports them once and renames the source files to *.json.migrated. If
Postgres is unreachable, the controller falls back to JSON for config only —
endpoint queries return empty until the database comes back.
The ORM layer uses SQLAlchemy 2.x async with the asyncpg driver. See ../controller/app/db.py and ../controller/app/endpoint_store.py.
The controller serves a vanilla JS frontend (no frameworks) at :9090.
The UI nav bar is organized into grouped sections:
- Intelligence: Nodes, Endpoints, Events — primary data views
- Operations: Discovery, Jobs, Logs — operational tools
- Monitoring: Grafana, Prometheus, Proxmox — external service links (from dashboard)
The Dashboard is the landing page, accessible via the MNM logo.
MNM distinguishes between two categories:
- Nodes — onboarded infrastructure devices that are sources of data for MNM. They have credentials, poll schedules, and active NAPALM sessions. Listed on
/nodes. - Endpoints — everything else discovered passively through data collected from nodes and sweeps. Listed on
/endpoints.
The /endpoints page filters out MACs belonging to onboarded nodes so only passively-discovered endpoints are shown.
| Route | Page | Nav Section | Purpose |
|---|---|---|---|
/ |
Dashboard | — | Container health, node count, sweep/collection stats, polling status, advisories |
/nodes |
Nodes | Intelligence | Onboarded infrastructure with poll health, platform, role, Poll Now |
/endpoints |
Endpoints | Intelligence | Passively-discovered endpoints (excludes node MACs) |
/endpoints/{mac} |
Endpoint Detail | Intelligence | Full timeline and identity dossier for a single MAC |
/events |
Events | Intelligence | Network activity feed filtered by event type and time window |
/discover |
Discovery | Operations | Sweep configuration, CIDR ranges, schedule, results table, exclusion list |
/jobs |
Jobs | Operations | Consolidated view of all background tasks with status, schedules, and Run Now |
/logs |
Logs | Operations | Structured log viewer with level/module filtering |
The controller runs these background tasks launched at startup:
| Task | Module | Schedule | Purpose |
|---|---|---|---|
| Sweep Scheduler | discovery.py |
Per-schedule interval (default 1h) | Network discovery sweeps |
| Modular Poller | polling.py |
Per-device/per-job-type (ARP 5m, MAC 5m, DHCP 10m, LLDP 1h, Routes 1h, BGP 1h) | Independent device data collection |
| Proxmox Collector | connectors/proxmox.py |
PROXMOX_INTERVAL_SECONDS (default 5m) |
VM/container/storage inventory |
| Database Prune | main.py |
MNM_PRUNE_INTERVAL_HOURS (default 24h) |
Evict data past retention window |
The legacy monolithic Endpoint Collector (endpoint_collector.py) is disabled by default. Set MNM_LEGACY_COLLECTOR=true to re-enable during transition.
Four in-place patches are applied in nautobot/Dockerfile to work around upstream plugin issues. Check on plugin upgrades — these may become unnecessary.
| Patch | File | Issue |
|---|---|---|
| 1. Netmiko read_timeout | command_getter.py |
Hardcoded 60s too short for large Junos configs; patched to 120s |
| 2. tcp_ping null port | command_getter.py |
NautobotORMInventory never sets Host.port; patched to default 22 |
| 3. Missing serial KeyError | diffsync_utils.py |
Devices without serial crash sync job; patched via patches/patch_diffsync_utils.py |
| 4. Schema validation logging | processor.py |
Upstream gates ValidationError at DEBUG; patched to WARNING via patches/patch_processor_schema_logging.py |