tools: Add traffic_grapher for real-time ATS metrics visualization #12848
bryancall wants to merge 15 commits into apache:master
A Python tool that displays ATS metrics inline in iTerm2 using imgcat
with live updates and multi-host comparison.
Features:
- Real-time graphs of RPS, latency, cache hit rate, connections
- Support for 1-4 hosts with different line styles for comparison
- Collects metrics via JSONRPC Unix socket (batch collection)
- Dark theme optimized for terminal display
- Keyboard navigation between metric pages (4 pages)
- Configurable refresh interval and history window
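The JSONRPC batch collection listed above can be sketched as one request that asks for several records at once, plus a parser that flattens the reply. This is a minimal sketch, not the PR's code: it assumes the ATS JSONRPC method `admin_lookup_records` taking a list of `{"record_name": ...}` objects and a reply shaped like `{"result": {"recordList": [{"record": {...}}]}}`. Check both against your ATS version's JSONRPC reference. The helper names and the two metric names are illustrative.

```python
import json
from typing import List

# Illustrative record names; the real tool's metric list may differ.
METRICS = [
    "proxy.process.http.incoming_requests",
    "proxy.process.http.current_client_connections",
]


def build_lookup_request(names: List[str], request_id: int = 1) -> bytes:
    """Build one JSON-RPC 2.0 request that looks up every record in `names`.

    Assumes the ATS method `admin_lookup_records` (verify for your version).
    """
    payload = {
        "jsonrpc": "2.0",
        "method": "admin_lookup_records",
        "params": [{"record_name": name} for name in names],
        "id": request_id,
    }
    return json.dumps(payload).encode()


def parse_lookup_response(raw: bytes) -> dict:
    """Flatten an assumed {"result": {"recordList": [{"record": {...}}]}} reply."""
    reply = json.loads(raw)
    values = {}
    for entry in reply.get("result", {}).get("recordList", []):
        record = entry.get("record", {})
        values[record.get("record_name")] = record.get("current_value")
    return values
```

Batching every metric into one call means a single socket round trip per refresh interval, regardless of how many graphs are shown.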
Requirements:
- Python 3 with matplotlib
- iTerm2 (or compatible terminal for inline images)
- SSH access to remote ATS hosts
Usage:
traffic_grapher.py ats-server1.example.com
traffic_grapher.py ats{1..4}.example.com --interval 2
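In the second usage line, the shell expands `ats{1..4}.example.com` into four separate hostnames before Python sees them. The 1-4 host limit and the per-host line styles could then be enforced along these lines. This is a sketch only: `assign_line_styles` is a hypothetical helper, and the style list uses standard matplotlib line-style strings rather than whatever the tool actually picks.

```python
from typing import Dict, List

# Standard matplotlib line-style strings, one per supported host (assumed choice).
LINE_STYLES = ["-", "--", "-.", ":"]
MAX_HOSTS = len(LINE_STYLES)


def assign_line_styles(hosts: List[str]) -> Dict[str, str]:
    """Map each host to a distinct line style, enforcing the 1-4 host limit."""
    if not 1 <= len(hosts) <= MAX_HOSTS:
        raise ValueError(f"expected 1-{MAX_HOSTS} hosts, got {len(hosts)}")
    if len(set(hosts)) != len(hosts):
        # Duplicates would collapse into one dict entry and share a line.
        raise ValueError("duplicate hosts are not allowed")
    return dict(zip(hosts, LINE_STYLES))
```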
Pull request overview
A Python-based traffic monitoring tool that visualizes real-time ATS (Apache Traffic Server) metrics using inline terminal graphics. The tool collects metrics via JSONRPC Unix socket and renders them as time-series graphs directly in iTerm2, supporting multi-host comparison with keyboard-driven navigation.
Changes:
- New traffic_grapher.py script with real-time metrics visualization
- Support for 1-4 hosts with distinct line styles for comparison
- Four pre-configured dashboard pages covering traffic, cache, TLS/HTTP2, and network metrics
- Dark theme optimized for terminal display with configurable refresh and history
cmcfarlen left a comment:
This doesn't run. Please create a uv project or some way to run this.

    % python3 tools/traffic_grapher/traffic_grapher.py
    Traceback (most recent call last):
      File "/Users/cmcfarlen/projects/oss/trafficserver/tools/traffic_grapher/traffic_grapher.py", line 55, in <module>
        import matplotlib
    ModuleNotFoundError: No module named 'matplotlib'
Address review feedback:

- Add pyproject.toml and PEP 723 inline script metadata so the tool can be run via `uv run traffic_grapher.py` without manual pip installs
- Make paths configurable: `--traffic-ctl` and `--socket-path` CLI args with `TRAFFIC_CTL_PATH` and `TRAFFICSERVER_JSONRPC_SOCKET` environment-variable fallbacks (replaces hard-coded /opt/edge/... paths)
- Fix command injection: use `subprocess.run` with list args instead of `shell=True`, and validate hostnames with a regex
- Replace bare `except` clauses with specific exception types
- Remove unused imports (`select`, `tty`)
- Parse traffic_ctl output in Python instead of piping it through awk
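The command-injection fix in this commit (list-form `subprocess.run` plus a hostname check) might look roughly like the following. The regex, `build_ssh_command`, and the remote `python3 -` invocation are assumptions for illustration, since the PR's actual pattern is not shown here. `fullmatch` is used instead of `^...$` because `$` also matches before a trailing newline.

```python
import re
import subprocess
from typing import List

# Hypothetical pattern: dot-separated DNS labels of letters, digits and hyphens.
# It rejects spaces, quotes, semicolons and other shell metacharacters.
HOSTNAME_RE = re.compile(
    r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)(?:\.(?!-)[A-Za-z0-9-]{1,63}(?<!-))*"
)


def validate_hostname(host: str) -> str:
    if len(host) > 253 or not HOSTNAME_RE.fullmatch(host):
        raise ValueError(f"invalid hostname: {host!r}")
    return host


def build_ssh_command(host: str) -> List[str]:
    # Each argument is its own list element, so no shell ever parses the hostname.
    return ["ssh", "-o", "BatchMode=yes", validate_hostname(host), "python3", "-"]


def run_remote(host: str, script: str, timeout: float = 10.0) -> str:
    """Feed `script` to a remote Python interpreter over SSH without shell=True."""
    result = subprocess.run(
        build_ssh_command(host),
        input=script,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

Rejecting a leading `-` matters even without a shell: a "hostname" such as `-oProxyCommand=...` would otherwise be parsed by ssh as an option.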
… --traffic-ctl

- Add README.md with usage, quick start, configuration, and CLI reference
- Rename `--socket-path` to `--socket` for brevity
- Remove the unused `--traffic-ctl` CLI option (the main code uses the JSONRPC socket, not traffic_ctl)
I can run this now with the uv project, but it doesn't seem to work for
- Detect localhost/127.0.0.1/local and connect directly to the JSONRPC Unix socket instead of using SSH, so `traffic_grapher.py localhost` works without any SSH setup
- Add a startup connection test that reports success or failure for each host before entering the graph loop
- Track and display collection errors: show them in red on the dashboard status bar and print them to stderr for the first few failures
- Give clear error messages for common failures: socket not found, permission denied, connection refused, SSH failures, timeouts, and empty responses

Addresses review feedback from cmcfarlen.
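The localhost shortcut and the clearer socket error messages described in this commit could be structured as below. This is a sketch: the set of local names comes from the commit message, while `is_local`, `describe_socket_error`, and the exact message wording are hypothetical. Connecting to a missing Unix socket path raises `FileNotFoundError`, and connecting to a socket file with no listener raises `ConnectionRefusedError`.

```python
import socket

# Host names treated as "this machine", per the commit message.
LOCAL_NAMES = {"localhost", "127.0.0.1", "local"}


def is_local(host: str) -> bool:
    """True if the host should use the JSONRPC socket directly instead of SSH."""
    return host.lower() in LOCAL_NAMES


def describe_socket_error(exc: BaseException, path: str) -> str:
    """Map low-level socket errors to readable messages (illustrative wording)."""
    if isinstance(exc, FileNotFoundError):
        return f"JSONRPC socket not found: {path} (is traffic_server running?)"
    if isinstance(exc, PermissionError):
        return f"permission denied on {path}"
    if isinstance(exc, ConnectionRefusedError):
        return f"connection refused on {path} (stale socket file?)"
    if isinstance(exc, socket.timeout):
        return f"timed out talking to {path}"
    return f"unexpected error on {path}: {exc}"
```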
Use a raw-string prefix for the regex pattern in the JSONRPC script template so that backslash-dot escape sequences do not trigger SyntaxWarning/SyntaxError on remote hosts running Python 3.12+.
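A quick way to confirm that a generated helper script is clean is to compile it with warnings promoted to errors. On Python 3.12+ an invalid escape such as `"\."` in a normal string literal emits `SyntaxWarning` (older versions emit `DeprecationWarning`), and under the `"error"` filter `compile()` reports it as a failure. `REMOTE_SCRIPT` below is a hypothetical stand-in for the PR's template, not its actual contents.

```python
import warnings

# Hypothetical fragment of the remote helper. The outer template is a raw string,
# and the inner pattern also carries an r prefix, so the remote interpreter
# sees r"proxy\.process..." and no invalid escape sequence.
REMOTE_SCRIPT = r'''
import re
METRIC_RE = re.compile(r"proxy\.process\.http\..*")
'''


def compiles_cleanly(source: str) -> bool:
    """Return True if `source` compiles with all warnings treated as errors."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            compile(source, "<remote_helper>", "exec")
        except (SyntaxError, SyntaxWarning, DeprecationWarning):
            return False
    return True
```

Running this check locally on the rendered template catches the problem before the script is shipped to a remote host with a newer Python.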
Instead of requiring the user to know where the JSONRPC socket is, auto-discover it by finding traffic_ctl (via PATH or common install prefixes like /usr/local, /opt/ats, /opt/trafficserver) and querying the runtime directory. Falls back to checking common socket locations. Discovery runs once at startup per host. Users can still override with --socket or TRAFFICSERVER_JSONRPC_SOCKET env var.
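The discovery order in this commit (traffic_ctl on PATH, then common install prefixes, then known socket locations) might be sketched like this. Assumptions: traffic_ctl lives at `<prefix>/bin/traffic_ctl`, and the fallback list contains only the socket path the README mentions. The step that asks traffic_ctl for the runtime directory is omitted because the exact command is not shown in the PR. The lookups are injectable so the logic can be tested without a real install.

```python
import os
import shutil
from typing import Callable, Iterable, Optional

# Install prefixes named in the commit message.
COMMON_PREFIXES = ["/usr/local", "/opt/ats", "/opt/trafficserver"]
# Fallback socket location (the path the README mentions); extend as needed.
COMMON_SOCKETS = ["/usr/local/var/trafficserver/jsonrpc20.sock"]


def find_traffic_ctl(
    prefixes: Iterable[str] = COMMON_PREFIXES,
    which: Callable[[str], Optional[str]] = shutil.which,
    is_executable: Callable[[str], bool] = lambda p: os.access(p, os.X_OK),
) -> Optional[str]:
    """Return traffic_ctl from PATH, else from <prefix>/bin (layout assumed)."""
    found = which("traffic_ctl")
    if found:
        return found
    for prefix in prefixes:
        candidate = os.path.join(prefix, "bin", "traffic_ctl")
        if is_executable(candidate):
            return candidate
    return None


def first_existing_socket(
    candidates: Iterable[str] = COMMON_SOCKETS,
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Fallback used when traffic_ctl cannot report the runtime directory."""
    for path in candidates:
        if exists(path):
            return path
    return None
```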
In GUI mode, start with a large default size (16x10 inches) and maximize the window on startup. On subsequent frames, don't override the figure size so user resizes are preserved. The terminal-based size calculation is only used for imgcat mode.
Running 'traffic_grapher.py' with no arguments now monitors the local ATS instance instead of requiring 'localhost' to be passed explicitly.
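Making the host list optional is a small argparse change. A sketch, assuming a positional `hosts` argument: `nargs="*"` accepts zero hosts, and an empty list is replaced with `localhost`. The `--interval` option appears in the usage examples, but its default of 1.0 second here is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Real-time ATS metrics grapher")
    # nargs="*" makes hosts optional; an empty list means "monitor this machine".
    parser.add_argument("hosts", nargs="*", help="1-4 ATS hosts (default: localhost)")
    # Default interval is an assumed value for this sketch.
    parser.add_argument("--interval", type=float, default=1.0, help="refresh interval in seconds")
    return parser


def parse_hosts(argv):
    args = build_parser().parse_args(argv)
    if not args.hosts:
        args.hosts = ["localhost"]
    return args
```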
Fixed in the latest push. Here's what changed:

- Localhost support:
- Blank screen / no errors:
- Socket path auto-discovery:
- GUI mode:
The 16x10 default size is reasonable and the user can resize freely.
Hook into matplotlib key_press_event so h/l and arrow keys switch pages in GUI mode. Hide the default navigation toolbar since pan/zoom controls are not useful for a live dashboard.
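The page switching can be kept as a pure function and attached to the figure through `fig.canvas.mpl_connect("key_press_event", ...)`, where matplotlib reports arrow keys as `"left"` and `"right"` in `event.key`. A sketch under stated assumptions: `h`/left goes back, `l`/right goes forward, and paging wraps at both ends. `next_page` and `PageNavigator` are hypothetical names.

```python
# Keys as matplotlib reports them in KeyEvent.key.
PREV_KEYS = {"h", "left"}
NEXT_KEYS = {"l", "right"}


def next_page(key: str, page: int, num_pages: int) -> int:
    """Return the page index after pressing `key`, wrapping at both ends (assumed)."""
    if key in NEXT_KEYS:
        return (page + 1) % num_pages
    if key in PREV_KEYS:
        return (page - 1) % num_pages
    return page


class PageNavigator:
    """Hypothetical glue: fig.canvas.mpl_connect("key_press_event", nav.on_key)."""

    def __init__(self, num_pages: int = 4):
        self.num_pages = num_pages
        self.page = 0

    def on_key(self, event) -> None:
        self.page = next_page(event.key, self.page, self.num_pages)
```

matplotlib's default keymap also binds `l` (y-axis log toggle) and the arrow keys (view back/forward), so the corresponding `rcParams["keymap.*"]` entries may need to be cleared to avoid side effects.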
Set rcParams toolbar to None before figure creation instead of trying to remove it after.
Set toolbar to None before pyplot import so the matplotlib window does not show the default navigation toolbar.
In GUI mode, skip all startup prints and stderr collection errors since the dashboard already shows errors visually. This allows running in the background without output spilling into the terminal.
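The error-reporting policy from this commit and the earlier one (keep recent errors for the status bar, echo only the first few to stderr, and stay silent in GUI mode) fits in a small class. This is a sketch: `ErrorReporter`, the cap of three printed errors, and the five-entry history are assumptions, not the tool's actual values.

```python
import sys
from typing import List, TextIO


class ErrorReporter:
    """Collect per-host errors; echo only the first few to stderr, never in GUI mode."""

    def __init__(self, gui_mode: bool, max_printed: int = 3, stream: TextIO = sys.stderr):
        self.gui_mode = gui_mode
        self.max_printed = max_printed  # "first few" is an assumed cap
        self.stream = stream
        self.printed = 0
        self.recent: List[str] = []  # what the dashboard status bar would show

    def report(self, host: str, message: str) -> None:
        line = f"{host}: {message}"
        self.recent = (self.recent + [line])[-5:]
        if self.gui_mode or self.printed >= self.max_printed:
            return
        print(line, file=self.stream)
        self.printed += 1
```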
Also cleaned up the GUI mode (
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

    response = sock.recv(1048576).decode()
    sock.close()

    # Parse and output simple key=value format
    data = json.loads(response)

The embedded remote JSONRPC helper reads from the Unix socket with a single recv(). recv() is not message-framed, so this can return partial JSON even for small responses, causing json.loads() to fail and yielding empty/invalid metric output. Update the script to read until the full JSON-RPC response is received (e.g., loop until EOF/newline or until json parsing succeeds, similar to _collect_local()).
Suggested change:

    -response = sock.recv(1048576).decode()
    -sock.close()
    -# Parse and output simple key=value format
    -data = json.loads(response)
    +# Read until we have a complete JSON response or the socket is closed
    +buf = b""
    +data = None
    +while True:
    +    chunk = sock.recv(4096)
    +    if not chunk:
    +        break
    +    buf += chunk
    +    try:
    +        data = json.loads(buf.decode())
    +        break
    +    except json.JSONDecodeError:
    +        # Response not complete yet; continue reading
    +        continue
    +sock.close()
    +if data is None:
    +    raise SystemExit(1)
    +# Parse and output simple key=value format
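A slightly more defensive variant of the suggested loop passes the raw bytes to `json.loads`, which accepts `bytes` directly. This avoids a `UnicodeDecodeError` when a chunk boundary splits a multi-byte UTF-8 character. It catches `ValueError` because both `json.JSONDecodeError` and `UnicodeDecodeError` subclass it. `recv_json` is a hypothetical helper, and it assumes the response is a JSON object, so a truncated prefix never parses early.

```python
import json
import socket


def recv_json(sock: socket.socket, chunk_size: int = 4096):
    """Read from `sock` until the buffer parses as one JSON document, or EOF."""
    buf = b""
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:
            break
        buf += chunk
        try:
            return json.loads(buf)  # bytes input: no manual .decode() needed
        except ValueError:  # incomplete JSON or a split UTF-8 sequence
            continue
    if not buf:
        raise ConnectionError("empty response from JSONRPC socket")
    return json.loads(buf)  # peer closed mid-document: let the parse error surface
```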

    # In no-keyboard mode, just save the PNG, don't try imgcat
    if self.save_png:
        print(f"Iteration {self.iteration}: saved to {self.save_png}")

In --no-keyboard mode, the status line prints self.save_png, but render_page() may replace '{iter}' with the current iteration when saving. This can misreport the actual file written. Consider printing the resolved path (the same value used in render_page()) or returning it from render_page().
Suggested change:

    -print(f"Iteration {self.iteration}: saved to {self.save_png}")
    +resolved_path = self.save_png
    +if "{iter}" in resolved_path:
    +    try:
    +        resolved_path = resolved_path.format(iter=self.iteration)
    +    except Exception:
    +        # If formatting fails for any reason, fall back to the template.
    +        pass
    +print(f"Iteration {self.iteration}: saved to {resolved_path}")
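Another way to keep the status line and `render_page()` consistent is to resolve the path in one helper and call it from both places. This is a sketch that assumes the placeholder is the literal `{iter}`. `resolve_save_path` is a hypothetical name, and it uses `str.replace` instead of the suggested `str.format`, so paths containing other braces need no exception handling.

```python
def resolve_save_path(template: str, iteration: int) -> str:
    """Substitute only the literal "{iter}" placeholder.

    str.replace leaves other braces alone, whereas str.format would raise
    KeyError or ValueError on paths such as "out-{host}.png" or "a{b".
    """
    return template.replace("{iter}", str(iteration))
```

With this helper, the status line becomes `print(f"Iteration {self.iteration}: saved to {resolve_save_path(self.save_png, self.iteration)}")`, matching the file that was actually written.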

    | Option | Env Var | Default |
    |--------|---------|---------|
    | `--socket` | `TRAFFICSERVER_JSONRPC_SOCKET` | `/usr/local/var/trafficserver/jsonrpc20.sock` |

README claims the default JSONRPC socket is "/usr/local/var/trafficserver/jsonrpc20.sock", but the code defaults --socket to TRAFFICSERVER_JSONRPC_SOCKET and otherwise uses auto-discovery (DEFAULT_JSONRPC_SOCKET_PATH is None when env var is unset). Please update this table to reflect the actual default/behavior so users aren't pointed at a path that may not be used.
Suggested change:

    -| `--socket` | `TRAFFICSERVER_JSONRPC_SOCKET` | `/usr/local/var/trafficserver/jsonrpc20.sock` |
    +| `--socket` | `TRAFFICSERVER_JSONRPC_SOCKET` | Value of `TRAFFICSERVER_JSONRPC_SOCKET`, otherwise auto-discover (no fixed path) |
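The precedence that the suggested README row describes (explicit flag, then environment variable, then auto-discovery) can be expressed in a few lines. This is a sketch, and `resolve_socket_path` is a hypothetical helper. In the tool, it would be called roughly as `resolve_socket_path(args.socket, os.environ, discover)`.

```python
from typing import Callable, Mapping, Optional


def resolve_socket_path(
    cli_value: Optional[str],
    env: Mapping[str, str],
    discover: Callable[[], Optional[str]],
) -> Optional[str]:
    """--socket wins, then TRAFFICSERVER_JSONRPC_SOCKET, then auto-discovery."""
    if cli_value:
        return cli_value
    env_value = env.get("TRAFFICSERVER_JSONRPC_SOCKET")
    if env_value:
        return env_value
    return discover()  # may return None if nothing is found
```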

    [project]
    name = "traffic-grapher"
    version = "1.0.0"
    description = "Real-time ATS metrics visualization for iTerm2"
    requires-python = ">=3.9"
    license = "Apache-2.0"
    dependencies = [
        "matplotlib>=3.7",
        "pyyaml>=6.0",
    ]

    [project.scripts]
    traffic-grapher = "traffic_grapher:main"

This pyproject.toml defines a console script entry point, but it lacks a [build-system] section and any packaging configuration to include the traffic_grapher module. As written, pip install . / uv sync is likely to fail or produce an install without the traffic_grapher module, making the traffic-grapher entry point unusable. Add a build backend (e.g., setuptools) and configure packaging (package directory or py_modules) so the entry point can import main reliably.
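Whatever build backend is chosen, a quick post-install check (after `pip install .` or `uv sync`) is whether the entry-point target actually imports. This sketch uses the standard library's `importlib.metadata.EntryPoint`. Its `load()` imports the module part of `"module:attr"` and fetches the attribute, which is roughly what the generated `traffic-grapher` launcher does. `entry_point_loads` is a hypothetical helper name.

```python
from importlib.metadata import EntryPoint


def entry_point_loads(value: str, name: str = "traffic-grapher") -> bool:
    """Return True if a console-script target such as "traffic_grapher:main" resolves."""
    ep = EntryPoint(name=name, value=value, group="console_scripts")
    try:
        return callable(ep.load())
    except (ImportError, AttributeError):
        # Module not installed (e.g. missing packaging config) or attribute absent.
        return False
```

In an environment where packaging omits the module, `entry_point_loads("traffic_grapher:main")` would return `False`, which is the failure mode the comment describes.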

    # Override history from config if specified
    history = args.history
    if 'history' in config:
        history = config['history'].get('seconds', args.history)

Config parsing assumes config['history'] is a mapping and calls .get('seconds', ...). If a user supplies a scalar (e.g., history: 120) this will raise an AttributeError at startup. Consider accepting both forms (int seconds or {seconds: ...}) or validating the type and emitting a clear parser.error().
Suggested change:

    -    history = config['history'].get('seconds', args.history)
    +    history_cfg = config['history']
    +    if isinstance(history_cfg, dict):
    +        history = history_cfg.get('seconds', args.history)
    +    elif isinstance(history_cfg, (int, float)):
    +        history = int(history_cfg)
    +    else:
    +        parser.error("Invalid 'history' in config: expected mapping or number of seconds")
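The same idea can be written as a standalone function that is easy to unit-test without argparse. This is a sketch, and `history_seconds` is a hypothetical name. It raises `ValueError`, which `main()` could turn into `parser.error()`. It also adds two checks that are not part of the suggestion: it rejects booleans, because `bool` is a subclass of `int` and `history: true` would otherwise pass, and it rejects non-positive values.

```python
from collections.abc import Mapping
from typing import Any


def history_seconds(config: Mapping, default: int) -> int:
    """Accept `history: 120` or `history: {seconds: 120}` from the YAML config."""
    if "history" not in config:
        return default
    value: Any = config["history"]
    if isinstance(value, Mapping):
        value = value.get("seconds", default)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)) or value <= 0:
        raise ValueError(f"invalid 'history' in config: {value!r}")
    return int(value)
```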