tools: Add traffic_grapher for real-time ATS metrics visualization #12848

Open
bryancall wants to merge 15 commits into apache:master from bryancall:traffic-grapher-tool

bryancall commented Feb 2, 2026

Summary

A Python tool that displays ATS metrics inline in iTerm2 using imgcat with live updates and multi-host comparison.

Features:

  • Real-time graphs of RPS, latency, cache hit rate, connections
  • Support for 1-4 hosts with different line styles for comparison
  • Collects metrics via JSONRPC Unix socket (batch collection)
  • Dark theme optimized for terminal display
  • Keyboard navigation between metric pages (4 pages)
  • Configurable refresh interval and history window

Requirements:

  • Python 3 with matplotlib
  • iTerm2 (or compatible terminal for inline images)
  • SSH access to remote ATS hosts

Usage:

traffic_grapher.py ats-server1.example.com
traffic_grapher.py ats{1..4}.example.com --interval 2

Test plan

  • Tested with 1, 2, 3, and 4 hosts
  • Verified all 4 metric pages display correctly
  • Confirmed keyboard navigation (h/l, left/right arrows)
  • Tested with --save-png for offline verification

Screenshot

bryancall self-assigned this Feb 2, 2026
bryancall added this to the 10.2.0 milestone Feb 2, 2026
bryancall added the Tools label Feb 2, 2026
bryancall requested a review from Copilot February 2, 2026 20:41

Copilot AI left a comment

Pull request overview

A Python-based traffic monitoring tool that visualizes real-time ATS (Apache Traffic Server) metrics using inline terminal graphics. The tool collects metrics via JSONRPC Unix socket and renders them as time-series graphs directly in iTerm2, supporting multi-host comparison with keyboard-driven navigation.

Changes:

  • New traffic_grapher.py script with real-time metrics visualization
  • Support for 1-4 hosts with distinct line styles for comparison
  • Four pre-configured dashboard pages covering traffic, cache, TLS/HTTP2, and network metrics
  • Dark theme optimized for terminal display with configurable refresh and history

bryancall requested a review from cmcfarlen February 2, 2026 22:37

cmcfarlen left a comment

This doesn't run. Please create a uv project or some way to run this.

% python3 tools/traffic_grapher/traffic_grapher.py
Traceback (most recent call last):
  File "/Users/cmcfarlen/projects/oss/trafficserver/tools/traffic_grapher/traffic_grapher.py", line 55, in <module>
    import matplotlib
ModuleNotFoundError: No module named 'matplotlib'

Address review feedback:
- Add pyproject.toml and PEP 723 inline script metadata so the tool
  can be run via 'uv run traffic_grapher.py' without manual pip installs
- Make paths configurable: --traffic-ctl, --socket-path CLI args with
  TRAFFIC_CTL_PATH and TRAFFICSERVER_JSONRPC_SOCKET env var fallbacks
  (replaces hard-coded /opt/edge/... paths)
- Fix command injection: use subprocess.run with list args instead of
  shell=True, add hostname validation via regex
- Replace bare except clauses with specific exception types
- Remove unused imports (select, tty)
- Parse traffic_ctl output in Python instead of piping through awk
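
The command-injection fix described in this commit message can be sketched roughly as follows. This is an illustration only, not the tool's code: `HOSTNAME_RE`, `validate_hostname`, `build_ssh_command`, and `run_remote` are hypothetical names, and the actual regex and SSH options used by traffic_grapher.py are not shown in this thread.

```python
import re
import subprocess

# Hypothetical pattern: letters, digits, dots, and hyphens only, and no
# leading hyphen, so a host name can never be mistaken for an ssh option.
HOSTNAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9.-]*")


def validate_hostname(host):
    """Return host unchanged, or raise ValueError if it looks unsafe."""
    # fullmatch (rather than match with "$") also rejects a trailing newline.
    if not HOSTNAME_RE.fullmatch(host):
        raise ValueError(f"invalid hostname: {host!r}")
    return host


def build_ssh_command(host, remote_command):
    """Build an argv list, so no local shell ever parses the host name."""
    return ["ssh", "-o", "BatchMode=yes", validate_hostname(host), remote_command]


def run_remote(host, remote_command, timeout=10):
    """Run a command over SSH without shell=True and return its stdout."""
    result = subprocess.run(
        build_ssh_command(host, remote_command),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

With list arguments, a host such as `ats1; rm -rf /` is rejected by the validator and, even if it were not, would reach `ssh` as a single argument rather than being split by a local shell. The remote command string is still interpreted by the shell on the remote host, so it should be a fixed string rather than user input.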
… --traffic-ctl

- Add README.md with usage, quick start, configuration, and CLI reference
- Rename --socket-path to --socket for brevity
- Remove unused --traffic-ctl CLI option (main code uses JSONRPC socket,
  not traffic_ctl)
bryancall requested a review from cmcfarlen February 10, 2026 23:03
@cmcfarlen

I can run this now with the uv project, but it doesn't seem to work for localhost, and when I give it a remote host it just shows a blank screen without any messages or errors.

- Detect localhost/127.0.0.1/local and connect directly to the JSONRPC
  Unix socket instead of SSH, so 'traffic_grapher.py localhost' works
  without any SSH setup
- Add startup connection test that reports success/failure for each host
  before entering the graph loop
- Track and display collection errors: show in red on the dashboard
  status bar and print to stderr for the first few failures
- Give clear error messages for common failures: socket not found,
  permission denied, connection refused, SSH failures, timeouts, empty
  responses

Addresses review feedback from cmcfarlen.

Use raw string prefix for the regex pattern in the JSONRPC script
template so backslash-dot escape sequences do not trigger
SyntaxWarning/SyntaxError on remote hosts running Python 3.12+.

Instead of requiring the user to know where the JSONRPC socket is,
auto-discover it by finding traffic_ctl (via PATH or common install
prefixes like /usr/local, /opt/ats, /opt/trafficserver) and querying
the runtime directory. Falls back to checking common socket locations.

Discovery runs once at startup per host. Users can still override
with --socket or TRAFFICSERVER_JSONRPC_SOCKET env var.

In GUI mode, start with a large default size (16x10 inches) and
maximize the window on startup. On subsequent frames, don't override
the figure size so user resizes are preserved. The terminal-based
size calculation is only used for imgcat mode.

Running 'traffic_grapher.py' with no arguments now monitors the
local ATS instance instead of requiring 'localhost' to be passed
explicitly.
@bryancall

Fixed in the latest push. Here's what changed:

Localhost support:

  • localhost, 127.0.0.1, and local are now detected and connect directly to the JSONRPC Unix socket — no SSH.
  • Running with no arguments defaults to localhost, so traffic_grapher.py just works on the local machine.
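
As a rough sketch of the local-host detection and the clearer failure messages described in the commit message above: the names `LOCAL_NAMES`, `is_local_host`, `describe_socket_error`, and `probe_local_socket` are hypothetical, the case-insensitive comparison is an assumption, and the message wording is illustrative rather than the tool's actual output.

```python
import socket

# Names treated as "this machine", per the list in the comment above.
LOCAL_NAMES = {"localhost", "127.0.0.1", "local"}


def is_local_host(host):
    """True if host should use the Unix socket directly instead of SSH."""
    # Case-insensitive matching is an assumption made for this sketch.
    return host.strip().lower() in LOCAL_NAMES


def describe_socket_error(exc, path):
    """Map a connection failure to a short, readable message."""
    if isinstance(exc, FileNotFoundError):
        return f"socket not found: {path}"
    if isinstance(exc, PermissionError):
        return f"permission denied: {path}"
    if isinstance(exc, ConnectionRefusedError):
        return f"connection refused: {path}"
    if isinstance(exc, socket.timeout):
        return f"timed out connecting to {path}"
    return f"{type(exc).__name__}: {exc}"


def probe_local_socket(path, timeout=2.0):
    """Startup connection test: return (ok, message) for a Unix socket path."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
    except OSError as exc:
        return False, describe_socket_error(exc, path)
    finally:
        sock.close()
    return True, f"connected to {path}"
```

Running a probe like this once per host before the graph loop gives the user an immediate pass/fail line instead of a blank screen.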

Blank screen / no errors:

  • The regex pattern in the remote JSONRPC script used \. inside a regular string literal, which Python 3.12+ flags as an invalid escape sequence (a SyntaxWarning, or a SyntaxError when warnings are treated as errors). Fixed by using a raw string.
  • Added a startup connection test that reports pass/fail per host before entering the graph loop.
  • Collection errors are now displayed in red on the dashboard status bar and printed to stderr.
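
The raw-string fix can be checked with a small experiment. The sketch below compiles two versions of a one-line script template and records which warning categories the compiler raises; `escape_warnings` and the two template strings are hypothetical, and the metric-name pattern is only an example, not the pattern used in the tool.

```python
import warnings


def escape_warnings(source):
    """Compile source text and return the names of any warning categories raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<remote-script>", "exec")
    return {w.category.__name__ for w in caught}


# Two versions of the same template line. The doubled backslash belongs to
# this example's own quoting: each template holds a single backslash
# followed by a dot, as the embedded remote script would.
PLAIN_TEMPLATE = 'import re\npattern = re.compile("^proxy\\.process")\n'
RAW_TEMPLATE = 'import re\npattern = re.compile(r"^proxy\\.process")\n'

print("plain string:", escape_warnings(PLAIN_TEMPLATE))
print("raw string:  ", escape_warnings(RAW_TEMPLATE))
```

On Python 3.12 and later the plain-string version should report a SyntaxWarning (older versions report a DeprecationWarning), while the raw-string version compiles without any warning.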

Socket path auto-discovery:

  • If --socket isn't specified, the tool now auto-discovers the JSONRPC socket by finding traffic_ctl (via PATH or common install prefixes) and querying the runtime directory.
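
The lookup order described above might look roughly like the sketch below. All function names are hypothetical, the `bin/` and `var/trafficserver/` layout under each prefix is an assumption, and the step that queries traffic_ctl for the runtime directory is left out because the exact command is not shown in this thread.

```python
import os
import shutil

# Install prefixes named in the commit message.
COMMON_PREFIXES = ("/usr/local", "/opt/ats", "/opt/trafficserver")
SOCKET_NAME = "jsonrpc20.sock"


def find_traffic_ctl(prefixes=COMMON_PREFIXES):
    """Locate traffic_ctl via PATH first, then under common prefixes."""
    found = shutil.which("traffic_ctl")
    if found:
        return found
    for prefix in prefixes:
        candidate = os.path.join(prefix, "bin", "traffic_ctl")  # assumed layout
        if os.access(candidate, os.X_OK):
            return candidate
    return None


def candidate_sockets(prefixes=COMMON_PREFIXES):
    """Fallback socket locations to check (assumed layout under each prefix)."""
    return [os.path.join(p, "var", "trafficserver", SOCKET_NAME) for p in prefixes]


def resolve_socket(explicit=None, environ=os.environ, exists=os.path.exists):
    """--socket wins, then the env var, then the first existing fallback path."""
    if explicit:
        return explicit
    env_value = environ.get("TRAFFICSERVER_JSONRPC_SOCKET")
    if env_value:
        return env_value
    for path in candidate_sockets():
        if exists(path):
            return path
    return None  # caller should report that discovery failed
```

Passing `environ` and `exists` as parameters keeps the precedence rules testable without a running ATS instance.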

GUI mode:

  • Window now starts maximized and preserves user resizes.

The 16x10 default size is reasonable and the user can resize freely.

Hook into matplotlib key_press_event so h/l and arrow keys switch
pages in GUI mode. Hide the default navigation toolbar since pan/zoom
controls are not useful for a live dashboard.

Set rcParams toolbar to None before figure creation instead of
trying to remove it after.

Set toolbar to None before pyplot import so the matplotlib window
does not show the default navigation toolbar.

In GUI mode, skip all startup prints and stderr collection errors
since the dashboard already shows errors visually. This allows
running in the background without output spilling into the terminal.
@bryancall

Also cleaned up the GUI mode (--gui):

  • Keyboard navigation (h/l, arrow keys, q) now works in the GUI window
  • Toolbar hidden
  • Window starts at a reasonable size (14x8) and preserves user resizes
  • No terminal output in GUI mode — errors show on the dashboard, so it runs cleanly in the background with &
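
A minimal sketch of the GUI key handling, assuming the pages wrap around at either end (the thread does not say whether they wrap or stop). `next_page`, `attach_navigation`, and the `state` dictionary are hypothetical names; `mpl_connect("key_press_event", ...)` and the key names `"left"` and `"right"` are matplotlib's own.

```python
def next_page(page, key, page_count):
    """Return the page index to show after a key press."""
    if key in ("l", "right"):
        return (page + 1) % page_count  # wrap-around is an assumption
    if key in ("h", "left"):
        return (page - 1) % page_count
    return page  # any other key leaves the page unchanged


def attach_navigation(fig, state):
    """Connect key presses on a matplotlib Figure to page changes.

    state is a dict with "page", "page_count", and "running" entries.
    Returns the connection id from mpl_connect.
    """
    def on_key(event):
        if event.key == "q":
            state["running"] = False
            return
        state["page"] = next_page(state["page"], event.key, state["page_count"])

    return fig.canvas.mpl_connect("key_press_event", on_key)
```

To hide the navigation toolbar as the commits describe, `matplotlib.rcParams["toolbar"]` can be set to the string `"None"` before `matplotlib.pyplot` is imported.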

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.


Comment on lines +554 to +558
response = sock.recv(1048576).decode()
sock.close()

# Parse and output simple key=value format
data = json.loads(response)

Copilot AI Mar 12, 2026

The embedded remote JSONRPC helper reads from the Unix socket with a single recv(). recv() is not message-framed, so this can return partial JSON even for small responses, causing json.loads() to fail and yielding empty/invalid metric output. Update the script to read until the full JSON-RPC response is received (e.g., loop until EOF/newline or until json parsing succeeds, similar to _collect_local()).

Suggested change
- response = sock.recv(1048576).decode()
- sock.close()
- # Parse and output simple key=value format
- data = json.loads(response)
+ # Read until we have a complete JSON response or the socket is closed
+ buf = b""
+ data = None
+ while True:
+     chunk = sock.recv(4096)
+     if not chunk:
+         break
+     buf += chunk
+     try:
+         data = json.loads(buf.decode())
+         break
+     except json.JSONDecodeError:
+         # Response not complete yet; continue reading
+         continue
+ sock.close()
+ if data is None:
+     raise SystemExit(1)
+ # Parse and output simple key=value format
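
One detail the suggested loop leaves open is a chunk boundary that falls in the middle of a multi-byte UTF-8 character, where `decode()` raises `UnicodeDecodeError` rather than `json.JSONDecodeError`. The sketch below is one possible variant, not the PR's code: `read_json_response` and its `max_bytes` cap are hypothetical. It relies on both exception types being subclasses of `ValueError`.

```python
import json
import socket


def read_json_response(sock, chunk_size=4096, max_bytes=16 * 1024 * 1024):
    """Read from sock until the buffered bytes parse as one JSON document."""
    buf = bytearray()
    while len(buf) < max_bytes:
        chunk = sock.recv(chunk_size)
        if not chunk:
            break  # peer closed before a complete document arrived
        buf += chunk
        try:
            return json.loads(buf.decode("utf-8"))
        except ValueError:
            # Covers json.JSONDecodeError (document incomplete) and
            # UnicodeDecodeError (chunk ended mid-character); keep reading.
            continue
    raise ConnectionError(f"incomplete JSON response ({len(buf)} bytes read)")
```

Re-parsing the whole buffer after every chunk is quadratic in the worst case, which is acceptable for metric responses of modest size. The approach also assumes the response is a JSON object, since no proper prefix of an object is itself valid JSON.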

# In no-keyboard mode, just save the PNG, don't try imgcat
if self.save_png:
    print(f"Iteration {self.iteration}: saved to {self.save_png}")

Copilot AI Mar 12, 2026

In --no-keyboard mode, the status line prints self.save_png, but render_page() may replace '{iter}' with the current iteration when saving. This can misreport the actual file written. Consider printing the resolved path (the same value used in render_page()) or returning it from render_page().

Suggested change
- print(f"Iteration {self.iteration}: saved to {self.save_png}")
+ resolved_path = self.save_png
+ if "{iter}" in resolved_path:
+     try:
+         resolved_path = resolved_path.format(iter=self.iteration)
+     except Exception:
+         # If formatting fails for any reason, fall back to the template.
+         pass
+ print(f"Iteration {self.iteration}: saved to {resolved_path}")
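
An alternative to formatting the path in two places is a single helper shared by the renderer and the status line, so both always agree on the file name. This sketch assumes `render_page()` substitutes only the literal `{iter}` placeholder, as the comment above suggests; `resolve_png_path` is a hypothetical name.

```python
def resolve_png_path(template, iteration):
    """Return the file name actually written for this iteration."""
    # Plain substitution, so any other braces in the path are left untouched
    # and no formatting exception can occur.
    return template.replace("{iter}", str(iteration))


print(resolve_png_path("/tmp/ats_{iter}.png", 7))
```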

| Option | Env Var | Default |
|--------|---------|---------|
| `--socket` | `TRAFFICSERVER_JSONRPC_SOCKET` | `/usr/local/var/trafficserver/jsonrpc20.sock` |

Copilot AI Mar 12, 2026

README claims the default JSONRPC socket is "/usr/local/var/trafficserver/jsonrpc20.sock", but the code defaults --socket to TRAFFICSERVER_JSONRPC_SOCKET and otherwise uses auto-discovery (DEFAULT_JSONRPC_SOCKET_PATH is None when env var is unset). Please update this table to reflect the actual default/behavior so users aren't pointed at a path that may not be used.

Suggested change
- | `--socket` | `TRAFFICSERVER_JSONRPC_SOCKET` | `/usr/local/var/trafficserver/jsonrpc20.sock` |
+ | `--socket` | `TRAFFICSERVER_JSONRPC_SOCKET` | Value of `TRAFFICSERVER_JSONRPC_SOCKET`, otherwise auto-discover (no fixed path) |

Comment on lines +17 to +29
[project]
name = "traffic-grapher"
version = "1.0.0"
description = "Real-time ATS metrics visualization for iTerm2"
requires-python = ">=3.9"
license = "Apache-2.0"
dependencies = [
"matplotlib>=3.7",
"pyyaml>=6.0",
]

[project.scripts]
traffic-grapher = "traffic_grapher:main"

Copilot AI Mar 12, 2026

This pyproject.toml defines a console script entry point, but it lacks a [build-system] section and any packaging configuration to include the traffic_grapher module. As written, pip install . / uv sync is likely to fail or produce an install without the traffic_grapher module, making the traffic-grapher entry point unusable. Add a build backend (e.g., setuptools) and configure packaging (package directory or py_modules) so the entry point can import main reliably.

# Override history from config if specified
history = args.history
if 'history' in config:
    history = config['history'].get('seconds', args.history)

Copilot AI Mar 12, 2026

Config parsing assumes config['history'] is a mapping and calls .get('seconds', ...). If a user supplies a scalar (e.g., history: 120) this will raise an AttributeError at startup. Consider accepting both forms (int seconds or {seconds: ...}) or validating the type and emitting a clear parser.error().

Suggested change
- history = config['history'].get('seconds', args.history)
+ history_cfg = config['history']
+ if isinstance(history_cfg, dict):
+     history = history_cfg.get('seconds', args.history)
+ elif isinstance(history_cfg, (int, float)):
+     history = int(history_cfg)
+ else:
+     parser.error("Invalid 'history' in config: expected mapping or number of seconds")
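
The same normalization can be pulled into a small function that is easy to test in isolation. This is a sketch under two extra assumptions not stated in the review: YAML booleans are rejected (in Python, `True` is an instance of `int`, so `history: true` would otherwise be read as 1 second), and non-positive values are treated as errors. `parse_history_seconds` is a hypothetical name.

```python
def parse_history_seconds(value, default):
    """Normalize a 'history' config entry to a positive number of seconds.

    Accepts a bare number (history: 120) or a mapping (history: {seconds: 120}).
    Raises ValueError for anything else.
    """
    if value is None:
        return default
    if isinstance(value, dict):
        value = value.get("seconds", default)
    # bool is a subclass of int, so it has to be excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(
            "invalid 'history' in config: expected mapping or number of seconds"
        )
    seconds = int(value)
    if seconds <= 0:
        raise ValueError("invalid 'history' in config: must be positive")
    return seconds
```

The caller can catch `ValueError` and pass the message to `parser.error()` so that the user sees a normal argparse-style error at startup.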
