diff --git a/docs/rfds/server-logging.mdx b/docs/rfds/server-logging.mdx
new file mode 100644
index 0000000..72b979f
--- /dev/null
+++ b/docs/rfds/server-logging.mdx
@@ -0,0 +1,178 @@
+---
+title: "Agent-to-Client Logging"
+---
+
+Author(s): [@chazcb](https://github.com/chazcb)
+
+## Elevator pitch
+
+> What are you proposing to change?
+
+Introduce a capability-gated `log` notification (agent → client) so agents can share diagnostic messages without polluting conversation history.
+
+## Status quo
+
+> How do things work today and what problems does this cause? Why would we change things?
+
+Today, agents have limited ways to inform clients about status that might affect the user's experience. The two options are:
+
+1. **JSON-RPC errors**: Terminate the request immediately with an informative error message the client can display to the user
+2. **`session/update`**: Update conversation history with diagnostic information via `agent_message_chunk` or another chat-history notification
+
+But neither option works when:
+
+- There's no active JSON-RPC request to attach an error response to
+- We don't want to fail the request (e.g., retries, rate limiting, fallback selection)
+- There's no session yet (diagnostics after `initialize` but before `session/new`)
+- We don't want to put diagnostics in chat history, force clients to filter non-chat content, or fake chat content just to send diagnostic logs
+
+Without a way to surface these situations, users can be left confused when their ACP connection or session seems to stall or behave unexpectedly.
+
+## What we propose to do about it
+
+> What are you proposing to improve the situation?
+
+Add a `log` JSON-RPC notification that is explicitly capability-gated. Clients opt in via `clientCapabilities.logging`; agents only send logs to clients that declare the capability. Clients can optionally specify a minimum log level.
+
+```json
+{
+  "method": "initialize",
+  "params": {
+    "clientCapabilities": {
+      "logging": {
+        "level": "warning"
+      }
+    }
+  }
+}
+```
+
+If `level` is omitted, agents should treat the minimum level as `info`. Agents MUST NOT send logs below the client's requested level.
+
+### Method
+
+```json
+{
+  "jsonrpc": "2.0",
+  "method": "log",
+  "params": {
+    "level": "warning",
+    "message": "Backing model rate limited, retrying in 5 seconds...",
+    "sessionId": "abc-123",
+    "logger": "model",
+    "timestamp": "2025-01-21T10:30:00Z",
+    "data": { "model": "claude-3", "retryIn": 5 }
+  }
+}
+```
+
+### Fields
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `level` | `LogLevel` | Yes | RFC 5424 severity: `debug`, `info`, `notice`, `warning`, `error`, `critical`, `alert`, `emergency` |
+| `message` | `string` | Yes | Human-readable summary safe for display |
+| `sessionId` | `SessionId` | No | Omit for connection-wide messages |
+| `logger` | `string` | No | Component name (e.g., "model", "auth") |
+| `timestamp` | `string` | No | ISO 8601 timestamp |
+| `data` | `object` | No | Opaque context (clients must not depend on structure) |
+| `_meta` | `object` | No | Extensibility metadata |
+
+### Semantics
+
+- **Capability-gated**: Agents MUST NOT send `log` notifications to clients that did not declare `clientCapabilities.logging`.
+- **Level filtering**: Agents MUST NOT send logs below the client's requested level (default: `info`).
+- **Informational only**: Clients MAY display logs but MUST NOT treat them as protocol-affecting or control-flow signals.
+- **Best-effort delivery**: Delivery is not guaranteed, and logs are not replayed on reconnect.
+- **Session optional**: `sessionId` is optional; logs that omit it are connection-wide.
+- **Manageable volume**: Implementations should keep log volume low and limited to user-relevant events.
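The capability gate and level filter described above can be sketched on the agent side roughly as follows. This is a minimal illustration, not part of the proposal: `shouldSendLog` and `buildLogNotification` are hypothetical helper names, the types only mirror the fields table, and "below the requested level" is assumed to mean "less severe" in RFC 5424 ordering.

```typescript
// RFC 5424 severities, ordered from least to most severe.
const LOG_LEVELS = [
  "debug", "info", "notice", "warning",
  "error", "critical", "alert", "emergency",
] as const;

type LogLevel = (typeof LOG_LEVELS)[number];

// Mirrors the proposed `clientCapabilities.logging` object.
interface LoggingCapability {
  level?: LogLevel;
}

// Mirrors the proposed `log` params (`_meta` omitted for brevity).
interface LogParams {
  level: LogLevel;
  message: string;
  sessionId?: string;
  logger?: string;
  timestamp?: string;
  data?: Record<string, unknown>;
}

// Hypothetical helper: decides whether the agent may send a log at `level`.
function shouldSendLog(
  logging: LoggingCapability | undefined,
  level: LogLevel,
): boolean {
  // Capability-gated: no declared capability means no logs at all.
  if (logging === undefined) return false;
  // Level filtering: the minimum defaults to `info` when omitted.
  const minimum = logging.level ?? "info";
  return LOG_LEVELS.indexOf(level) >= LOG_LEVELS.indexOf(minimum);
}

// Hypothetical helper: wraps params in a JSON-RPC notification (no `id`).
function buildLogNotification(params: LogParams) {
  return { jsonrpc: "2.0" as const, method: "log" as const, params };
}
```

With the `initialize` example above (`"level": "warning"`), `shouldSendLog({ level: "warning" }, "notice")` is `false`, while the same call with `"error"` is `true`. Keeping the ordering in a single array means the comparison does not depend on RFC 5424's numeric codes, where lower numbers are more severe.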
+
+### Method naming
+
+`log` follows ACP's convention for connection-level operations (e.g., `initialize`, `authenticate`) rather than introducing a new namespace.
+
+## Alternatives considered
+
+### Add a notification type to `session/update`
+
+Extend `session/update` with a new notification type for diagnostics. This keeps diagnostics within the existing session machinery but has drawbacks: it requires a session (can't send connection-wide diagnostics), risks polluting chat history unless clients explicitly filter, and overloads `session/update` with non-conversation concerns. Additionally, ACP specifies that session history is replayed on `session/load`, but diagnostic logs are transient and not part of the conversation, so they shouldn't be replayed.
+
+### Structured `status` notification
+
+Instead of general-purpose logging, define a more structured `status` notification explicitly for lightweight status info, similar to Claude Code's interim status messages ("Thinking...", "Searching files..."). This would be scoped to either the current session or the agent/connection level.
+
+**Tradeoffs**: More constrained semantics could be clearer for clients, but less flexible. Logging with severity levels is a well-understood pattern; inventing a new "status" abstraction may not add value over `log` with `level: info`.
+
+### Explicit progress or heartbeat notification
+
+Define a `progress` or `heartbeat` notification specifically for long-running operations, with structured fields like `percentComplete`, `estimatedTimeRemaining`, etc.
+
+**Tradeoffs**: Progress is better suited to `session/update` since it's about task state. Heartbeats could be useful but solve a different problem (connection liveness) than diagnostics. A `log` notification can express "retrying in 5s" without requiring structured progress semantics.
+
+### Transport-level mechanisms
+
+Use HTTP headers, WebSocket ping payloads, or other transport-level channels for status.
+
+**Tradeoffs**: ACP is transport-agnostic. Relying on transport-specific mechanisms would fragment implementations and lose capability negotiation.
+
+## Shiny future
+
+> How will things play out once this feature exists?
+
+- **Clear connection feedback**: Clients can surface warnings (rate limits, retries, fallbacks) so users understand what's happening.
+- **No more mysterious stalls**: Users see why things are slow (retries, rate limits) rather than assuming a hang.
+- **Better developer experience**: Diagnostics are visible without requiring OTEL or external logging.
+- **No compatibility risk**: Capability gating means legacy clients are unaffected.
+
+## Implementation details and plan
+
+> Tell me more about your implementation. What is your detailed implementation plan?
+
+1. **Schema**: Add a `LogLevel` enum and a `LogNotification` params schema with the fields above.
+2. **Capabilities**: Add `clientCapabilities.logging` with optional `level` field for minimum severity filtering.
+3. **Protocol**: Add `log` to method tables and route it through notification handling.
+4. **Docs**: Update protocol docs and examples to show capability negotiation and sample logs.
+
+## Frequently asked questions
+
+> What questions have arisen over the course of authoring this document or during subsequent discussions?
+
+### Why not use `session/update`?
+
+`session/update` represents conversation state. Logs are diagnostic metadata and should not appear in chat history or require clients to filter out non-conversation content. `session/update` also can't represent connection-wide issues because it requires `sessionId`.
+
+### Why not send diagnostics as agent text messages?
+
+Agent messages are persistent conversation content. Logs are ephemeral status and should not be reloaded or forked with the session. Agent text also lacks severity levels and would require ad-hoc parsing to separate real answers from diagnostics.
+
+### Why not use a separate channel (stderr, SSE side channel, etc.)?
+
+ACP is transport-agnostic. A protocol-level log works uniformly across stdio, WebSocket, and HTTP, reuses capability negotiation, and allows optional session scoping without inventing a parallel channel.
+
+### Why not return errors on `prompt()`?
+
+JSON-RPC errors terminate the request. Many conditions (rate limiting, retries, fallback selection) are non-fatal and should not end the run. Logs let the agent notify the client without aborting the request.
+
+### Are logs ordered relative to other notifications?
+
+No strict guarantees. Implementations may keep logs ordered with other notifications for readability, but clients must treat them as best-effort informational messages.
+
+### Are logs replayed after reconnect?
+
+No. Logs are not part of session state and are not replayed.
+
+### How does this relate to Agent Telemetry Export?
+
+They are complementary: `log` carries low-volume, user-facing diagnostics in-band; OTEL, as currently proposed, carries high-volume developer/ops telemetry out-of-band. See `/docs/rfds/agent-telemetry-export`.
+
+### Is this a breaking change?
+
+No. It is opt-in via capability negotiation; older clients won't receive notifications they don't understand.
+
+### Why RFC 5424 log levels instead of error/warning/info?
+
+RFC 5424 is widely used and aligns with MCP and common logging libraries. Clients can map to simpler categories in their UI.
+
+## Revision history
+
+- **2025-01-21**: Initial draft
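As an illustration of the FAQ point that clients can map RFC 5424 levels to simpler categories, a client-side mapping might look like the sketch below. The three UI buckets, the grouping of levels into them, and the names `toUiSeverity` and `formatLogLine` are all assumptions made for this example; the RFD does not prescribe any of them.

```typescript
type LogLevel =
  | "debug" | "info" | "notice" | "warning"
  | "error" | "critical" | "alert" | "emergency";

// Hypothetical UI buckets; a real client may choose different ones.
type UiSeverity = "info" | "warning" | "error";

// One possible grouping of the eight RFC 5424 levels into three buckets.
const UI_SEVERITY: Record<LogLevel, UiSeverity> = {
  debug: "info",
  info: "info",
  notice: "info",
  warning: "warning",
  error: "error",
  critical: "error",
  alert: "error",
  emergency: "error",
};

function toUiSeverity(level: LogLevel): UiSeverity {
  return UI_SEVERITY[level];
}

// Renders a log for display only; it never influences protocol behavior.
function formatLogLine(log: {
  level: LogLevel;
  message: string;
  logger?: string;
}): string {
  const source = log.logger ? `${log.logger}: ` : "";
  return `[${toUiSeverity(log.level)}] ${source}${log.message}`;
}
```

Using a `Record<LogLevel, UiSeverity>` lets the compiler flag any level left unmapped if the enum changes later.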