Add timeout and in-flight observability to OperationTimedOut #818

Draft
mykaul wants to merge 2 commits into scylladb:master from mykaul:better_timeout_print
mykaul commented Apr 14, 2026

Summary

Improve timeout observability in the Python driver, inspired by the Go driver PR scylladb/gocql#847.

  • Commit 1: Fix unfilled %s format string in add_execution_profile timeout message (cherry-pickable)
  • Commit 2: Add timeout and in_flight fields to OperationTimedOut, debug logging for client-side and server-side timeouts

Changes

OperationTimedOut enhancement (cassandra/__init__.py)

  • New optional timeout and in_flight keyword parameters (backward compatible)
  • Exception message appends (timeout=Xs, in_flight=N) when timeout is set
  • When timeout is None or 0, no timeout info is printed
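The message-formatting rules above can be sketched roughly as follows. This is a standalone illustration, not the code from the patch: `OperationTimedOutSketch` is a hypothetical stand-in class, the base message format is an assumption, and how the real exception renders `in_flight` when only `timeout` is known is not stated in this PR.

```python
class OperationTimedOutSketch(Exception):
    """Hypothetical stand-in for the driver's OperationTimedOut (not the actual patch)."""

    def __init__(self, errors=None, last_host=None, *, timeout=None, in_flight=None):
        self.errors = errors
        self.last_host = last_host
        # New optional, keyword-only fields; existing positional callers are unaffected.
        self.timeout = timeout
        self.in_flight = in_flight

        # Assumed base message format, used here for illustration only.
        message = "errors=%s, last_host=%s" % (errors, last_host)
        # Append the extra details only when a non-zero timeout is known.
        if timeout:
            message += " (timeout=%ss, in_flight=%s)" % (timeout, in_flight)
        super().__init__(message)


legacy = OperationTimedOutSketch({}, "10.0.0.1")
detailed = OperationTimedOutSketch({}, "10.0.0.1", timeout=10.0, in_flight=42)
print(legacy)
print(detailed)
```

Callers that catch the exception could then read `timeout` and `in_flight` as attributes instead of parsing the message.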

Client-side timeout observability (cassandra/connection.py, cassandra/cluster.py)

  • All 7 production raise sites now pass timeout= and in_flight= where available
  • Debug log emitted on every client-side timeout with host, timeout, in_flight, and orphaned counts
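Because the new lines are logged at debug level, they stay hidden under a default logging configuration. A minimal way to surface them, assuming the driver modules obtain their loggers with `logging.getLogger(__name__)` (so the logger names are `cassandra.cluster` and `cassandra.connection`), might look like this:

```python
import logging

# Install a basic handler on the root logger; the format is an arbitrary choice.
logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")

# Assumed logger names, derived from the module paths listed in this PR.
for name in ("cassandra.cluster", "cassandra.connection"):
    logging.getLogger(name).setLevel(logging.DEBUG)
```

Raising the level only on these two loggers keeps the rest of the application's logging at its existing verbosity.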

Server-side timeout logging (cassandra/cluster.py)

  • Debug log for server read timeouts: host, consistency, received/required, data_retrieved, retry decision
  • Debug log for server write timeouts: host, consistency, received/required, write_type, retry decision
  • _retry_decision_name() helper translates RetryPolicy constants to human-readable names
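A helper of this kind could be sketched as below. This is not the PR's implementation: `RetryPolicyStandIn` is a hypothetical class whose constant values are assumed to mirror those of the driver's `RetryPolicy`, and the fallback text for unrecognized values is an illustrative choice.

```python
class RetryPolicyStandIn:
    """Hypothetical stand-in; values assumed to mirror the driver's RetryPolicy constants."""
    RETRY = 0
    RETHROW = 1
    IGNORE = 2
    RETRY_NEXT_HOST = 3


_DECISION_NAMES = {
    RetryPolicyStandIn.RETRY: "RETRY",
    RetryPolicyStandIn.RETHROW: "RETHROW",
    RetryPolicyStandIn.IGNORE: "IGNORE",
    RetryPolicyStandIn.RETRY_NEXT_HOST: "RETRY_NEXT_HOST",
}


def retry_decision_name(decision):
    """Translate a retry decision constant into a readable name for log messages."""
    # Fall back to a descriptive placeholder rather than raising inside a log call.
    return _DECISION_NAMES.get(decision, "UNKNOWN(%r)" % (decision,))


print(retry_decision_name(RetryPolicyStandIn.RETRY_NEXT_HOST))
```

Returning a placeholder for unknown values means a logging call can never fail because of an unexpected decision code.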

Tests

  • 7 new unit tests for OperationTimedOut message formatting and attribute access
  • Updated existing heartbeat and response future tests to assert timeout and in_flight values

mykaul added 2 commits April 14, 2026 20:03
The error message at Cluster.add_execution_profile() had an unfilled %s
placeholder: 'Failed to create all new connection pools in the %ss timeout.'
The pool_wait_timeout value was never interpolated into the string, so
users would see a literal '%s' instead of the actual timeout value.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
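
The unfilled-placeholder bug described in the first commit can be reproduced in isolation. The snippet below is only an illustration of the failure mode and one possible fix; the `pool_wait_timeout` value is made up, and the actual change in `Cluster.add_execution_profile()` may be written differently.

```python
pool_wait_timeout = 5  # hypothetical value, for illustration only

# Before: the placeholder is never interpolated, so users see a literal "%s".
broken = "Failed to create all new connection pools in the %ss timeout."

# After: the timeout value is interpolated into the message.
fixed = "Failed to create all new connection pools in the %ss timeout." % (pool_wait_timeout,)

print(broken)
print(fixed)
```
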
Improve timeout observability in the driver, inspired by the Go driver
PR scylladb/gocql#847.

OperationTimedOut now carries optional timeout and in_flight fields that
are appended to the exception message when present (e.g.
"(timeout=10.0s, in_flight=42)"). All seven production raise sites in
connection.py and cluster.py pass these values where available.

Additionally, debug-level log lines are emitted for:
- Client-side request timeouts (host, timeout, in_flight, orphaned)
- Server-side read/write timeouts (host, consistency, received/required,
  data_retrieved/write_type, retry decision)

A helper _retry_decision_name() translates RetryPolicy constants to
human-readable strings for the log messages.

New keyword-only parameters are backward compatible — existing callers
that pass only positional errors/last_host continue to work unchanged.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>