
readline_timed drops trace stream events under load (uses isc_info_svc_line per-call instead of isc_info_svc_to_eof bulk drain) #70

@fdcastel

Description

Summary

Server.readline_timed (the documented API for consuming a Firebird trace stream from Python) drops the majority of trace events under sustained load. The cause is the request mode it sends to the services API: isc_info_svc_line (= SrvInfoCode.LINE, value 62), which requests one line per round-trip. Under realistic OLTP throughput the per-call overhead caps the consumer at roughly 480 trace events per second, while Firebird's fbtracemgr.exe (the reference C++ consumer shipped with the engine) keeps up with the full stream because it uses isc_info_svc_to_eof (= SrvInfoCode.TO_EOF, value 63) and drains many lines per round-trip.
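
For a quick sanity check of the two request items at a Python REPL (a sketch; the SrvInfoCode import path is an assumption and may vary by driver version, values are as quoted above):

from firebird.driver.types import SrvInfoCode  # assumed import path

assert SrvInfoCode.LINE == 62    # isc_info_svc_line: one line per round-trip
assert SrvInfoCode.TO_EOF == 63  # isc_info_svc_to_eof: bulk drain per round-trip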

I have evidence from a five-experiment baseline (matched workload, same python3-driver build, only the request mode varying) showing the gap is purely down to the request-mode choice: a Python reader using _svc.query(..., bytes([SrvInfoCode.TO_EOF]), ...) against the same trace session captures exactly as many events as fbtracemgr does.

Reproduction

  1. Start a Firebird 5.0.x server with a workload that emits trace events at a non-trivial rate. I used HammerDB's TPC-C driver (200-call transaction mix run via tclsh from https://github.com/fdcastel/HammerDB/tree/feature/firebird-support), but the same effect should appear with gbak, a sysbench-style workload, or even a tight loop of EXECUTE_STATEMENT_* events.
  2. Open a long-running trace session via python3-driver:
    import sys
    import firebird.driver as fb

    conn = fb.connect_server(server="localhost/3050", user="SYSDBA", password="masterkey")
    # EVERYTHING_CONFIG is the trace configuration string (all event types enabled).
    session_id = conn.trace.start(config=EVERYTHING_CONFIG, name="repro")
    while True:
        line = conn.readline_timed(1)
        if line is None:
            break
        if isinstance(line, str):
            sys.stdout.write(line)
        # a non-str result is the TIMEOUT sentinel: idle, keep polling
  3. Run the same trace.conf against the same workload using fbtracemgr -se localhost:service_mgr -user SYSDBA -password masterkey -start -name repro2 -config trace.conf > fbtracemgr.out.
  4. Count START_TRANSACTION (or any frequent event type) in each capture; a counting sketch follows this list.
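
For step 4, a minimal counting script (illustrative only; the event tag strings follow Firebird's trace output format):

import sys
from collections import Counter

def count_events(path, events=("START_TRANSACTION",)):
    """Count occurrences of the given trace event tags in a capture file."""
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            for ev in events:
                if ev in line:
                    counts[ev] += 1
    return counts

if __name__ == "__main__":
    # e.g.: python count_events.py python.out fbtracemgr.out
    for path in sys.argv[1:]:
        print(path, dict(count_events(path)))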

Expected: identical counts. Observed: fbtracemgr captures roughly 6× more events on a runtxn-only workload, and ~17× more when the trace session is also observing a heavy bulk-load phase.

Quantitative evidence

Five experiments under matched conditions (Firebird 5.0.3 on Windows Server 2022; HammerDB TPC-C 200-call transaction mix against a pre-loaded 1-warehouse schema; trace config = EVERYTHING_CONFIG mirrored from the project's defaults):

| Experiment | Consumer | Request mode | START_TX captured | Total events | Trace bytes |
|---|---|---|---|---|---|
| E1 | fbtracemgr | isc_info_svc_to_eof | 756 (load+run) | 1 209 620 | 1.34 GB |
| E2 | fbtracemgr | isc_info_svc_to_eof | 201 (run only) | 10 170 | 8.1 MB |
| E3 | python3-driver via Server.readline_timed | isc_info_svc_line | 58 (run only) | 2 605 | 2.1 MB |
| E4 | python3-driver bare readline_timed loop | isc_info_svc_line | 58 (run only) | 2 977 | 2.4 MB |
| E5 | python3-driver low-level _svc.query(..., TO_EOF, ...) | isc_info_svc_to_eof | 201 (run only) | 10 226 | 8.4 MB |

E5 is the same Python process, same connection class, same Firebird session — only the SPB request item differs. It captures the full stream identically to fbtracemgr. Per-call verbose instrumentation on E3/E4 confirmed the Python reader is not throughput-bound on its own: it processes incoming lines at ~10 k lines/sec consistently. The events it's missing are dropped server-side by Firebird's trace buffer because the LINE-mode round-trips can't drain it fast enough; events accumulate until the server discards them to protect itself.

The harness produced raw traces, per-call timeline logs, and response-tag histograms. I can share specific bytes or a self-contained reproducer if helpful.
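
The per-call instrumentation was in this spirit (a sketch of the idea, not the real harness, which is more involved):

import time
from collections import Counter

def instrumented_loop(conn, timeout=1, report_every=10_000):
    """Time readline_timed round-trips and histogram the result kinds."""
    kinds, calls, t0 = Counter(), 0, time.perf_counter()
    while True:
        line = conn.readline_timed(timeout)
        calls += 1
        kinds["eof" if line is None
              else "line" if isinstance(line, str)
              else "timeout"] += 1
        if line is None:
            break
        if calls % report_every == 0:
            rate = calls / (time.perf_counter() - t0)
            print(f"{calls} calls, {rate:,.0f} calls/sec, {dict(kinds)}")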

Root cause (single line)

Server.readline_timed sends bytes([SrvInfoCode.LINE]) as the request item. Compare with Firebird's own reference trace consumer at src/utilities/fbtracemgr/traceMgrMain.cpp:216: const char query[] = {isc_info_svc_to_eof, isc_info_end};. Server-side handling for the two request items goes through the same get() function in src/jrd/svc.cpp but with different flags (GET_LINE vs GET_EOF), and the bulk-drain path is the only one fast enough for a busy trace stream.

A secondary issue lives in the same method: when the server returns isc_info_data_not_ready (transient buffer-empty, but session still active), readline_timed returns None. Callers conventionally treat None as EOF and tear down the iterator, which means a brief idle moment can prematurely terminate trace capture. The behavior is consistent with the current code:

# End of the current Server.readline_timed implementation:
data = self.response.read_sized_string(encoding=self.encoding, errors=self.encoding_errors)
if self.response.get_tag() == SrvInfoCode.TIMEOUT:
    return TIMEOUT              # server-side timeout: "no data yet, keep polling"
return data if data else None   # empty payload collapses to None, read by callers as EOF

SrvInfoCode.TIMEOUT is checked explicitly but isc_info_data_not_ready (value 4) is not — both should signal "no data right now, session still alive" rather than EOF. fbtracemgr's loop treats both indicators identically: it keeps polling until either Ctrl-C or a genuine end-of-service (which the server signals by ending the response with isc_info_end without a trailing indicator).
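
A minimal sketch of the indicator fix in isolation (same names as the snippet above; the numeric value 4 for isc_info_data_not_ready is as quoted in this issue):

data = self.response.read_sized_string(encoding=self.encoding,
                                       errors=self.encoding_errors)
tag = self.response.get_tag()
# Both tags mean "session alive, nothing buffered right now" -- not EOF.
if tag in (SrvInfoCode.TIMEOUT, 4):  # 4 == isc_info_data_not_ready
    return TIMEOUT
return data if data else None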

Proposed fix

Adding a new public method is the least disruptive shape — existing readline_timed callers (notably Server.readline, stdin handling, and any third-party code relying on per-line semantics) keep their current behavior.

def readlines_to_eof_timed(self, timeout: int) -> str | Sentinel | None:
    """Drain available trace/services text from the server in one
    round-trip. Returns the (possibly multi-line) chunk, `TIMEOUT` when
    the server-side timeout elapsed with no payload, or `None` when the
    service has reported `isc_info_end` (end of stream).

    Distinct from `readline_timed` in that:
      - the request item is `isc_info_svc_to_eof`, not `isc_info_svc_line`;
        the server returns up to MAXBUF bytes per call instead of one line.
      - the trailing indicator is inspected: `isc_info_svc_timeout` and
        `isc_info_data_not_ready` both return `TIMEOUT` (idle, keep going);
        `isc_info_truncated` is silently absorbed (more data coming);
        only `isc_info_end` (no trailing indicator) returns `None`.

    Designed for trace-session consumers that need fbtracemgr-equivalent
    throughput; falls back to `readline_timed` for one-line semantics.
    """
    self.response.clear()
    self._svc.query(self._make_request(timeout),
                    bytes([SrvInfoCode.TO_EOF]),
                    self.response.raw)
    if (tag := self.response.get_tag()) != SrvInfoCode.TO_EOF:
        raise InterfaceError(f"Service responded with error code: {tag}")
    data = self.response.read_sized_string(encoding=self.encoding,
                                           errors=self.encoding_errors)
    trailing = self.response.get_tag()
    if trailing == ISC_INFO_END:  # value 1
        return data if data else None
    if data:
        return data
    return TIMEOUT

Server.readline could optionally call this internally and split the chunk into per-line yields, preserving its existing signature while picking up the throughput improvement automatically; the per-line buffering pattern would resemble the line-buffer in the existing Server.readline flow.
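
A hedged sketch of that wrapper (the buffering logic here is illustrative, not the driver's existing readline implementation; TIMEOUT is the driver sentinel already used by readline_timed, assumed in scope):

def iter_trace_lines(conn, timeout: int = 1):
    """Yield complete trace lines, draining in TO_EOF-sized chunks."""
    buf = ""
    while True:
        chunk = conn.readlines_to_eof_timed(timeout)
        if chunk is None:      # isc_info_end: service stream finished
            break
        if chunk is TIMEOUT:   # idle; session still alive, poll again
            continue
        buf += chunk
        *complete, buf = buf.split("\n")
        for line in complete:
            yield line
    if buf:                    # flush an unterminated final fragment
        yield buf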

Tests should cover at least:

  • a busy mock services connection where _svc.query returns multi-line responses with the various trailing indicators
  • the isc_info_end termination case (final round with no payload, no trailing indicator)
  • the isc_info_data_not_ready idle case (does NOT return None; see the sketch below)
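
For the last case, a hypothetical test shape: FakeResponse/FakeServer are stand-ins for the driver internals, readlines_to_eof_timed is the proposed method above called as a plain function on the stub, and the import paths are assumptions that may differ by driver version.

import types
from firebird.driver.types import SrvInfoCode  # assumed import path
from firebird.driver import TIMEOUT            # assumed import path

DATA_NOT_READY = 4  # isc_info_data_not_ready

class FakeResponse:
    """Replays a canned tag sequence and payload."""
    def __init__(self, tags, payload=""):
        self._tags, self._payload = iter(tags), payload
        self.raw = bytearray()
    def clear(self): pass
    def get_tag(self): return next(self._tags)
    def read_sized_string(self, **kwargs): return self._payload

class FakeServer:
    encoding, encoding_errors = "utf-8", "strict"
    def __init__(self, tags, payload=""):
        self.response = FakeResponse(tags, payload)
        self._svc = types.SimpleNamespace(query=lambda *args: None)
    def _make_request(self, timeout): return b""

def test_data_not_ready_is_idle_not_eof():
    # Empty payload + data_not_ready trailing tag must map to TIMEOUT, not None.
    srv = FakeServer([SrvInfoCode.TO_EOF, DATA_NOT_READY], payload="")
    assert readlines_to_eof_timed(srv, 1) is TIMEOUT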

I'm happy to send a PR with the implementation and tests if the maintainers prefer that route.

Context

I hit this while diagnosing low event-capture rates in a long-running Python trace consumer on a busy multi-host deployment. The five-experiment baseline above pinned the root cause to the request-mode choice. As a workaround I call _svc.query directly with the TO_EOF request item (private-API access in user code) and the throughput gap closes — the same Python process now captures the full trace stream identically to fbtracemgr. I would much rather depend on a public driver method than maintain the workaround.

Happy to provide more diagnostics, sample captures, or a draft PR — whichever shape the maintainers find easiest to move forward.

Thanks for the work on this library.
