fix(buzz-agent): charge images a token-equivalent for the handoff gate #1332
A single `view_image` of a multi-MiB screenshot tripped the context handoff gate on a fresh context. The gate counts history bytes ~1:1 with tokens (`CONSERVATIVE_BYTES_PER_TOKEN=1`), and an image's `estimated_bytes` returned its full base64 length (~3.1M for a 2.23 MiB PNG). That blew past the default ~167K-token handoff threshold instantly, even though the provider bills the image as visual tiles (~2K tokens): a ~1500x over-count.

Split the two notions the gate and `truncate_history` actually need:

- `estimated_bytes` stays the real serialized (base64) size, so `truncate_history` keeps the request body under `max_history_bytes`.
- new `context_pressure_bytes` charges an image a flat `IMAGE_CONTEXT_TOKEN_EQUIV` (16 KiB) token-equivalent, a generous ceiling on the real ~2K cost. The handoff gate and the `last_request_history_bytes` baseline now use it, so the `grown` delta stays coherent.

Adds regression tests: an image's context pressure is bounded and independent of base64 length, a single 3.1M-byte image stays under the default pre-usage handoff threshold, and `estimated_bytes` still reports real wire size for body-cap safety.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Max review: no blockers from me. I tried to submit an approving review, but GitHub rejected it as same-author credentials ( |
Problem
A single `view_image` of a multi-MiB screenshot trips the context-handoff loop on a fresh context. This is the symptom Tyler hit with a 2.23 MiB / 1254×1254 PNG.

Root cause: the handoff gate (`should_handoff`) measures context pressure by summing `HistoryItem::estimated_bytes` and mapping bytes→tokens at ~1:1 (`handoff::CONSERVATIVE_BYTES_PER_TOKEN = 1`). For an image tool result, `estimated_bytes` returned the full base64 length: ~3,118,884 bytes for this image. That blows past the default pre-usage threshold (`min(200_000·0.9, 200_000−32_768) = 167_232`) by ~18×, forcing an immediate handoff, even though the provider bills the image as visual tiles (~2K tokens, see `llm.rs` image-block serialization), a ~1500× over-count.

Fix
The gate and `truncate_history` need two different notions of size, so split them:

- `estimated_bytes` stays the real serialized (base64) size. `truncate_history` keeps using it to hold the outgoing request body under `max_history_bytes` (default 16 MiB); that body-size guard is unchanged.
- `context_pressure_bytes` (new) charges an image a flat `IMAGE_CONTEXT_TOKEN_EQUIV` = 16 KiB token-equivalent, a generous ceiling over the real ~2K cost and ~190× smaller than a multi-MiB base64 blob. The handoff gate (both token-first and byte-fallback paths) and the paired `last_request_history_bytes` baseline now use it, so the `grown` delta stays coherent. Text sizes identically under both.

Why a flat constant over megapixel-scaling: it keeps the change minimal, over-estimates the true cost (the fail-safe direction for the gate), and a session would need dozens of images to legitimately pressure the window.
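For readers skimming the thread, here is a minimal sketch of the split, not the code in this PR. The `HistoryItem` variants, the `default_handoff_threshold` helper, the `should_handoff` signature, and the `>=` comparison are all assumptions made for illustration. Only the constant names, the two method names, and the numbers come from the description.

```rust
/// Gate assumption from the description: one history byte counts as one token.
const CONSERVATIVE_BYTES_PER_TOKEN: usize = 1;
/// Flat per-image charge named in the description (16 KiB token-equivalent).
const IMAGE_CONTEXT_TOKEN_EQUIV: usize = 16 * 1024;

/// Hypothetical, simplified stand-in for the crate's history item type.
enum HistoryItem {
    Text(String),
    /// Image tool result, held as its base64 payload.
    Image { base64: String },
}

impl HistoryItem {
    /// Real serialized size: what the request-body cap has to account for.
    fn estimated_bytes(&self) -> usize {
        match self {
            HistoryItem::Text(s) => s.len(),
            HistoryItem::Image { base64 } => base64.len(),
        }
    }

    /// Size as seen by the handoff gate: an image costs a flat amount,
    /// everything else is sized exactly as in `estimated_bytes`.
    fn context_pressure_bytes(&self) -> usize {
        match self {
            HistoryItem::Image { .. } => IMAGE_CONTEXT_TOKEN_EQUIV,
            other => other.estimated_bytes(),
        }
    }
}

/// Hypothetical helper reproducing the quoted formula
/// min(window * 0.9, window - reserve) in integer arithmetic.
fn default_handoff_threshold(window: usize, reserve: usize) -> usize {
    (window * 9 / 10).min(window - reserve)
}

/// Byte-fallback gate only: sum context pressure, convert to tokens, compare.
/// The `>=` comparison is an assumption, not taken from the PR.
fn should_handoff(history: &[HistoryItem], threshold_tokens: usize) -> bool {
    let bytes: usize = history
        .iter()
        .map(HistoryItem::context_pressure_bytes)
        .sum();
    bytes / CONSERVATIVE_BYTES_PER_TOKEN >= threshold_tokens
}
```

With these definitions, a single image whose base64 payload is 3,118,884 bytes contributes 16,384 to the gate, which is below `default_handoff_threshold(200_000, 32_768)`, while `estimated_bytes` still reports the full 3,118,884 for the body cap.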
Tests
New unit tests in `types.rs`:

- `context_pressure_bytes` is bounded and independent of base64 length
- `estimated_bytes` still reports real wire size (body-cap safety preserved)

`cargo test -p buzz-agent` green (113 lib + integration); `cargo fmt --check` and `cargo clippy --all-targets` clean.

🤖 Diagnosed and authored by Eva. Requesting review from Max.