Bug Report: cache_read_tokens is reported as 0 despite cache hits in model usage metadata
Summary
gh aw usage artifacts can report cache_read_tokens: 0 even when model response metadata clearly indicates prompt cache hits.
In affected runs, per-response usage shows cached prompt tokens, but aggregated artifacts (token-usage.jsonl rollups and agent_usage.json) flatten cache-read to zero.
Environment
- gh-aw CLI version: v0.77.5
- engine: copilot
- engine runtime: GitHub Copilot CLI 1.0.55
- workflow model setting: gpt-5.4
- observed primary model in usage artifact: gpt-5.4-2026-03-05
- awf version: v0.25.58
- wire path observed in logs: custom provider (wireApi=responses)
Reproduction
- Run a workflow with multiple turns and repeated shared prompt context.
- Download audit artifacts.
- Compare the following files from the same run:
  - sandbox/agent/logs/process-*.log
  - sandbox/firewall/logs/api-proxy-logs/token-usage.jsonl
  - agent_usage.json
- Check whether process log usage blocks show cached prompt tokens while aggregated artifacts report zero cache read.
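The comparison in the last step can be scripted. The following is a minimal sketch, not part of gh-aw: the helper names `cached_tokens_in_log` and `cache_read_in_jsonl` are hypothetical, and it assumes the process log contains usage blocks with a literal "cached_tokens": N field and that token-usage.jsonl holds one JSON object per line, as in the evidence below. Process logs may print the same usage block more than once, so treat the log total as an indicator of cache activity (zero vs. non-zero) rather than an exact count.

```python
import json
import re

# Matches the "cached_tokens": N field seen in process log usage blocks.
CACHED_RE = re.compile(r'"cached_tokens"\s*:\s*(\d+)')


def cached_tokens_in_log(log_text: str) -> int:
    """Sum every "cached_tokens": N value found in raw process log text."""
    return sum(int(value) for value in CACHED_RE.findall(log_text))


def cache_read_in_jsonl(jsonl_text: str) -> int:
    """Sum cache_read_tokens across token_usage records.

    Blank lines and lines that are not valid JSON objects are skipped.
    """
    total = 0
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and record.get("event") == "token_usage":
            total += int(record.get("cache_read_tokens", 0) or 0)
    return total
```

Read each artifact with `pathlib.Path(...).read_text()` and pass the text to the matching helper. A non-zero result from the process log combined with a zero result from token-usage.jsonl reproduces the mismatch described in this report.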
Expected Behavior
- If per-response usage contains cached prompt tokens, aggregate usage should reflect non-zero cache_read_tokens.
- agent_usage.json and token-usage.jsonl aggregates should be consistent with model usage metadata from process logs.
Actual Behavior
- Process log usage blocks repeatedly show cache activity (for example prompt_tokens_details.cached_tokens and token_type: cache_read).
- token-usage.jsonl records cache_read_tokens: 0.
- agent_usage.json also reports cache_read_tokens: 0.
Evidence
1) Per-response model usage shows cached prompt tokens
From sandbox/agent/logs/process-*.log:
"usage": {
  "completion_tokens": 775,
  "prompt_tokens": 32481,
  "total_tokens": 33256,
  "prompt_tokens_details": {
    "cached_tokens": 32128
  },
  "completion_tokens_details": {
    "reasoning_tokens": 141
  }
}
And in token details blocks:
{"token_count": 32128, "token_type": "cache_read"}
2) Proxy token usage records zero cache-read
From sandbox/firewall/logs/api-proxy-logs/token-usage.jsonl:
{"event":"token_usage","model":"gpt-5.4-2026-03-05","input_tokens":31521,"output_tokens":703,"cache_read_tokens":0,"cache_write_tokens":0}
3) Run-level usage aggregate also reports zero cache-read
From agent_usage.json:
{"input_tokens":7491744,"output_tokens":63513,"cache_read_tokens":0,"cache_write_tokens":0,"effective_tokens":46474776,"primary_model":"gpt-5.4-2026-03-05"}
Impact
- Cache efficiency reporting is misleading.
- Cost and optimization analysis can be wrong.
- Effective token interpretation becomes harder to trust when cache accounting differs across artifact layers.
Requested Fix
- Ensure cache-read usage from per-response model metadata is propagated into proxy and run-level aggregates.
- Add a consistency check in audit output when per-response cache metadata disagrees with aggregated cache fields.
- Clarify whether ET (effective tokens) enforcement and billing are computed from the same usage stream as these artifacts, or from an independent source.
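To illustrate the requested consistency check, here is a rough sketch of the kind of comparison the audit output could perform. It is not gh-aw code: `check_cache_consistency` is a hypothetical name, and the sketch assumes per-response usage blocks have already been parsed into dicts with the shape shown in the Evidence section, alongside an aggregate dict shaped like agent_usage.json.

```python
from typing import Any


def check_cache_consistency(
    responses: list[dict[str, Any]], aggregate: dict[str, Any]
) -> list[str]:
    """Return warnings when per-response cache metadata disagrees with the aggregate.

    `responses` holds per-response "usage" dicts; `aggregate` is a run-level
    rollup such as the contents of agent_usage.json.
    """
    # Total cached prompt tokens reported by the model across all responses.
    observed = sum(
        int((usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0) or 0)
        for usage in responses
    )
    reported = int(aggregate.get("cache_read_tokens", 0) or 0)

    warnings = []
    if observed > 0 and reported == 0:
        warnings.append(
            f"per-response usage shows {observed} cached prompt tokens, "
            "but the aggregate reports cache_read_tokens: 0"
        )
    return warnings
```

With the values from this report (cached_tokens of 32128 in a response, cache_read_tokens of 0 in the aggregate), the function returns one warning. It deliberately flags only the zero vs. non-zero case, since exact totals may legitimately differ if the layers count responses differently.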
Notes
- This report focuses on usage accounting consistency; it does not assume cache itself is disabled.