Cache read tokens are reported as 0 in run usage despite cache hits in per-response metadata #36885

@tore-unumed

Bug Report: cache_read_tokens is reported as 0 despite cache hits in model usage metadata

Summary

gh aw usage artifacts can report cache_read_tokens: 0 even when model response metadata clearly indicates prompt cache hits.

In affected runs, per-response usage shows cached prompt tokens, but the aggregated artifacts (token-usage.jsonl records and agent_usage.json) report cache-read as zero.

Environment

  • gh-aw CLI version: v0.77.5
  • engine: copilot
  • engine runtime: GitHub Copilot CLI 1.0.55
  • workflow model setting: gpt-5.4
  • observed primary model in usage artifact: gpt-5.4-2026-03-05
  • awf version: v0.25.58
  • wire path observed in logs: custom provider (wireApi=responses)

Reproduction

  1. Run a workflow with multiple turns and repeated shared prompt context.
  2. Download audit artifacts.
  3. Compare the following files from the same run:
    • sandbox/agent/logs/process-*.log
    • sandbox/firewall/logs/api-proxy-logs/token-usage.jsonl
    • agent_usage.json
  4. Check whether process log usage blocks show cached prompt tokens while aggregated artifacts report zero cache read.
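The comparison in step 4 can be sketched in Python. This is a minimal sketch, not part of gh-aw: the helper names are hypothetical, and it assumes the process log contains "cached_tokens": N fields and that token-usage.jsonl holds one JSON object per line, as in the excerpts under Evidence. If a log prints the same usage block more than once, the sum will overcount.

```python
import json
import re

# Matches the "cached_tokens": <n> field seen in per-response usage blocks.
CACHED_RE = re.compile(r'"cached_tokens"\s*:\s*(\d+)')


def cached_tokens_in_log(log_text: str) -> int:
    """Sum every cached_tokens value found in a process log's text."""
    return sum(int(value) for value in CACHED_RE.findall(log_text))


def cache_read_in_jsonl(jsonl_text: str) -> int:
    """Sum cache_read_tokens across token_usage events in a JSONL string."""
    total = 0
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("event") == "token_usage":
            total += record.get("cache_read_tokens", 0)
    return total
```

A non-zero result from cached_tokens_in_log alongside a zero from cache_read_in_jsonl for the same run reproduces the discrepancy described here.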

Expected Behavior

  • If per-response usage contains cached prompt tokens, aggregate usage should reflect non-zero cache_read_tokens.
  • agent_usage.json and token-usage.jsonl aggregates should be consistent with model usage metadata from process logs.

Actual Behavior

  • Process log usage blocks repeatedly show cache activity (for example prompt_tokens_details.cached_tokens and token_type: cache_read).
  • token-usage.jsonl records cache_read_tokens: 0.
  • agent_usage.json also reports cache_read_tokens: 0.

Evidence

1) Per-response model usage shows cached prompt tokens

From sandbox/agent/logs/process-*.log:

"usage": {
  "completion_tokens": 775,
  "prompt_tokens": 32481,
  "total_tokens": 33256,
  "prompt_tokens_details": {
    "cached_tokens": 32128
  },
  "completion_tokens_details": {
    "reasoning_tokens": 141
  }
}

And in token details blocks:

{"token_count": 32128, "token_type": "cache_read"}
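To put the sample above in perspective, the cached share of the prompt can be computed directly from the usage block. The function name is hypothetical; the field names are the ones shown in the log excerpt.

```python
def cache_hit_ratio(usage: dict) -> float:
    """Return cached prompt tokens as a fraction of all prompt tokens."""
    prompt = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    return cached / prompt if prompt else 0.0


sample = {
    "completion_tokens": 775,
    "prompt_tokens": 32481,
    "total_tokens": 33256,
    "prompt_tokens_details": {"cached_tokens": 32128},
}
ratio = cache_hit_ratio(sample)
```

For this response, 32128 of 32481 prompt tokens (about 98.9%) were served from cache, so a run-level cache_read_tokens of 0 cannot be explained by rounding.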

2) Proxy token usage records zero cache-read

From sandbox/firewall/logs/api-proxy-logs/token-usage.jsonl:

{"event":"token_usage","model":"gpt-5.4-2026-03-05","input_tokens":31521,"output_tokens":703,"cache_read_tokens":0,"cache_write_tokens":0}

3) Run-level usage aggregate also reports zero cache-read

From agent_usage.json:

{"input_tokens":7491744,"output_tokens":63513,"cache_read_tokens":0,"cache_write_tokens":0,"effective_tokens":46474776,"primary_model":"gpt-5.4-2026-03-05"}
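A quick way to tell whether the zero originates in the proxy records or in the run-level rollup is to sum token-usage.jsonl and compare it with agent_usage.json. The sketch below assumes agent_usage.json is a plain sum of the proxy records, which this report has not confirmed; rollup and differing_fields are hypothetical helpers.

```python
import json

FIELDS = ("input_tokens", "output_tokens", "cache_read_tokens", "cache_write_tokens")


def rollup(jsonl_text: str) -> dict:
    """Sum the four token fields over every record in a JSONL string."""
    totals = dict.fromkeys(FIELDS, 0)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        for field in FIELDS:
            totals[field] += record.get(field, 0)
    return totals


def differing_fields(totals: dict, agent_usage: dict) -> list:
    """List the fields whose summed value differs from agent_usage.json."""
    return [f for f in FIELDS if totals[f] != agent_usage.get(f)]
```

If the two agree on cache_read_tokens: 0, the value is most likely lost before or at the proxy layer rather than during aggregation.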

Impact

  • Cache efficiency reporting is misleading.
  • Cost and optimization analysis can be wrong.
  • Effective token interpretation becomes harder to trust when cache accounting differs across artifact layers.

Requested Fix

  1. Ensure cache-read usage from per-response model metadata is propagated into proxy and run-level aggregates.
  2. Add a consistency check in audit output when per-response cache metadata disagrees with aggregated cache fields.
  3. Clarify whether effective token (ET) enforcement and billing are computed from the same usage stream as these artifacts, or from an independent source.
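Items 1 and 2 could look roughly like the sketch below. It is illustrative only and does not reflect the actual gh-aw or proxy implementation; normalize_usage and cache_mismatch_warning are hypothetical names. It accepts both the prompt_tokens_details shape seen in the process log and an input_tokens_details shape, on the assumption that the upstream usage block may arrive in either form. Whether input_tokens should include or exclude cached tokens is not settled in this report; the sketch leaves it inclusive.

```python
from typing import Optional


def normalize_usage(usage: dict) -> dict:
    """Map a provider usage block to the flat shape used in token-usage.jsonl."""
    details = (
        usage.get("prompt_tokens_details")
        or usage.get("input_tokens_details")
        or {}
    )
    return {
        "input_tokens": usage.get("prompt_tokens", usage.get("input_tokens", 0)),
        "output_tokens": usage.get("completion_tokens", usage.get("output_tokens", 0)),
        "cache_read_tokens": details.get("cached_tokens", 0),
    }


def cache_mismatch_warning(per_response_cached: int, aggregate: dict) -> Optional[str]:
    """Return a warning when responses show cache hits but the aggregate shows none."""
    reported = aggregate.get("cache_read_tokens", 0)
    if per_response_cached > 0 and reported == 0:
        return (
            f"per-response metadata shows {per_response_cached} cached tokens, "
            "but aggregate cache_read_tokens is 0"
        )
    return None
```

An audit step could call cache_mismatch_warning with the per-response total and the parsed agent_usage.json, and surface any returned message in its output.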

Notes

  • This report focuses on usage accounting consistency; it does not assume cache itself is disabled.
