Cache read tokens are reported as 0 in run usage despite cache hits in per-response metadata #36885

# Bug Report: cache_read_tokens is reported as 0 despite cache hits in model usage metadata

## Summary
`gh aw` usage artifacts can report `cache_read_tokens: 0` even when model response metadata clearly indicates prompt cache hits.

In affected runs, per-response usage shows cached prompt tokens, but the aggregated artifacts (`token-usage.jsonl` rollups and `agent_usage.json`) report cache reads as zero.

## Environment
- gh-aw CLI version: `v0.77.5`
- engine: `copilot`
- engine runtime: `GitHub Copilot CLI` `1.0.55`
- workflow model setting: `gpt-5.4`
- observed primary model in usage artifact: `gpt-5.4-2026-03-05`
- awf version: `v0.25.58`
- wire path observed in logs: custom provider (`wireApi=responses`)

## Reproduction
1. Run a workflow that takes multiple turns and reuses the same shared prompt context across them.
2. Download audit artifacts.
3. Compare the following files from the same run:
   - `sandbox/agent/logs/process-*.log`
   - `sandbox/firewall/logs/api-proxy-logs/token-usage.jsonl`
   - `agent_usage.json`
4. Check whether process log usage blocks show cached prompt tokens while aggregated artifacts report zero cache read.
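
The comparison in step 4 can be scripted. The following is a minimal sketch, not part of `gh aw`: it assumes the process logs contain `"cached_tokens": <n>` as plain text (as in the excerpt under Evidence), that `token-usage.jsonl` holds one JSON object per line, and that the artifacts were downloaded with the directory layout listed in step 3. All function names and the regular expression are illustrative.

```python
import json
import re
from pathlib import Path

# Assumption: process logs embed usage blocks as text containing "cached_tokens": <n>.
CACHED_RE = re.compile(r'"cached_tokens"\s*:\s*(\d+)')


def cached_tokens_in_logs(log_dir: Path) -> int:
    """Sum every cached_tokens value found in process-*.log files."""
    total = 0
    for log_file in sorted(log_dir.glob("process-*.log")):
        text = log_file.read_text(encoding="utf-8", errors="replace")
        total += sum(int(n) for n in CACHED_RE.findall(text))
    return total


def cache_read_in_jsonl(jsonl_path: Path) -> int:
    """Sum cache_read_tokens across token_usage events in token-usage.jsonl."""
    total = 0
    for line in jsonl_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("event") == "token_usage":
            total += int(record.get("cache_read_tokens", 0))
    return total


def cache_read_in_agent_usage(path: Path) -> int:
    """Read the run-level cache_read_tokens field from agent_usage.json."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return int(data.get("cache_read_tokens", 0))


def compare(artifact_root: Path) -> dict:
    """Collect the cache-read totals reported by each artifact layer."""
    proxy_log = artifact_root / "sandbox/firewall/logs/api-proxy-logs/token-usage.jsonl"
    return {
        "process_logs_cached_tokens": cached_tokens_in_logs(artifact_root / "sandbox/agent/logs"),
        "proxy_cache_read_tokens": cache_read_in_jsonl(proxy_log),
        "agent_usage_cache_read_tokens": cache_read_in_agent_usage(artifact_root / "agent_usage.json"),
    }
```

Calling `compare(Path("path/to/downloaded/artifacts"))` returns the three totals side by side. The log total may overcount if the same usage block is logged more than once, so the useful signal is a non-zero log total next to zero aggregates rather than an exact match.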

## Expected Behavior
- If per-response usage contains cached prompt tokens, aggregate usage should reflect non-zero `cache_read_tokens`.
- `agent_usage.json` and `token-usage.jsonl` aggregates should be consistent with model usage metadata from process logs.
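
One way to meet this expectation is for the aggregation layer to normalize cache-read counts from whichever usage shape the upstream API returns. The sketch below is an assumption about a possible approach, not a description of how `gh aw` or its API proxy is implemented, and it makes no claim about the root cause. It handles the Chat Completions shape seen in the process logs (`prompt_tokens_details.cached_tokens`) and the Responses API shape (`input_tokens_details.cached_tokens`). The function name `extract_cache_read_tokens` is hypothetical.

```python
from typing import Any, Mapping

# Key paths where cached prompt tokens can appear, by API shape.
CACHE_READ_PATHS = (
    ("prompt_tokens_details", "cached_tokens"),  # Chat Completions style (seen in process logs)
    ("input_tokens_details", "cached_tokens"),   # Responses API style
)


def extract_cache_read_tokens(usage: Mapping[str, Any]) -> int:
    """Return the cached prompt token count from a usage object, or 0 if absent."""
    for outer, inner in CACHE_READ_PATHS:
        details = usage.get(outer)
        if isinstance(details, Mapping):
            value = details.get(inner)
            if isinstance(value, int):
                return value
    return 0
```

Applied to the usage block under Evidence, this returns 32128, which is the value the aggregates would be expected to carry forward as `cache_read_tokens`.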

## Actual Behavior
- Process log usage blocks repeatedly show cache activity (for example `prompt_tokens_details.cached_tokens` and `token_type: cache_read`).
- `token-usage.jsonl` records `cache_read_tokens: 0`.
- `agent_usage.json` also reports `cache_read_tokens: 0`.

## Evidence

### 1) Per-response model usage shows cached prompt tokens
From `sandbox/agent/logs/process-*.log`:

```json
"usage": {
  "completion_tokens": 775,
  "prompt_tokens": 32481,
  "total_tokens": 33256,
  "prompt_tokens_details": {
    "cached_tokens": 32128
  },
  "completion_tokens_details": {
    "reasoning_tokens": 141
  }
}
```

And in token details blocks:

```json
{"token_count": 32128, "token_type": "cache_read"}
```

### 2) Proxy token usage records zero cache-read
From `sandbox/firewall/logs/api-proxy-logs/token-usage.jsonl`:

```json
{"event":"token_usage","model":"gpt-5.4-2026-03-05","input_tokens":31521,"output_tokens":703,"cache_read_tokens":0,"cache_write_tokens":0}
```

### 3) Run-level usage aggregate also reports zero cache-read
From `agent_usage.json`:

```json
{"input_tokens":7491744,"output_tokens":63513,"cache_read_tokens":0,"cache_write_tokens":0,"effective_tokens":46474776,"primary_model":"gpt-5.4-2026-03-05"}
```

## Impact
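- Cache efficiency reporting is misleading: in the sample usage block above, 32128 of 32481 prompt tokens (about 98.9%) are cached, yet the artifacts report zero cache reads.
- Cost and optimization analysis that relies on these artifacts can be wrong.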
- Effective token interpretation becomes harder to trust when cache accounting differs across artifact layers.
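
To make the reporting gap concrete, the snippet below computes a cache hit ratio from the sample usage block and from the run-level aggregate. It assumes the ratio is defined as cached tokens divided by prompt (input) tokens, and that `input_tokens` in `agent_usage.json` is comparable to `prompt_tokens` in the process logs. Both are illustrative assumptions, not metrics that `gh aw` defines.

```python
def cache_hit_ratio(cached_tokens: int, prompt_tokens: int) -> float:
    """Fraction of prompt tokens served from cache; 0.0 when there is no prompt."""
    if prompt_tokens <= 0:
        return 0.0
    return cached_tokens / prompt_tokens


# Values copied from the Evidence section.
per_response = cache_hit_ratio(32128, 32481)  # process log usage block
aggregate = cache_hit_ratio(0, 7491744)       # agent_usage.json

print(f"per-response: {per_response:.1%}")    # per-response: 98.9%
print(f"run aggregate: {aggregate:.1%}")      # run aggregate: 0.0%
```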

## Requested Fix
1. Ensure cache-read usage from per-response model metadata is propagated into proxy and run-level aggregates.
2. Add a consistency check in audit output when per-response cache metadata disagrees with aggregated cache fields.
3. Clarify whether effective-token (ET) enforcement and billing are computed from the same usage stream as these artifacts, or from an independent source.
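
The consistency check in item 2 could look roughly like the following. This is a hedged sketch: the `CacheFinding` type, the function name, and the message wording are all hypothetical, and the inputs are assumed to be totals already extracted from the process logs, `token-usage.jsonl`, and `agent_usage.json`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CacheFinding:
    severity: str
    message: str


def check_cache_consistency(
    per_response_cached: int,
    proxy_cache_read: int,
    run_cache_read: int,
) -> Optional[CacheFinding]:
    """Return a warning when cache accounting disagrees across artifact layers."""
    # Case reported in this issue: responses show cache hits, aggregates show none.
    if per_response_cached > 0 and (proxy_cache_read == 0 or run_cache_read == 0):
        return CacheFinding(
            "warning",
            f"per-response metadata shows {per_response_cached} cached tokens, "
            f"but aggregates report proxy={proxy_cache_read}, run={run_cache_read}",
        )
    # The proxy records and the run-level aggregate should agree with each other.
    if proxy_cache_read != run_cache_read:
        return CacheFinding(
            "warning",
            f"proxy cache_read_tokens ({proxy_cache_read}) differs from "
            f"run-level cache_read_tokens ({run_cache_read})",
        )
    return None
```

The check deliberately does not require the per-response total to equal the aggregates, since the layers may count requests differently; it only flags a non-zero value collapsing to zero and a mismatch between the two aggregate layers.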

## Notes
- This report focuses on usage accounting consistency; it does not assume that caching itself is disabled.

