
Prompt caching drops to 0% when introducing base64 file inputs #2784

@shahin43

Description

Hi team,

We are testing prompt caching behaviour using the Agents SDK and noticed an unexpected drop in the cache hit rate (to 0%) when introducing file inputs (base64-encoded).


Setup

  • Model: gpt-5.4-mini
  • SDK: Python (custom script)
  • File input: PDF via base64 (input_file)
  • Conversation store enabled
  • Prompt cache key added
  • Response chaining enabled using previous_response_id
  • Prompt:
    • Using Chat Prompts; passing prompt_id as instructions
    • Large static prefix (~13k tokens)
    • Same system instructions
    • Same tool definitions
    • Only user message varies
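For reference, the file-input round is shaped roughly like this. This is a sketch with placeholder IDs and a stand-in PDF, not our actual repro; the `input_file` content part with a base64 `file_data` data URL follows the Responses API shape, while the prompt/chaining fields are simplified stand-ins for what our script sends.

```python
import base64

# Stand-in for the real PDF bytes read from disk.
pdf_bytes = b"%PDF-1.4 example"
file_data = "data:application/pdf;base64," + base64.b64encode(pdf_bytes).decode()

request = {
    "model": "gpt-5.4-mini",
    "prompt": {"id": "pmpt_example"},          # placeholder Chat Prompt id
    "prompt_cache_key": "agent-session-1",     # stable key added for cache routing
    "previous_response_id": "resp_example",    # placeholder; set for response chaining
    "input": [
        {
            "role": "user",
            "content": [
                # The file part is what flips the cache hit rate to 0% on first use.
                {"type": "input_file", "filename": "doc.pdf", "file_data": file_data},
                {"type": "input_text", "text": "Summarise this document."},
            ],
        }
    ],
}
```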

Results

Round 1: text_only_first     cache_hit=96.7%
Round 2: text_only_repeat    cache_hit=96.7%

Round 3: file_input_first    cache_hit=0.0%   <-- unexpected (some cache drop was expected due to the increased input tokens, but not 0%)
Round 4: text_only_new       cache_hit=96.7%

Round 5: file_input_repeat   cache_hit=90.3%
Round 6: text_only_new       cache_hit=96.7%

We tried this a few times, and the behaviour above is consistent both via the Agents SDK and when using the Responses API directly.
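The cache_hit percentages above are computed from the usage block on each response; assuming the Responses API usage fields (`input_tokens` and `input_tokens_details.cached_tokens`), the calculation is simply:

```python
def cache_hit_pct(usage: dict) -> float:
    """Cached-token share of the input, from a response.usage-shaped dict."""
    cached = usage["input_tokens_details"]["cached_tokens"]
    return 100.0 * cached / usage["input_tokens"]

# Illustrative numbers only (not the actual runs above):
text_only = {"input_tokens": 13500, "input_tokens_details": {"cached_tokens": 13056}}
file_first = {"input_tokens": 20000, "input_tokens_details": {"cached_tokens": 0}}

print(cache_hit_pct(text_only))   # ~96.7, a healthy prefix hit
print(cache_hit_pct(file_first))  # 0.0, the anomaly in Round 3
```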


Key observation

  • Adding a file input causes cache to drop to 0%, even though:

    • Prompt prefix is unchanged
    • Tool definitions are unchanged
    • Repeating the same file input restores the cache hit rate (~90%)

  • We tried different options, including setting a prompt_cache_key, to improve caching rates, but observed the same issue; it had no impact.


Expected behaviour

  • The cache hit rate to drop somewhat, due to the additional tokens from the file input
  • But not to 0%, since the shared prefix, instructions, and tools are unchanged

Questions

  1. Are there any known issues with file inputs and cache hits? Are file inputs treated as part of the cacheable prefix in a way that invalidates the prior cache?
  2. Does base64 encoding prevent prefix matching across requests?
  3. Would using file_id instead of base64 improve cache reuse?
  4. Is this expected behaviour or a potential issue?

Hypothesis

It seems that introducing file inputs:

  • Changes the effective prompt prefix completely
  • Causes cache routing to treat the request as a new prefix

Additional notes

  • All prompts exceed the 1024 token threshold
  • Cache works consistently for text-only flows
  • Issue appears only when mixing text-only and file-input requests

Would appreciate any clarification or best practices for maintaining cache efficiency in mixed input scenarios.

Happy to provide a minimal repro script if needed.

Metadata

Labels: needs-more-info (waiting for a reply/more info from the author), question (question about using the SDK)