Open
Labels
- needs-more-info: Waiting for a reply/more info from the author
- question: Question about using the SDK
Description
Hi team,
We are testing prompt caching behaviour with the Agents SDK and noticed an unexpected drop in the cache hit rate (to 0%) when introducing file inputs (base64-encoded).
Setup
- Model: gpt-5.4-mini
- SDK: Python (custom script)
- File input: PDF via base64 (input_file)
- Conversation store enabled
- Prompt cache key added
- Response chaining enabled using previous_response_id
- Prompt:
  - Using Chat Prompts; passing prompt_id as instructions
  - Large static prefix (~13k tokens)
  - Same system instructions
  - Same tool definitions
  - Only the user message varies
Results
Round 1: text_only_first cache_hit=96.7%
Round 2: text_only_repeat cache_hit=96.7%
Round 3: file_input_first cache_hit=0.0% <-- unexpected (a cache-hit drop was expected due to the increased input tokens, but not to zero)
Round 4: text_only_new cache_hit=96.7%
Round 5: file_input_repeat cache_hit=90.3%
Round 6: text_only_new cache_hit=96.7%
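The percentages above were computed from each response's usage block. A sketch of the calculation with made-up token counts (`cache_hit_rate` is our own helper; the attribute path in the comment is the usage object the Python SDK returns):

```python
def cache_hit_rate(cached_tokens: int, input_tokens: int) -> float:
    """Cached-token share of the input, as a percentage."""
    if input_tokens == 0:
        return 0.0
    return 100.0 * cached_tokens / input_tokens


# With the Python SDK the counts come from the response, e.g.:
#   usage = response.usage
#   cache_hit_rate(usage.input_tokens_details.cached_tokens, usage.input_tokens)
# Illustrative numbers only:
print(round(cache_hit_rate(12_900, 13_340), 1))  # -> 96.7
```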
We tried this a few times, and the behaviour above is consistent across the Agents SDK and the Responses API used directly.
Key observations
- Adding a file input causes the cache hit rate to drop to 0%, even though:
  - The prompt prefix is unchanged
  - Tool definitions are unchanged
- Repeating the same file input restores caching (~90%)
- We tried different options, including a prompt_cache_key, to improve caching rates, but observed the same issue; it had no impact.
Expected behaviour
- The cache hit rate should drop due to the additional tokens from the file input
- But it should not drop to 0%, since the shared prefix, instructions, and tools are still the same
Questions
- Are there any known issues with file inputs and cache hits? Are file inputs treated as part of the cacheable prefix in a way that invalidates the prior cache?
- Does base64 encoding prevent prefix matching across requests?
- Would using file_id instead of base64 improve cache reuse?
- Is this expected behaviour or a potential issue?
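For the file_id question, the alternative we have in mind would upload the PDF once via the Files API and then reference it by id in every request, instead of inlining base64 data. A sketch (`file_input_item` is a hypothetical helper; the upload call shown in the comment is the SDK's `client.files.create`):

```python
def file_input_item(file_id: str, user_text: str) -> dict:
    """User message that references an uploaded file by id rather than
    inlining base64 data (hypothetical helper for illustration)."""
    return {
        "role": "user",
        "content": [
            {"type": "input_text", "text": user_text},
            {"type": "input_file", "file_id": file_id},
        ],
    }


# Upload once (requires an API key and client), then reuse the id:
#   uploaded = client.files.create(file=open("report.pdf", "rb"),
#                                  purpose="user_data")
#   item = file_input_item(uploaded.id, "Summarise the findings.")
item = file_input_item("file-abc123", "Summarise the findings.")  # placeholder id
```

If the stable id keeps the serialised request bytes identical across calls, prefix matching might survive where a re-encoded base64 blob does not; we have not verified this.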
Hypothesis
It seems that introducing file inputs:
- Changes the effective prompt prefix completely
- Causes cache routing to treat the request as a new prefix
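A toy model of what we suspect: if serialising the file input changes tokens early in the request, the longest shared prefix with previous requests collapses and the cacheable portion goes to zero. This is purely illustrative and not the actual caching implementation:

```python
def common_prefix_len(a: list, b: list) -> int:
    """Length of the longest common prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


prefix = list(range(13_000))  # stand-in for the ~13k-token static prefix
text_req = prefix + ["user: question"]
other_req = prefix + ["user: other question"]
file_req = ["file-bytes"] + prefix + ["user: question"]  # file tokens land early

print(common_prefix_len(text_req, other_req))  # -> 13000 (prefix reused)
print(common_prefix_len(text_req, file_req))   # -> 0 (prefix no longer matches)
```

This would explain why round 5 (repeating the same file) recovers ~90%: the file-bearing requests share a prefix with each other, just not with the text-only ones.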
Additional notes
- All prompts exceed the 1024 token threshold
- Cache works consistently for text-only flows
- Issue appears only when mixing text-only and file-input requests
Would appreciate any clarification or best practices for maintaining cache efficiency in mixed input scenarios.
Happy to provide a minimal repro script if needed.