OpenAIServerConversationTracker can drop a fresh tool output after id() reuse
Describe the bug
OpenAIServerConversationTracker.prepare_input dedupes generated items with long-lived
sets of Python object ids:
    raw_item_id = id(raw_item)
    if raw_item_id in self.sent_items or raw_item_id in self.server_items:
        continue
This is unsafe once the original object can be garbage-collected. In CPython, id(obj) is the
object's address for the object's lifetime, and that address can be reused by a later allocation.
If the tracker keeps the old integer after the object is gone, a new function_call_output can be
mistaken for an already-sent item and omitted from the next request.
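The reuse itself can be probed outside the SDK. The following is a minimal sketch, not code from the tracker: `stale_id_collides` is a hypothetical helper that records the `id()` of a dict, drops the only reference, and checks whether a later dict lands on a recorded address. Whether it returns True depends on the interpreter and allocator, so no particular result is asserted here.

```python
def stale_id_collides(attempts: int = 1000) -> bool:
    """Return True if a fresh dict reuses the id() of an already-freed dict.

    Hypothetical helper for illustration only; it is not part of the SDK.
    """
    stale_ids: set[int] = set()
    for _ in range(attempts):
        old = {"type": "function_call_output", "call_id": "call_OLD"}
        stale_ids.add(id(old))
        # Drop the only reference. With CPython reference counting the dict is
        # reclaimed right away, so its address becomes available again.
        del old
        fresh = {"type": "function_call_output", "call_id": "call_FRESH"}
        if id(fresh) in stale_ids:
            # A brand-new object matches an integer recorded for a dead one.
            return True
    return False


if __name__ == "__main__":
    # The result depends on the interpreter and allocator state.
    print(stale_id_collides())
```

The point of the sketch is only that a set of integers gives no signal about whether the object behind each integer still exists.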
The concrete live source I verified is sent_items: mark_input_as_sent() records object ids for
delivered inputs, and the original object may no longer be retained after the first input has been
sent and remaining_initial_input is cleared. The same address-based check also consults
server_items, which is populated with id(output_item) in track_server_items.
When the fresh output is dropped, the server-managed continuation still has the corresponding
function call, but the request does not include its output. The provider then rejects the request
with:
Error code: 400 - No tool output found for function call <call_id>
This affects live, non-resumed runs that use server-managed continuation
(previous_response_id, conversation_id, or auto_previous_response_id). The other dedupe layers
do not cover this case:
- server_item_ids requires a provider-assigned item id; client-built tool outputs do not have one.
- server_tool_call_ids only covers tool outputs already acknowledged by the server or restored
  from state; a freshly produced live output is not in it.
- The content-fingerprint guard is gated by primed_from_state, so it does not run on ordinary live
  turns.
- drop_orphan_function_calls drops orphan calls, not outputs.
Relationship to #2798 / #2800
#2800 fixed the same root-cause class, but only for hydrated initial input during resume. It did
not change the live prepare_input identity check, mark_input_as_sent, or track_server_items.
Minimal deterministic repro
The natural allocator collision is timing-dependent, but the effect is deterministic once a stale
id matches a new output. This reproduces the drop on current main by seeding that precondition:
    from typing import Any

    from agents.run_internal.oai_conversation import OpenAIServerConversationTracker


    class _Item:
        """Minimal stand-in for a generated run item."""

        def __init__(self, raw_item: dict[str, Any], type: str) -> None:
            self.raw_item = raw_item
            self.type = type


    tracker = OpenAIServerConversationTracker(previous_response_id="resp_1")
    output = {"type": "function_call_output", "call_id": "call_FRESH", "output": "42"}

    # Seed the precondition: a stale id in sent_items that matches the fresh output.
    tracker.sent_items.add(id(output))

    prepared = tracker.prepare_input([], [_Item(output, "function_call_output_item")])
    print(prepared)  # []
Expected behavior
A newly produced function_call_output should not be filtered out only because its object address
matches an old, no-longer-live object.
Proposed direction
Do not keep raw id() integers as long-lived dedupe state. If in-process object identity is needed,
track the object references themselves and compare with is, so the identity entry cannot outlive
the object and become a stale address key. Keep the stable provider-id, tool-call-id, and
fingerprint-based dedupe layers for the cases they already cover.
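One possible shape for this, as a sketch under stated assumptions rather than a proposed patch: `IdentitySet` below is a hypothetical container, not an existing SDK class. It stores the object next to its `id()` key, so the address stays pinned for as long as the entry exists, and membership is confirmed with `is`.

```python
from typing import Any


class IdentitySet:
    """Hypothetical identity-based set that keeps its members alive."""

    def __init__(self) -> None:
        # Maps id(obj) -> obj. Holding obj prevents its address from being
        # reused by another object while the entry is present.
        self._items: dict[int, Any] = {}

    def add(self, obj: Any) -> None:
        self._items[id(obj)] = obj

    def discard(self, obj: Any) -> None:
        if obj in self:
            del self._items[id(obj)]

    def __contains__(self, obj: Any) -> bool:
        key = id(obj)
        # The `is` check makes the identity comparison explicit.
        return key in self._items and self._items[key] is obj

    def __len__(self) -> int:
        return len(self._items)


sent = IdentitySet()
first = {"type": "function_call_output", "call_id": "call_1", "output": "1"}
sent.add(first)

# Equal content but a different object: not treated as already sent.
fresh = dict(first)
print(first in sent)  # True
print(fresh in sent)  # False
```

The trade-off is that entries hold strong references, so tracked items live as long as the container does. Plain dict instances do not support weak references, so a weakref-based set is not a drop-in alternative for raw dict items.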