OpenAIServerConversationTracker can drop a fresh tool output after id() reuse #3620

@chutch

Describe the bug

OpenAIServerConversationTracker.prepare_input dedupes generated items with long-lived
sets of Python object ids:

raw_item_id = id(raw_item)
if raw_item_id in self.sent_items or raw_item_id in self.server_items:
    continue

This is unsafe once the original object can be garbage-collected. In CPython, id(obj) is the
object's address for the object's lifetime, and that address can be reused by a later allocation.
If the tracker keeps the old integer after the object is gone, a new function_call_output can be
mistaken for an already-sent item and omitted from the next request.
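
The address reuse described above can be observed with a small standalone sketch. It is not tracker code: the helper name stale_id_collides and the dict payloads are made up for illustration, and whether a collision actually shows up depends on the interpreter and allocator, so the function reports what it saw rather than asserting an outcome.

```python
def stale_id_collides(attempts: int = 1000) -> bool:
    """Return True if a new object is observed with the id() of a freed one.

    Hypothetical helper for illustration only. A collision is likely on
    CPython but is not guaranteed on any interpreter.
    """
    seen: set[int] = set()  # long-lived integers, like a set of raw id() values
    for _ in range(attempts):
        old = {"type": "function_call_output", "call_id": "call_OLD", "output": "1"}
        seen.add(id(old))
        del old  # last reference dropped; the integer in `seen` is now stale
        fresh = {"type": "function_call_output", "call_id": "call_FRESH", "output": "2"}
        if id(fresh) in seen:
            # A brand-new object would be treated as "already sent".
            return True
    return False


print(stale_id_collides())
```

The point of the sketch is that the set only ever stores integers, so nothing ties an entry to the object that produced it once that object is freed.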

The live path I verified is sent_items: mark_input_as_sent() records object ids for delivered
inputs, and the original objects may no longer be retained once the first input has been sent and
remaining_initial_input is cleared. The same address-based check also consults server_items, which
track_server_items populates with id(output_item).

When the fresh output is dropped, the server-managed continuation still has the corresponding
function call, but the request does not include its output. The provider then rejects the request
with:

Error code: 400 - No tool output found for function call <call_id>

This affects live, non-resumed runs that use server-managed continuation
(previous_response_id, conversation_id, or auto_previous_response_id). The other dedupe layers
do not cover this case:

  • server_item_ids requires a provider-assigned item id; client-built tool outputs do not have one.
  • server_tool_call_ids only covers tool outputs already acknowledged by the server or restored
    from state; a freshly produced live output is not in it.
  • The content-fingerprint guard is gated by primed_from_state, so it does not run on ordinary live
    turns.
  • drop_orphan_function_calls drops orphan calls, not outputs.

Relationship to #2798 / #2800

#2800 fixed the same root-cause class, but only for hydrated initial input during resume. It did
not change the live prepare_input identity check, mark_input_as_sent, or track_server_items.

Minimal deterministic repro

The natural allocator collision is timing-dependent, but the effect is deterministic once a stale
id matches a new output. This reproduces the drop on current main by seeding that precondition:

from typing import Any

from agents.run_internal.oai_conversation import OpenAIServerConversationTracker


class _Item:
    def __init__(self, raw_item: dict[str, Any], type: str) -> None:
        self.raw_item = raw_item
        self.type = type


tracker = OpenAIServerConversationTracker(previous_response_id="resp_1")

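# A freshly produced tool output that has never been sent.
output = {"type": "function_call_output", "call_id": "call_FRESH", "output": "42"}
# Seed the precondition directly: act as if an earlier, now-freed object had this address.
tracker.sent_items.add(id(output))  # stale id collision precondition on current main

prepared = tracker.prepare_input([], [_Item(output, "function_call_output_item")])
print(prepared)  # [] (the fresh output was dropped)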

Expected behavior

A newly produced function_call_output should not be filtered out only because its object address
matches an old, no-longer-live object.

Proposed direction

Do not keep raw id() integers as long-lived dedupe state. If in-process object identity is needed,
track the object references themselves and compare with is, so the identity entry cannot outlive
the object and become a stale address key. Keep the stable provider-id, tool-call-id, and
fingerprint-based dedupe layers for the cases they already cover.
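
One possible shape for this, as a minimal sketch rather than a proposed patch: a small identity set that stores the objects themselves. The class name IdentitySet and its methods are hypothetical and do not exist in the SDK. Because every entry holds a strong reference, the object cannot be freed while the entry exists, so its address cannot be handed to a different object.

```python
from typing import Any


class IdentitySet:
    """Identity-keyed set that keeps each member alive (hypothetical helper)."""

    def __init__(self) -> None:
        # Maps id(obj) -> obj. Holding obj pins its address for as long as
        # the entry exists, so the key can never become stale.
        self._items: dict[int, Any] = {}

    def add(self, obj: Any) -> None:
        self._items[id(obj)] = obj

    def __contains__(self, obj: Any) -> bool:
        # Look up by address, then confirm with `is` that it is the same object.
        return self._items.get(id(obj)) is obj

    def __len__(self) -> int:
        return len(self._items)


sent = IdentitySet()
delivered = {"type": "function_call_output", "call_id": "call_1", "output": "42"}
sent.add(delivered)

fresh = dict(delivered)  # equal content, but a distinct object
print(delivered in sent, fresh in sent)  # True False
```

The trade-off is memory: tracked objects stay alive for the lifetime of the set, so a real change would also need to decide when entries can be discarded. Weak references are not a drop-in alternative here, since plain dict items do not support them.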
