
Keep response spans active during LLM hooks#2779

Draft
smf-h wants to merge 4 commits into openai:main from smf-h:codex/span-metadata-1844

Conversation


@smf-h smf-h commented Mar 26, 2026

Summary

  • Keep the active response span current while RunHooks.on_llm_start/on_llm_end and AgentHooks.on_llm_start/on_llm_end execute.
  • Export hook-attached response span metadata and cover both streamed and non-streamed paths with regression tests.
  • Preserve compatibility with custom Model implementations that still use the older method signatures, and finish streamed response spans before tool/handoff processing so streamed traces match the non-streamed hierarchy.

Test plan

  • uv run pytest tests/test_llm_hook_tracing.py -q
  • uv run pytest tests/test_agent_tracing.py tests/test_tracing_errors.py tests/test_tracing_errors_streamed.py -q
  • uv run pytest tests/test_agent_prompt.py tests/test_agent_runner_streamed.py tests/test_cancel_streaming.py tests/test_hitl_session_scenario.py tests/test_streaming_tool_call_arguments.py tests/test_run_state.py -q
  • uv run ruff check
  • uv run pyright --project pyrightconfig.json
  • uv run mypy . --exclude site still reports the pre-existing examples/tools/image_generator.py:24 unused "type: ignore", which I also reproduced in a clean baseline worktree.

Issue number

Closes #1844

Checks

  • I've added new tests (if relevant)
  • I've added/updated the relevant documentation
  • I've run make lint and make format
  • I've made sure tests pass


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7927f0f533


    )
    stream_failed_retry_attempts: list[int] = [0]
    retry_stream = stream_response_with_retry(
        get_stream=lambda: model.stream_response(**stream_request_kwargs),


P1: Preserve positional invocation for streamed model calls

run_single_turn_streamed now calls model.stream_response via model.stream_response(**stream_request_kwargs). In the previous implementation, the required model arguments were passed positionally, so legacy custom Model implementations with positional-only parameters (or different parameter names but the same positional order) still worked; with this change they now raise TypeError at runtime and streamed runs fail before producing output. Please keep the required parameters positional (or add a fallback path) to maintain compatibility with existing custom streamed models.
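The regression described above reproduces in a few lines. `LegacyModel` and its parameter names are illustrative, not the SDK's actual Model signature: a method with positional-only parameters accepts positional calls but raises TypeError when the same arguments arrive via `**kwargs` expansion.

```python
# Minimal repro of the compatibility concern: positional-only parameters
# (the "/" marker) reject keyword invocation.
class LegacyModel:
    def stream_response(self, system_instructions, input, /):
        return f"{system_instructions}:{input}"

model = LegacyModel()

# Positional call (the runner's previous behavior): works.
assert model.stream_response("sys", "hi") == "sys:hi"

# Keyword call, as produced by model.stream_response(**kwargs): fails.
try:
    model.stream_response(system_instructions="sys", input="hi")
except TypeError as e:
    error = str(e)

assert "positional-only" in error
```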


smf-h (Author) replied

Restored positional-argument compatibility for both streamed and non-streamed custom models. Required parameters are positional again, optional values continue through kwargs, and response_span is still passed only when the model supports it.
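One way the "response_span is still passed only when the model supports it" behavior can be implemented is a signature probe. This is a hedged sketch under assumed names, not the PR's actual code: `call_stream` and both model classes are hypothetical, and `inspect.signature` is one of several ways to detect the keyword.

```python
import inspect

# Hypothetical old-style model: predates the response_span keyword.
class OldModel:
    def stream_response(self, prompt):
        return f"old:{prompt}"

# Hypothetical new-style model: accepts response_span as keyword-only.
class NewModel:
    def stream_response(self, prompt, *, response_span=None):
        return f"new:{prompt}:{response_span}"

def call_stream(model, prompt, response_span):
    # Pass response_span only if the model's signature declares it;
    # required arguments stay positional for legacy compatibility.
    params = inspect.signature(model.stream_response).parameters
    if "response_span" in params:
        return model.stream_response(prompt, response_span=response_span)
    return model.stream_response(prompt)

assert call_stream(OldModel(), "hi", "span1") == "old:hi"
assert call_stream(NewModel(), "hi", "span1") == "new:hi:span1"
```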

seratch (Member) commented Mar 26, 2026

I think this PR still has two merge blockers.

  1. It breaks backward compatibility for some custom streamed Model implementations. run_single_turn_streamed() now ends up calling model.stream_response(**stream_request_kwargs). I reproduced a runtime regression with a custom model whose first required parameters are positional-only: the same model works on the base commit (7dc7fa2b) but fails on this PR with "TypeError: ... got some positional-only arguments passed as keyword arguments". The new regression test only covers legacy models that omit response_span but still accept the existing keyword names, so it does not catch this case.

  2. Exporting ResponseSpanData.metadata appears to break live tracing export. On this PR head, a simple traced Runner.run(...) / Runner.run_streamed(...) against FakeModel completes, but the tracing client logs "400 Unknown parameter: 'data[1].span_data.metadata'". I could reproduce that on the PR head but not on the base commit. ResponseSpanData.export() now includes metadata, and the OpenAI tracing sanitizer does not remove it before sending the payload, so this looks like a real runtime incompatibility for the default tracing path.

Because both issues reproduce at runtime, I don't think this is ready to merge yet. Also, the CI checks are failing.

seratch marked this pull request as draft on March 26, 2026 at 04:25
smf-h (Author) commented Mar 26, 2026

Pushed follow-up fixes for both blockers on the latest head:

  1. Restored positional invocation compatibility for both streamed and non-streamed custom models. Required parameters are positional again, while optional values still flow through kwargs and response_span remains conditional.
  2. Sanitized response span metadata before sending payloads to the OpenAI tracing ingest API, and added a regression test covering that path.
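The second fix can be illustrated with a small payload filter. This is an assumption-laden sketch, not the SDK's exporter: the `UNSUPPORTED_FIELDS` table, `sanitize_span`, and the payload shape are invented here to show the idea of stripping fields the ingest API rejects (per the 400 above, "metadata" on response span data) before sending.

```python
# Hypothetical map of span type -> fields the ingest API does not accept.
UNSUPPORTED_FIELDS = {"response": {"metadata"}}

def sanitize_span(span: dict) -> dict:
    """Return a copy of the span payload with unsupported fields removed."""
    span_data = dict(span.get("span_data", {}))
    for field in UNSUPPORTED_FIELDS.get(span_data.get("type"), set()):
        span_data.pop(field, None)
    return {**span, "span_data": span_data}

raw = {
    "id": "span_1",
    "span_data": {"type": "response", "response_id": "r1", "metadata": {"k": "v"}},
}
clean = sanitize_span(raw)

assert "metadata" not in clean["span_data"]      # stripped before sending
assert clean["span_data"]["response_id"] == "r1" # other fields preserved
assert "metadata" in raw["span_data"]            # original left untouched
```

Locally exported traces (and third-party processors) can still see the metadata; only the payload bound for the incompatible endpoint is filtered.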

Local verification on this branch:

  • UV_FROZEN=1 uv sync --all-extras --all-packages --group dev
  • uv run pytest tests/test_trace_processor.py tests/test_llm_hook_tracing.py tests/test_tracing.py tests/test_agent_tracing.py tests/test_tracing_errors.py tests/test_tracing_errors_streamed.py tests/mcp/test_mcp_tracing.py tests/test_openai_responses.py -q
  • uv run pytest tests/test_agent_prompt.py tests/test_agent_runner_streamed.py tests/test_cancel_streaming.py tests/test_hitl_session_scenario.py tests/test_streaming_tool_call_arguments.py tests/test_run_state.py -q
  • uv run ruff check
  • uv run pyright --project pyrightconfig.json
  • uv run mypy . --exclude site still reports the pre-existing examples/tools/image_generator.py:24 unused-ignore baseline.

The new Tests workflow run for this head is currently waiting for maintainer approval before the matrix starts: https://github.com/openai/openai-agents-python/actions/runs/23578395345 .



Development

Successfully merging this pull request may close these issues.

Customizing traces and spans: Ability to add metadata to spans

2 participants