Support OpenAI Responses API instrumentation #210
sipercai left a comment
Thanks for adding Responses API instrumentation. I verified the PR on the head commit and the core responses.create paths are working. I also found several official OpenAI Responses SDK helper surfaces from #209 that still need to be fixed or explicitly scoped before this can be considered complete.
What I verified locally:
- PR head: `46e58a70` (`feat/openai-responses-api`).
- SDK surface checked with `openai==1.109.1`: `Responses`/`AsyncResponses` expose `create`, `stream`, `retrieve`, `parse`, `input_items`, `cancel`, and `delete`.
- Focused checks passed:
  - `uvx tox -e py311-test-instrumentation-openai-v2-latest -- instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_responses.py -ra`: 132 passed.
  - `uvx tox -e py311-test-instrumentation-openai-v2-oldest -- instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_responses.py -ra`: 106 passed, 2 skipped.
  - `uvx tox -e lint-instrumentation-openai-v2`: passed.
  - `uvx tox -e precommit`: passed.
- Live smoke against an OpenAI-compatible provider:
  - `client.responses.create(...)`: request succeeded and produced 1 GenAI span with request/response model and token attributes.
  - `client.responses.create(..., stream=True)`: request succeeded and produced 1 GenAI span with request/response model and token attributes.
  - `client.responses.stream(model=..., input=...)`: request succeeded and produced 1 GenAI span, but emitted invalid OpenTelemetry attribute warnings for `openai.Omit` sentinel values.
  - `client.responses.stream(response_id=...)`: request succeeded, but instrumentation produced 0 GenAI spans.
  - `client.responses.parse(...)`: request succeeded and returned a parsed object, but instrumentation produced 0 GenAI spans.
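The `Omit` warnings come from OpenTelemetry's attribute type rules: the common specification allows an attribute value to be a primitive (string, boolean, integer, or floating-point number) or a homogeneous array of primitives, so an SDK sentinel object is rejected and logged instead of recorded. Below is a minimal stdlib sketch of that check, which a test could run over the attribute dict produced by the Responses mapping before it reaches the span. `is_valid_attribute_value` and `invalid_attribute_keys` are hypothetical helpers, not part of the OpenTelemetry API, and the SDK's own check may differ in edge cases (for example, integer range or `None` inside sequences).

```python
from collections.abc import Mapping, Sequence
from typing import Any, List

# Primitive types the OpenTelemetry spec allows as attribute values.
# (Integer range limits are not checked in this sketch.)
_PRIMITIVES = (str, bool, int, float)


def is_valid_attribute_value(value: Any) -> bool:
    """Return True for a primitive or a homogeneous sequence of primitives."""
    if isinstance(value, _PRIMITIVES):
        return True
    # Strings are sequences too, but they were already accepted above.
    if isinstance(value, Sequence) and not isinstance(value, (bytes, bytearray)):
        element_types = {type(item) for item in value}
        return len(element_types) <= 1 and all(
            isinstance(item, _PRIMITIVES) for item in value
        )
    # Anything else (dicts, SDK sentinels such as openai.Omit, ...) is invalid.
    return False


def invalid_attribute_keys(attributes: Mapping[str, Any]) -> List[str]:
    """List the keys whose values a span would reject, sorted for stable output."""
    return sorted(k for k, v in attributes.items() if not is_valid_attribute_value(v))
```

For example, `invalid_attribute_keys({"gen_ai.request.model": "m", "gen_ai.request.temperature": object()})` returns `['gen_ai.request.temperature']`, which is the shape of failure the live smoke showed for `Omit`.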
Findings to address:
- `responses.stream(model=..., input=...)` passes OpenAI SDK `Omit` sentinels into the new telemetry mapping.

  The SDK helper delegates to `self.create(..., stream=True)` and forwards omitted optional parameters as `openai.Omit`. The PR's `value_is_set()` only filters `openai.NotGiven`, so the new Responses mapping treats `Omit` as a real value. In live smoke this produced invalid OpenTelemetry attribute warnings for fields such as `gen_ai.request.temperature`, `gen_ai.request.top_p`, `gen_ai.openai.request.previous_response_id`, `gen_ai.openai.request.background`, `gen_ai.openai.request.store`, and `gen_ai.openai.request.parallel_tool_calls`.

  Please treat `openai.Omit` the same as `NotGiven` in `value_is_set()` or at the Responses mapping boundary, and add a test for `client.responses.stream(model=..., input=...)` that asserts no invalid attributes/warnings are emitted.
- `responses.stream(response_id=...)` / async existing-response streaming is not instrumented.

  The OpenAI SDK's `responses.stream(response_id=..., starting_after=...)` helper uses `retrieve(stream=True)`, not `create(stream=True)`. This PR only wraps `Responses.create` and `AsyncResponses.create`, so the existing-response stream helper is outside the current instrumentation. A live smoke request succeeded but produced 0 GenAI spans.

  Please either support this path, for example by wrapping `Responses.retrieve`/`AsyncResponses.retrieve` or by explicitly handling the stream helper, or document why existing-response streaming is intentionally out of scope for #209. If it is in scope, add sync and async tests that show the span count going from 0 to 1 with token/model attributes preserved.
- `responses.parse()` / `AsyncResponses.parse()` structured-output helpers are not instrumented or scoped.

  The current OpenAI SDK exposes `responses.parse()` as a structured-output helper, and it does not call the wrapped `create()` method; it has its own POST/parser path. In live smoke, `client.responses.parse(...)` succeeded and returned a parsed object, but instrumentation produced 0 GenAI spans. OpenLLMetry also treats `Responses.parse` as a separate wrapper surface, which is a useful signal that this is not just an alias of `create()`.

  Please either instrument `Responses.parse`/`AsyncResponses.parse`, or explicitly document that structured-output helpers are deferred to a follow-up issue. Given that #209 is about supporting the newer OpenAI Responses SDK surface, I would prefer covering it in this PR, or at least making the scope boundary explicit.
- The test matrix is still too narrow to claim broad Responses API support.

  The new tests cover direct `responses.create`, async create, direct `stream=True`, raw response, status mapping, errors, NO_CONTENT, and a function-tool output. They do not cover the official SDK helper surfaces above, and they do not yet cover built-in Responses tools, multimodal input, `previous_response_id`/conversation state, background/cancel behavior, async helper parity, concurrency isolation, or SPAN_AND_EVENT content mode.

  Please add at least targeted tests for the helper paths above, and either add or explicitly defer the broader Responses API matrix items. OpenInference has examples/conformance coverage for multimodal, async stream, function calling, file search, web search, and structured outputs; OpenLLMetry covers additional wrapper surfaces such as `retrieve` and `parse`. Those should be used only as reference signals, not copied as a schema.
CI is green and the core responses.create implementation is promising, but I do not think this PR fully resolves #209 until the helper-path instrumentation gaps and Omit sentinel handling are fixed or explicitly scoped out.
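For the existing-response streaming gap, one possible shape, sketched under assumptions: wrap `Responses.retrieve` with a wrapt-style wrapper (the `(wrapped, instance, args, kwargs)` signature that `wrapt.wrap_function_wrapper` passes) that only starts a GenAI span when the call uses `stream=True`, treating a plain `retrieve()` as a lookup rather than a model invocation. The span is represented by a hypothetical `on_stream_start` hook so the sketch runs with the standard library alone, and `_SpanEndingStream` is a hypothetical proxy that forwards attributes such as `.response` to the SDK stream and ends the span once iteration or `close()` finishes.

```python
from typing import Any, Callable, Dict, Iterator, Tuple

# Hypothetical hook: receives the call kwargs, starts a span, returns its "end" callback.
SpanStarter = Callable[[Dict[str, Any]], Callable[[], None]]


class _SpanEndingStream:
    """Forwards attribute access to the SDK stream and ends the span exactly once."""

    def __init__(self, stream: Any, end_span: Callable[[], None]) -> None:
        self._stream = stream
        self._end_span = end_span
        self._ended = False

    def __getattr__(self, name: str) -> Any:
        # Only reached for names not defined on the proxy (e.g. `.response`).
        return getattr(self._stream, name)

    def __iter__(self) -> Iterator[Any]:
        try:
            yield from self._stream
        finally:
            self._finish()

    def close(self) -> None:
        try:
            close = getattr(self._stream, "close", None)
            if close is not None:
                close()
        finally:
            self._finish()

    def _finish(self) -> None:
        if not self._ended:
            self._ended = True
            self._end_span()


def make_retrieve_wrapper(on_stream_start: SpanStarter):
    """Build a wrapt-style wrapper for Responses.retrieve (hypothetical helper)."""

    def wrapper(wrapped: Callable[..., Any], instance: Any,
                args: Tuple[Any, ...], kwargs: Dict[str, Any]) -> Any:
        if kwargs.get("stream") is not True:
            # Plain retrieve() is a lookup, not a model call: pass through untouched.
            return wrapped(*args, **kwargs)
        end_span = on_stream_start(kwargs)
        try:
            stream = wrapped(*args, **kwargs)
        except Exception:
            end_span()  # request failed before any event arrived
            raise
        return _SpanEndingStream(stream, end_span)

    return wrapper
```

A production version would also need to forward `with`-statement and `__next__` usage, record errors on the span, and provide an async twin (`async def` wrapper plus `__aiter__`) for `AsyncResponses.retrieve`.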
Updated this PR to address the Responses API helper-path review feedback:
Validation:
Description
This PR adds latest-experimental OpenAI Responses API instrumentation for `OpenAI.responses.create` and `AsyncOpenAI.responses.create`.

The new instrumentation records request and response metadata, including token usage, response status, service tier, reasoning request details, cached input tokens, reasoning output tokens, tool definitions, and message content when content capture is enabled. It supports non-streaming calls, `stream=True`, raw `.parse()` responses, sync and async clients, and error handling, while keeping older OpenAI SDKs compatible.

Follow-up updates also cover Responses helper paths: `responses.stream(model=..., input=...)`, existing-response streaming through `responses.stream(response_id=...)`, and `responses.parse()`/`AsyncResponses.parse()`.

Fixes #209
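Covering these helper paths amounts to registering wrappers on more methods than `create`. Below is a hedged sketch of how the target list might be declared and filtered so that an older SDK missing one of these methods still instruments cleanly. `RESPONSES_WRAP_TARGETS`, `available_targets`, `instrument_responses`, and the `make_wrapper` factory are hypothetical names, and the module path `openai.resources.responses.responses` is an assumption about where the SDK defines `Responses`/`AsyncResponses`. The wrapt call itself is the standard `wrap_function_wrapper(module, name, wrapper)` form.

```python
import importlib
from typing import Any, Callable, Iterable, List, Tuple

# Assumed location of Responses/AsyncResponses in the OpenAI SDK.
RESPONSES_MODULE = "openai.resources.responses.responses"
RESPONSES_WRAP_TARGETS: Tuple[str, ...] = (
    "Responses.create",
    "AsyncResponses.create",
    "Responses.retrieve",
    "AsyncResponses.retrieve",
    "Responses.parse",
    "AsyncResponses.parse",
)


def available_targets(module: Any, targets: Iterable[str]) -> List[str]:
    """Keep only "Class.method" names that exist on the given module object."""
    found = []
    for target in targets:
        class_name, method_name = target.split(".", 1)
        cls = getattr(module, class_name, None)
        if cls is not None and callable(getattr(cls, method_name, None)):
            found.append(target)
    return found


def instrument_responses(make_wrapper: Callable[[str], Callable[..., Any]]) -> List[str]:
    """Wrap every available target; returns the names actually wrapped.

    `make_wrapper` is a hypothetical factory returning a sync or async
    wrapper appropriate for the given "Class.method" target.
    """
    try:
        module = importlib.import_module(RESPONSES_MODULE)
    except ImportError:  # SDK too old to have the Responses API at all
        return []
    from wrapt import wrap_function_wrapper

    wrapped = available_targets(module, RESPONSES_WRAP_TARGETS)
    for target in wrapped:
        wrap_function_wrapper(RESPONSES_MODULE, target, make_wrapper(target))
    return wrapped
```

Returning the list of wrapped names also gives `_uninstrument` (and tests) an exact record of what to unwrap.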
Type of change
Validation Evidence
- `.tox/py311-test-instrumentation-openai-v2-latest/bin/python -m pytest instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_responses.py -q`: 20 passed.
- `uvx tox -e py311-test-instrumentation-openai-v2-latest -- instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_responses.py -ra`: 146 passed, 2 existing async stream close warnings.
- `uvx tox -e py311-test-instrumentation-openai-v2-oldest -- instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_responses.py -ra`: 106 passed, 2 skipped, 40 existing/deprecation warnings.
- `uvx tox -e lint-instrumentation-openai-v2`: passed, pylint 10.00/10.
- `uvx tox -e precommit`: passed.
- `git diff --check`: passed.
- `/tmp/codex-claude-review/loongsuite-python-agent-a82236395d/run-20260603-161349`, rounds r1-r4 completed; final P2 registry/constant follow-up deferred.
- `/tmp/openai-responses-weaver-sample.json`, report `/tmp/openai-responses-weaver-report.json`. The sample produced 4 mocked Responses spans. Weaver ran successfully but reported that the local registry does not yet define `gen_ai.openai.response.status`, `gen_ai.openai.request.previous_response_id`, or `gen_ai.usage.output_tokens_details.reasoning_tokens`; this is a registry/schema follow-up rather than an instrumentation runtime failure.

Note: `check_loongsuite_pr_readiness.py --repo .` is not applicable to this upstream-style `instrumentation-genai/opentelemetry-instrumentation-openai-v2` change. The checker currently rejects plugin paths outside `instrumentation-loongsuite/` as forbidden for new LoongSuite plugin PRs.

Does This PR Require a Core Repo Change?
Checklist:
See contributing.md for styleguide, changelog guidelines, and more.