Support OpenAI Responses API instrumentation by sipercai · Pull Request #210 · alibaba/loongsuite-python

sipercai · 2026-06-02T11:55:14Z

Description

This PR adds latest-experimental OpenAI Responses API instrumentation for OpenAI.responses.create and AsyncOpenAI.responses.create.

The new instrumentation records request and response metadata including token usage, response status, service tier, reasoning request details, cached input tokens, reasoning output tokens, tool definitions, and message content when content capture is enabled. It supports non-streaming calls, stream=True, raw .parse() responses, sync and async clients, and error handling, while remaining compatible with older OpenAI SDK versions.
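
As a rough illustration of the token-usage part of this mapping, the sketch below flattens a Responses-style usage payload into span attributes. It is not the PR's code: the function name is hypothetical, the payload is assumed to be a plain mapping, and the cached-token attribute key is a placeholder chosen by analogy with the reasoning-token key quoted later in this thread.

```python
from typing import Any, Dict, Mapping, Optional

# Attribute keys. The reasoning-token key is quoted from this PR thread; the
# cached-token key is an assumed placeholder, not confirmed by the PR.
INPUT_TOKENS = "gen_ai.usage.input_tokens"
OUTPUT_TOKENS = "gen_ai.usage.output_tokens"
REASONING_TOKENS = "gen_ai.usage.output_tokens_details.reasoning_tokens"
CACHED_TOKENS = "gen_ai.usage.input_tokens_details.cached_tokens"  # assumed


def usage_to_attributes(usage: Optional[Mapping[str, Any]]) -> Dict[str, int]:
    """Flatten a usage payload into attributes, skipping missing fields."""
    if not usage:
        return {}
    input_details = usage.get("input_tokens_details") or {}
    output_details = usage.get("output_tokens_details") or {}
    candidates = {
        INPUT_TOKENS: usage.get("input_tokens"),
        OUTPUT_TOKENS: usage.get("output_tokens"),
        CACHED_TOKENS: input_details.get("cached_tokens"),
        REASONING_TOKENS: output_details.get("reasoning_tokens"),
    }
    # Only emit attributes the provider actually reported.
    return {key: value for key, value in candidates.items() if value is not None}
```

Skipping absent fields rather than writing zeros keeps spans from older SDKs or compatible providers free of misleading values.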

Follow-up updates also cover Responses helper paths: responses.stream(model=..., input=...), existing-response streaming through responses.stream(response_id=...), and responses.parse() / AsyncResponses.parse().

Fixes #209

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Validation Evidence

.tox/py311-test-instrumentation-openai-v2-latest/bin/python -m pytest instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_responses.py -q - 20 passed.
uvx tox -e py311-test-instrumentation-openai-v2-latest -- instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_responses.py -ra - 146 passed, 2 existing async stream close warnings.
uvx tox -e py311-test-instrumentation-openai-v2-oldest -- instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_responses.py -ra - 106 passed, 2 skipped, 40 existing/deprecation warnings.
uvx tox -e lint-instrumentation-openai-v2 - passed, pylint 10.00/10.
uvx tox -e precommit - passed.
git diff --check - passed.
Claude team review loop - /tmp/codex-claude-review/loongsuite-python-agent-a82236395d/run-20260603-161349, rounds r1-r4 completed; final P2 registry/constant follow-up deferred.
Weaver JSON live-check sample - /tmp/openai-responses-weaver-sample.json, report /tmp/openai-responses-weaver-report.json. The sample produced 4 mocked Responses spans. Weaver ran successfully but reported that the local registry does not yet define gen_ai.openai.response.status, gen_ai.openai.request.previous_response_id, or gen_ai.usage.output_tokens_details.reasoning_tokens; this is a registry/schema follow-up rather than an instrumentation runtime failure.
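
The kind of finding Weaver reported can be pictured as a set difference between the attribute keys seen on spans and the keys a registry defines. The sketch below is only that illustration: it does not use Weaver, and the function name, sample spans, and registry contents are hypothetical.

```python
from typing import Iterable, List, Mapping, Set


def undefined_attributes(
    spans: Iterable[Mapping[str, object]], registry: Set[str]
) -> List[str]:
    """Return attribute keys that appear on spans but not in the registry."""
    seen: Set[str] = set()
    for attributes in spans:
        seen.update(attributes)  # iterating a mapping yields its keys
    return sorted(seen - registry)


# Hypothetical sample data, not taken from the real Weaver report.
sample_spans = [
    {"gen_ai.request.model": "m", "gen_ai.openai.response.status": "completed"},
    {"gen_ai.request.model": "m", "gen_ai.openai.request.previous_response_id": "r1"},
]
local_registry = {"gen_ai.request.model"}
missing = undefined_attributes(sample_spans, local_registry)
```

A non-empty result here points at the registry lagging behind the instrumentation, which matches the follow-up described above.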

Note: check_loongsuite_pr_readiness.py --repo . is not applicable to this upstream-style change under instrumentation-genai/opentelemetry-instrumentation-openai-v2. The checker currently rejects plugin paths outside instrumentation-loongsuite/ as forbidden for new LoongSuite plugin PRs.

Does This PR Require a Core Repo Change?

Yes. - Link to PR:
No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

Followed the style guidelines of this project
Changelogs have been updated
Unit tests have been added
Documentation has been updated

sipercai

Thanks for adding Responses API instrumentation. I verified the PR on the head commit and the core responses.create paths are working. I also found several official OpenAI Responses SDK helper surfaces from #209 that still need to be fixed or explicitly scoped before this can be considered complete.

What I verified locally:

PR head: 46e58a70 (feat/openai-responses-api).
SDK surface checked with openai==1.109.1: Responses / AsyncResponses expose create, stream, retrieve, parse, input_items, cancel, and delete.
Focused checks passed:
- uvx tox -e py311-test-instrumentation-openai-v2-latest -- instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_responses.py -ra: 132 passed.
- uvx tox -e py311-test-instrumentation-openai-v2-oldest -- instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_responses.py -ra: 106 passed, 2 skipped.
- uvx tox -e lint-instrumentation-openai-v2: passed.
- uvx tox -e precommit: passed.
Live smoke against an OpenAI-compatible provider:
- client.responses.create(...): request succeeded and produced 1 GenAI span with request/response model and token attributes.
- client.responses.create(..., stream=True): request succeeded and produced 1 GenAI span with request/response model and token attributes.
- client.responses.stream(model=..., input=...): request succeeded and produced 1 GenAI span, but emitted invalid OpenTelemetry attribute warnings for openai.Omit sentinel values.
- client.responses.stream(response_id=...): request succeeded, but instrumentation produced 0 GenAI spans.
- client.responses.parse(...): request succeeded and returned a parsed object, but instrumentation produced 0 GenAI spans.

Findings to address:

1. responses.stream(model=..., input=...) passes OpenAI SDK Omit sentinels into the new telemetry mapping.

The SDK helper delegates to self.create(..., stream=True) and forwards omitted optional parameters as openai.Omit. The PR's value_is_set() only filters openai.NotGiven, so the new Responses mapping treats Omit as a real value. In live smoke this produced invalid OpenTelemetry attribute warnings for fields such as gen_ai.request.temperature, gen_ai.request.top_p, gen_ai.openai.request.previous_response_id, gen_ai.openai.request.background, gen_ai.openai.request.store, and gen_ai.openai.request.parallel_tool_calls.

Please treat openai.Omit the same as NotGiven at value_is_set() or at the Responses mapping boundary, and add a test for client.responses.stream(model=..., input=...) that asserts no invalid attributes/warnings are emitted.

2. responses.stream(response_id=...) / async existing-response streaming is not instrumented.

The OpenAI SDK's responses.stream(response_id=..., starting_after=...) helper uses retrieve(stream=True), not create(stream=True). This PR only wraps Responses.create and AsyncResponses.create, so the existing-response stream helper is outside the current instrumentation. A live smoke request succeeded but produced 0 GenAI spans.

Please either support this path, for example by wrapping Responses.retrieve / AsyncResponses.retrieve or by explicitly handling the stream helper, or document why existing-response streaming is intentionally out of scope for #209. If it is in scope, add sync and async tests that demonstrate that the span count goes from 0 to 1 and that token/model attributes are preserved.

3. responses.parse() / AsyncResponses.parse() structured-output helpers are not instrumented or scoped.

The current OpenAI SDK exposes responses.parse() as a structured-output helper, and it does not call the wrapped create() method; it has its own POST/parser path. In live smoke, client.responses.parse(...) succeeded and returned a parsed object, but instrumentation produced 0 GenAI spans. OpenLLMetry also treats Responses.parse as a separate wrapper surface, which is a useful signal that this is not just an alias of create().

Please either instrument Responses.parse / AsyncResponses.parse, or explicitly document that structured-output helpers are deferred to a follow-up issue. Given #209 is about supporting the newer OpenAI Responses SDK surface, I would prefer covering it in this PR or at least making the scope boundary explicit.

4. The test matrix is still too narrow for claiming broad Responses API support.

The new tests cover direct responses.create, async create, direct stream=True, raw response, status mapping, errors, NO_CONTENT, and a function-tool output. They do not cover the official SDK helper surfaces above, and they do not yet cover built-in Responses tools, multimodal input, previous_response_id / conversation state, background/cancel behavior, async helper parity, concurrency isolation, or SPAN_AND_EVENT content mode.

Please add at least targeted tests for the helper paths above, and either add or explicitly defer the broader Responses API matrix items. OpenInference has examples/conformance coverage for multimodal, async stream, function calling, file search, web search, and structured outputs; OpenLLMetry covers additional wrapper surfaces such as retrieve and parse. Those should be used only as reference signals, not copied as a schema.
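
To make the parse() finding concrete, here is a minimal sketch of why wrapping create() alone misses a helper with its own request path, and of the "span count goes from 0 to 1" check a test could assert. FakeResponses and wrap_method are hypothetical stand-ins, not the openai SDK or the PR's wrappers, and a list of names stands in for exported spans.

```python
import functools
from typing import Any, Callable, List


class FakeResponses:
    """Stand-in for an SDK resource; NOT the real openai class."""

    def _post(self, payload: dict) -> dict:
        return {"status": "completed", **payload}

    def create(self, **kwargs: Any) -> dict:
        return self._post(kwargs)

    def parse(self, **kwargs: Any) -> dict:
        # Has its own request path and never goes through create().
        return self._post({**kwargs, "parsed": True})


def wrap_method(cls: type, name: str, spans: List[str]) -> Callable[[], None]:
    """Patch cls.<name> to record a span name; return an undo callback."""
    original = getattr(cls, name)

    @functools.wraps(original)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        spans.append(f"responses.{name}")
        return original(self, *args, **kwargs)

    setattr(cls, name, wrapper)
    return lambda: setattr(cls, name, original)


spans: List[str] = []

# Only create() is wrapped: parse() succeeds but records nothing.
undo_create = wrap_method(FakeResponses, "create", spans)
FakeResponses().parse(model="m", input="hi")
spans_with_create_only = list(spans)

# Wrapping parse() as its own surface yields exactly one span.
undo_parse = wrap_method(FakeResponses, "parse", spans)
FakeResponses().parse(model="m", input="hi")
spans_with_both = list(spans)

undo_parse()
undo_create()
```

The same before/after comparison applies to any helper that bypasses the wrapped method, which is why each such surface needs its own test.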

CI is green and the core responses.create implementation is promising, but I do not think this PR fully resolves #209 until the helper-path instrumentation gaps and Omit sentinel handling are fixed or explicitly scoped out.
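
The Omit handling requested above could look roughly like the following. This is a sketch under stated assumptions: the NotGiven and Omit classes are local stand-ins so the example runs without the openai package, request_attributes is a hypothetical helper, and the mapping lists only three of the attribute keys named in this review.

```python
from typing import Any, Dict


class NotGiven:
    """Stand-in for the SDK's NotGiven sentinel (not imported from openai)."""


class Omit:
    """Stand-in for the SDK's Omit sentinel (not imported from openai)."""


_UNSET_TYPES = (NotGiven, Omit)


def value_is_set(value: Any) -> bool:
    """Treat None and both sentinels as 'the caller did not pass this'."""
    return value is not None and not isinstance(value, _UNSET_TYPES)


# Hypothetical subset of the request-parameter to attribute-key mapping.
_REQUEST_ATTRIBUTES = {
    "temperature": "gen_ai.request.temperature",
    "top_p": "gen_ai.request.top_p",
    "previous_response_id": "gen_ai.openai.request.previous_response_id",
}


def request_attributes(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Map request kwargs to span attributes, dropping unset values."""
    return {
        attribute: kwargs[param]
        for param, attribute in _REQUEST_ATTRIBUTES.items()
        if param in kwargs and value_is_set(kwargs[param])
    }


attributes = request_attributes(
    {"temperature": 0.2, "top_p": Omit(), "previous_response_id": NotGiven()}
)
```

Filtering at a single predicate keeps every mapping that relies on it consistent, so a sentinel object never reaches the span as an attribute value.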

sipercai · 2026-06-03T11:48:24Z

Updated this PR to address the Responses API helper-path review feedback:

Treat openai.Omit the same as NotGiven, preventing responses.stream(model=..., input=...) from exporting Omit sentinels as invalid attributes.
Instrument Responses.parse / AsyncResponses.parse.
Instrument existing-response streaming through Responses.retrieve(stream=True) / AsyncResponses.retrieve(stream=True), while keeping non-streaming retrieve as a no-op for GenAI spans.
Added focused sync/async tests for the SDK stream helper, parse helpers, existing-response streaming, and non-streaming retrieve no-op behavior.
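
The retrieve behavior described in this list, a span for retrieve(stream=True) and a pass-through for plain lookups, can be sketched as below. The names wrap_retrieve and fake_retrieve are hypothetical, a list of strings stands in for spans, and the sketch records the span at call time; a real wrapper would presumably keep the span open until the stream is exhausted.

```python
import functools
from typing import Any, Callable, List


def wrap_retrieve(
    retrieve: Callable[..., Any], spans: List[str]
) -> Callable[..., Any]:
    """Record a span only for streaming retrieve calls."""

    @functools.wraps(retrieve)
    def wrapper(response_id: str, **kwargs: Any) -> Any:
        if kwargs.get("stream") is not True:
            # Plain lookup of a stored response: no GenAI span.
            return retrieve(response_id, **kwargs)
        spans.append(f"stream {response_id}")
        return retrieve(response_id, **kwargs)

    return wrapper


def fake_retrieve(response_id: str, **kwargs: Any) -> Any:
    """Stand-in for the SDK call; returns events when streaming."""
    if kwargs.get("stream"):
        return iter(["response.created", "response.completed"])
    return {"id": response_id}


spans: List[str] = []
retrieve = wrap_retrieve(fake_retrieve, spans)
plain = retrieve("resp_123")
events = list(retrieve("resp_123", stream=True))
```

Checking `stream is not True` rather than truthiness means an unset sentinel passed for stream is also treated as non-streaming.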

Validation:

uvx tox -e py311-test-instrumentation-openai-v2-latest -- instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_responses.py -ra: 146 passed, 2 existing warnings.
uvx tox -e py311-test-instrumentation-openai-v2-oldest -- instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_responses.py -ra: 106 passed, 2 skipped, 40 existing/deprecation warnings.
uvx tox -e lint-instrumentation-openai-v2: passed, pylint 10.00/10.
uvx tox -e precommit: passed.
git diff --check: passed.
Claude review loop: /tmp/codex-claude-review/loongsuite-python-agent-a82236395d/run-20260603-161349, rounds r1-r4 completed; final P2 registry/constant follow-up deferred.
Weaver JSON live-check sample: /tmp/openai-responses-weaver-sample.json, 4 mocked Responses spans. Weaver ran but the local registry does not yet define gen_ai.openai.response.status, gen_ai.openai.request.previous_response_id, or gen_ai.usage.output_tokens_details.reasoning_tokens, so this is a registry follow-up for the telemetry contract rather than an instrumentation runtime failure.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Support OpenAI Responses API instrumentation

46e58a7

sipercai commented Jun 3, 2026


fix: cover OpenAI Responses helper instrumentation

eb58e0e

ralf0131 requested a review from Copilot June 10, 2026 07:45

Copilot started reviewing on behalf of ralf0131 June 10, 2026 07:45

Copilot AI reviewed Jun 10, 2026

Support OpenAI Responses API instrumentation #210

sipercai wants to merge 2 commits into main from feat/openai-responses-api
