
Responses API fixes to comply with Microsoft Foundry Hosted Agents #1573

Open
antonslutskyms wants to merge 13 commits into NVIDIA:develop from bbednarski9:as/responses-api-frontend-fix-ext

Conversation


@antonslutskyms antonslutskyms commented Feb 6, 2026

Description

Closes

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

The following items are addressed in this PR:

  1. Making "model" input field Optional
  2. Adding support for "type": "input_text" content items, e.g. {"type": "message", "role": "user", "content": [{"type": "input_text", "text": "hello"}]}
  3. Updating message ID and response ID to comply with length requirement of 50 characters.
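For illustration, here is a request that exercises all three fixes at once: no `model` field, an `input_text` content item, and reply IDs expected to stay within 50 characters. The port and path follow the example config added in this PR; adjust them to your deployment.

```bash
curl -X POST http://localhost:8088/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
        "input": [
          {"type": "message", "role": "user",
           "content": [{"type": "input_text", "text": "hello"}]}
        ]
      }'
```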

Summary by CodeRabbit

  • New Features

    • Added OpenAI Responses API support with automatic format detection and an explicit openai_api_v1_format override; supports streaming (SSE) and non-streaming endpoints and OpenAI-compatible response shaping.
    • Improved input compatibility with a new text-input attachment type and enhanced tool-handling for Responses-style requests.
  • Documentation

    • Added a README and example config demonstrating setup, endpoints, streaming/non-streaming usage, instructions, and test curl commands.
  • Tests

    • Expanded test coverage for Responses models, conversions, routing/path detection, streaming behavior, and tool handling.

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
@antonslutskyms antonslutskyms requested a review from a team as a code owner February 6, 2026 03:31

copy-pr-bot bot commented Feb 6, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Feb 6, 2026

Walkthrough

Adds OpenAI Responses API support to the FastAPI front end: new Responses API data models and converters; endpoint format detection/config (auto, chat_completions, responses); /v1/responses route with SSE streaming; Responses↔Chat conversion (including tool-format translation); InputTextContent handling; example config/docs; and extensive tests.

Changes

- **Documentation & Example Config** (`examples/front_ends/responses_api_endpoint/README.md`, `examples/front_ends/responses_api_endpoint/configs/config.yml`): New README and example YAML demonstrating Responses API usage, mappings (`/generate`, `/chat`, `/v1/responses`), startup/test commands, streaming examples, and `openai_api_v1_format` configuration.
- **API Data Models** (`src/nat/data_models/api_server.py`): Adds `ChatContentType.INPUT_TEXT` and `InputTextContent`, extends the `UserContent` union, adds a `Message` content serializer, introduces new Responses API models (`ResponsesInputItem`, `ResponsesRequest`, `ResponsesOutputContent`, `ResponsesOutputItem`, `ResponsesUsage`, `ResponsesAPIResponse`), conversion helpers between Responses and Chat models, tool-format translation, and expanded AIQ compatibility aliases.
- **FastAPI Front-end Config** (`src/nat/front_ends/fastapi/fastapi_front_end_config.py`): Introduces `openai_api_v1_format: Literal["auto", "chat_completions", "responses"]` on `EndpointBase` and sets the default to `"auto"` for the default endpoint.
- **FastAPI Route Handling** (`src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py`): Adds `_determine_api_format` to choose between `chat_completions` and `responses`, creates routes accordingly, and implements `post_responses_api_compatible_endpoint` with streaming and non-streaming handling and conversions.
- **Message Handling** (`src/nat/front_ends/fastapi/message_handler.py`): Converts `InputTextContent` attachments to `TextContent` when extracting the last user message.
- **Streaming Response Helpers** (`src/nat/front_ends/fastapi/response_helpers.py`): Adds `generate_responses_api_streaming` to emit OpenAI Responses API SSE events (`response.created`, output_item/content events, text deltas, completion, and error events) and handle streaming/non-streaming outputs and errors.
- **Tests** (`tests/nat/front_ends/fastapi/test_openai_compatibility.py`):
Large new test suite covering ResponsesRequest parsing/conversion, ResponsesAPIResponse generation, endpoint streaming/non-streaming, path-format detection and explicit overrides, and tool conversion/filtering (many parameterized cases).
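
As a rough sketch of how the new option is wired up (this mirrors the test fixture quoted later in this thread; the placement of `openai_api_v1_format` on the workflow endpoint is an assumption based on the `EndpointBase` description above):

```python
from nat.front_ends.fastapi.fastapi_front_end_config import FastApiFrontEndConfig

front_end_config = FastApiFrontEndConfig()
# Route that should speak the OpenAI Responses API
front_end_config.workflow.openai_api_v1_path = "/v1/responses"
# "auto" (the default) infers the format from the path; "responses" or "chat_completions" force it
front_end_config.workflow.openai_api_v1_format = "responses"
```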

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant FastAPI as FastAPI Front-End
    participant Detector as Format Detector
    participant Converter as Responses→Chat Converter
    participant Workflow as NeMo Workflow
    participant Builder as Responses Builder

    Client->>FastAPI: POST /v1/responses (ResponsesRequest)
    FastAPI->>Detector: _determine_api_format(endpoint, path)
    Detector-->>FastAPI: "responses"
    FastAPI->>Converter: to_chat_request(ResponsesRequest)
    Converter-->>FastAPI: ChatRequest
    FastAPI->>Workflow: execute(ChatRequest)
    Workflow-->>FastAPI: ChatResponse / stream chunks
    FastAPI->>Builder: ResponsesAPIResponse.from_chat_response / stream events
    Builder-->>FastAPI: SSE events or JSON payload
    FastAPI-->>Client: SSE stream or JSON response

    rect rgba(100,150,200,0.5)
    Note over FastAPI,Builder: Streaming emits: response.created → output_item.added → content_part.added → output_text.delta → ... → response.done
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
- **Description Check**: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
- **Title Check**: ✅ Passed. The PR title accurately describes the main change: introducing Responses API support to comply with Microsoft Foundry Hosted Agents requirements, which is the primary focus of the changeset across multiple files.
- **Docstring Coverage**: ✅ Passed. Docstring coverage is 87.50%, which meets the required threshold of 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

🤖 Fix all issues with AI agents
In `@examples/front_ends/responses_api_endpoint/README.md`:
- Around line 197-201: Update the README Limitations section to replace the
acronym "NAT" with the full name "NeMo Agent Toolkit" where it appears
(specifically the bullet "- **No Built-in Tools**: OpenAI built-in tools like
`code_interpreter` are not executed by NAT; use the `responses_api_agent`
workflow type for that functionality"), ensuring the sentence now reads that
those tools are not executed by the NeMo Agent Toolkit and preserving the rest
of the text and formatting.
- Around line 118-133: Update the fenced code block that shows the SSE events by
adding the language specifier "text" after the opening backticks (i.e., change
``` to ```text) so the block is explicitly marked as text; locate the SSE
example block containing lines like "event: response.created" and "data:
{\"type\": \"response.created\", ...}" and apply the change to the opening
fence.
- Line 22: Replace any occurrences of the lowercase/abbreviated name with the
correct official name: change "NeMo Agent toolkit" and any "NAT" usages in the
README to "NeMo Agent Toolkit" (also update the occurrence referenced at the
second mention currently on the file). Ensure capitalization exactly matches
"NeMo Agent Toolkit" everywhere in the document.

In `@src/nat/data_models/api_server.py`:
- Around line 1104-1109: The usage conversion has an operator-precedence bug:
guard accesses to chat_response.usage with the conditional by parenthesizing
each expression; change the ResponsesUsage construction so each field uses the
form (chat_response.usage.prompt_tokens or 0) if chat_response.usage else 0,
(chat_response.usage.completion_tokens or 0) if chat_response.usage else 0, and
(chat_response.usage.total_tokens or 0) if chat_response.usage else 0 to ensure
attribute access is only attempted when chat_response.usage is truthy.

In `@src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py`:
- Around line 950-957: The code in post_responses_api_compatible uses
getattr(payload, 'model', 'unknown-model') but payload.model exists and can be
None, so the default never applies; change how model_name is derived to ensure a
fallback string when payload.model is None (e.g., use payload.model or
'unknown-model' or getattr(payload, 'model') or 'unknown-model') so that the
value passed into generate_responses_api_streaming and
ResponsesAPIResponse.from_chat_response is never None.

In `@src/nat/front_ends/fastapi/response_helpers.py`:
- Around line 199-205: The function generate_responses_api_streaming currently
declares an unused parameter step_adaptor with a mutable default
StepAdaptor(StepAdaptorConfig()), which triggers ARG001 and B008; remove the
step_adaptor parameter from the signature and any related imports/usages, or if
you intend to keep it for future use change it to an optional parameter (e.g.,
step_adaptor: StepAdaptor | None = None) without constructing a default instance
in the signature and add a short comment documenting planned usage; update any
callers accordingly and remove StepAdaptorConfig() construction from the
function definition.

In `@tests/nat/front_ends/fastapi/test_openai_compatibility.py`:
- Around line 582-605: Add three tests to cover the PR changes: (1)
test_responses_request_optional_model — instantiate ResponsesRequest without a
model (ResponsesRequest(input="...")), assert request.model is None,
request.input preserved, and that request.to_chat_request() produces a
ChatRequest whose first message content equals the original input; (2)
test_responses_request_input_text_content — create a ResponsesRequest with input
as a message whose content is [{"type":"input_text","text":"hello"}], call
to_chat_request() and assert the resulting ChatRequest has one message with role
"user" and the text preserved/converted into the message content; (3)
test_response_id_length_compliance — create a ChatResponse via
ChatResponse.from_string(..., usage=Usage(...)) and convert to
ResponsesAPIResponse with ResponsesAPIResponse.from_chat_response(...,
model="test"), then assert len(chat_response.id) <= 50 and
len(responses_api_response.id) <= 50. Use the existing imports from
nat.data_models.api_server and reference ResponsesRequest.to_chat_request,
ResponsesAPIResponse.from_chat_response, ChatResponse.from_string, and Usage.
- Around line 855-857: Update the incorrect comment on the assertion in
test_openai_compatibility (the assertion that checks response.status_code ==
422) to reflect that only the 'input' field is missing now (since 'model' is
optional in this PR); locate the assertion using the symbol response.status_code
and replace the comment "# Validation error - missing 'input' and 'model'" with
a concise comment like "# Validation error - missing 'input'".
- Around line 669-679: The test test_responses_api_response_from_chat_response
constructs a Usage with only prompt_tokens and completion_tokens; update the
Usage initialization (used with ChatResponse.from_string) to include
total_tokens=15 for consistency with other tests (e.g., the Usage on line 156)
so the call becomes Usage(prompt_tokens=10, completion_tokens=5,
total_tokens=15) before creating ResponsesAPIResponse.from_chat_response; no
other changes needed to ChatResponse or ResponsesAPIResponse.
🧹 Nitpick comments (5)
src/nat/front_ends/fastapi/response_helpers.py (1)

226-233: Move standard-library and local imports to module level.

json, logging, and uuid are imported inside the function body. ChatResponse and ChatResponseChunk are already used elsewhere in this module (referenced in relevant snippets). Hoisting them to the top removes per-call import overhead and aligns with PEP 8 conventions.

Also, _logger on Line 233 shadows the pattern used elsewhere in the codebase (module-level logger). Consider reusing a module-level logger instead of creating one per call.

src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (1)

321-358: Move import re to module level.

re is imported inside _determine_api_format on each call. As a standard-library module with no side effects, it should be at the top of the file per PEP 8. Also, the regex pattern at Line 351 is recompiled every call — consider compiling it once at module level.

Suggested fix (module-level constant)
# At module level:
import re

_RESPONSES_PATH_PATTERN = re.compile(r'^(/api)?(/v\d+)?/responses$')
-        import re
-
         # ...
-        responses_pattern = re.compile(r'^(/api)?(/v\d+)?/responses$')
-
-        if responses_pattern.match(path_lower):
+        if _RESPONSES_PATH_PATTERN.match(path_lower):
src/nat/data_models/api_server.py (1)

998-1000: Move logging import to module level.

logging is imported inside _convert_tools and a new logger is created on each call. The module already uses import logging-style patterns elsewhere (e.g., the logging import isn't at the top of this file's visible portion, but _logger should be a module-level singleton).

tests/nat/front_ends/fastapi/test_openai_compatibility.py (2)

582-584: Consolidate inline imports to module level for consistency.

ResponsesRequest, ResponsesAPIResponse, ResponsesOutputContent, ResponsesOutputItem, and FastApiFrontEndPluginWorker are repeatedly imported inside individual test functions. Meanwhile, equivalent models (ChatRequest, ChatResponse, etc.) and Config/GeneralConfig/FastApiFrontEndConfig are already at module level. Moving these inline imports to the top would reduce duplication and align with PEP 8.

Additionally, Config and GeneralConfig are re-imported inline at lines 1049 and 1071 despite already being module-level imports.


682-693: Extract repeated config setup into pytest fixtures.

At least 7 async test functions repeat a near-identical pattern: create FastApiFrontEndConfig, set openai_api_v1_path, build Config with EchoFunctionConfig/StreamingEchoFunctionConfig, then build_nat_client. Extracting this into fixtures would reduce duplication significantly.

Example fixture
@pytest.fixture(name="responses_api_echo_client")
async def fixture_responses_api_echo_client():
    front_end_config = FastApiFrontEndConfig()
    front_end_config.workflow.openai_api_v1_path = "/v1/responses"
    front_end_config.workflow.openai_api_path = "/chat"

    config = Config(
        general=GeneralConfig(front_end=front_end_config),
        workflow=EchoFunctionConfig(use_openai_api=True),
    )

    async with build_nat_client(config) as client:
        yield client

As per coding guidelines: "Any frequently repeated code should be extracted into pytest fixtures."
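
If such a fixture is adopted, a test could consume it roughly as follows (assuming the object yielded by build_nat_client exposes an httpx-style asynchronous post; adjust to the client's real interface):

```python
async def test_responses_api_non_streaming(responses_api_echo_client):
    # Exercise the Responses API route exposed by the fixture's config
    response = await responses_api_echo_client.post("/v1/responses", json={"input": "hello"})
    assert response.status_code == 200
    assert response.json()["object"] == "response"
```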


**Complexity:** 🟢 Beginner

This example demonstrates how to configure a NeMo Agent toolkit FastAPI frontend to accept requests in the [OpenAI Responses API format](https://platform.openai.com/docs/api-reference/responses).


⚠️ Potential issue | 🟡 Minor

Spell out "NeMo Agent Toolkit" with correct capitalization.

Per coding guidelines, documentation in Markdown files should not use NAT as an acronym and should always spell out "NeMo Agent Toolkit." Also, "toolkit" should be capitalized to "Toolkit" to match the official name. This also applies to Line 37.

Suggested fix
-This example demonstrates how to configure a NeMo Agent toolkit FastAPI frontend to accept requests in the [OpenAI Responses API format](https://platform.openai.com/docs/api-reference/responses).
+This example demonstrates how to configure a NeMo Agent Toolkit FastAPI frontend to accept requests in the [OpenAI Responses API format](https://platform.openai.com/docs/api-reference/responses).

As per coding guidelines: "Documentation in Markdown files should not use NAT as an acronym, always spell out NeMo Agent Toolkit."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
This example demonstrates how to configure a NeMo Agent toolkit FastAPI frontend to accept requests in the [OpenAI Responses API format](https://platform.openai.com/docs/api-reference/responses).
This example demonstrates how to configure a NeMo Agent Toolkit FastAPI frontend to accept requests in the [OpenAI Responses API format](https://platform.openai.com/docs/api-reference/responses).
🤖 Prompt for AI Agents
In `@examples/front_ends/responses_api_endpoint/README.md` at line 22, Replace any
occurrences of the lowercase/abbreviated name with the correct official name:
change "NeMo Agent toolkit" and any "NAT" usages in the README to "NeMo Agent
Toolkit" (also update the occurrence referenced at the second mention currently
on the file). Ensure capitalization exactly matches "NeMo Agent Toolkit"
everywhere in the document.

Comment on lines +118 to +133
```
event: response.created
data: {"type": "response.created", "response": {"id": "resp_...", "status": "in_progress"}}

event: response.output_item.added
data: {"type": "response.output_item.added", ...}

event: response.output_text.delta
data: {"type": "response.output_text.delta", "delta": "The current"}

event: response.output_text.delta
data: {"type": "response.output_text.delta", "delta": " time is..."}

event: response.done
data: {"type": "response.done", "response": {"status": "completed", ...}}
```


⚠️ Potential issue | 🟡 Minor

Add a language specifier to the fenced code block.

The SSE events example block is missing a language identifier. Since this is a text-based event stream format, use text as the language.

Suggested fix
-```
+```text
 event: response.created
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```text
event: response.created
data: {"type": "response.created", "response": {"id": "resp_...", "status": "in_progress"}}

event: response.output_item.added
data: {"type": "response.output_item.added", ...}

event: response.output_text.delta
data: {"type": "response.output_text.delta", "delta": "The current"}

event: response.output_text.delta
data: {"type": "response.output_text.delta", "delta": " time is..."}

event: response.done
data: {"type": "response.done", "response": {"status": "completed", ...}}
```
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)

[warning] 118-118: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In `@examples/front_ends/responses_api_endpoint/README.md` around lines 118 - 133,
Update the fenced code block that shows the SSE events by adding the language
specifier "text" after the opening backticks (i.e., change ``` to ```text) so
the block is explicitly marked as text; locate the SSE example block containing
lines like "event: response.created" and "data: {\"type\": \"response.created\",
...}" and apply the change to the opening fence.

Comment on lines +197 to +201
## Limitations

- **No Stateful Backend**: `previous_response_id` is accepted but ignored
- **No Built-in Tools**: OpenAI built-in tools like `code_interpreter` are not executed by NAT; use the `responses_api_agent` workflow type for that functionality
- **Tool Format Conversion**: Responses API tool definitions are converted to Chat Completions format internally


⚠️ Potential issue | 🟡 Minor

Replace "NAT" acronym with "NeMo Agent Toolkit" in documentation.

Lines 200 uses "NAT" as an acronym. The coding guidelines require spelling out "NeMo Agent Toolkit" in Markdown documentation.

Suggested fix
-- **No Built-in Tools**: OpenAI built-in tools like `code_interpreter` are not executed by NAT; use the `responses_api_agent` workflow type for that functionality
+- **No Built-in Tools**: OpenAI built-in tools like `code_interpreter` are not executed by NeMo Agent Toolkit; use the `responses_api_agent` workflow type for that functionality

As per coding guidelines: "Documentation in Markdown files should not use NAT as an acronym, always spell out NeMo Agent Toolkit."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
## Limitations
- **No Stateful Backend**: `previous_response_id` is accepted but ignored
- **No Built-in Tools**: OpenAI built-in tools like `code_interpreter` are not executed by NAT; use the `responses_api_agent` workflow type for that functionality
- **Tool Format Conversion**: Responses API tool definitions are converted to Chat Completions format internally
## Limitations
- **No Stateful Backend**: `previous_response_id` is accepted but ignored
- **No Built-in Tools**: OpenAI built-in tools like `code_interpreter` are not executed by NeMo Agent Toolkit; use the `responses_api_agent` workflow type for that functionality
- **Tool Format Conversion**: Responses API tool definitions are converted to Chat Completions format internally
🤖 Prompt for AI Agents
In `@examples/front_ends/responses_api_endpoint/README.md` around lines 197 - 201,
Update the README Limitations section to replace the acronym "NAT" with the full
name "NeMo Agent Toolkit" where it appears (specifically the bullet "- **No
Built-in Tools**: OpenAI built-in tools like `code_interpreter` are not executed
by NAT; use the `responses_api_agent` workflow type for that functionality"),
ensuring the sentence now reads that those tools are not executed by the NeMo
Agent Toolkit and preserving the rest of the text and formatting.

Comment on lines +1104 to +1109
# Convert usage if available
usage = ResponsesUsage(
    input_tokens=chat_response.usage.prompt_tokens or 0 if chat_response.usage else 0,
    output_tokens=chat_response.usage.completion_tokens or 0 if chat_response.usage else 0,
    total_tokens=chat_response.usage.total_tokens or 0 if chat_response.usage else 0,
)


⚠️ Potential issue | 🟠 Major

Operator precedence bug in usage conversion — add parentheses.

Due to Python's precedence rules, A or B if C else D is parsed as A or (B if C else D), not (A or B) if C else D. This means chat_response.usage.prompt_tokens is evaluated unconditionally — if chat_response.usage were ever None, this would raise AttributeError before the guard kicks in.

ChatResponse.usage is currently required (non-optional), so this won't crash today, but the intent is clearly to guard the attribute access with the conditional. Adding parentheses makes the logic correct and future-proof.

Suggested fix
         usage = ResponsesUsage(
-            input_tokens=chat_response.usage.prompt_tokens or 0 if chat_response.usage else 0,
-            output_tokens=chat_response.usage.completion_tokens or 0 if chat_response.usage else 0,
-            total_tokens=chat_response.usage.total_tokens or 0 if chat_response.usage else 0,
+            input_tokens=(chat_response.usage.prompt_tokens or 0) if chat_response.usage else 0,
+            output_tokens=(chat_response.usage.completion_tokens or 0) if chat_response.usage else 0,
+            total_tokens=(chat_response.usage.total_tokens or 0) if chat_response.usage else 0,
         )
🤖 Prompt for AI Agents
In `@src/nat/data_models/api_server.py` around lines 1104 - 1109, The usage
conversion has an operator-precedence bug: guard accesses to chat_response.usage
with the conditional by parenthesizing each expression; change the
ResponsesUsage construction so each field uses the form
(chat_response.usage.prompt_tokens or 0) if chat_response.usage else 0,
(chat_response.usage.completion_tokens or 0) if chat_response.usage else 0, and
(chat_response.usage.total_tokens or 0) if chat_response.usage else 0 to ensure
attribute access is only attempted when chat_response.usage is truthy.

Comment on lines +950 to +957
from nat.data_models.api_server import ResponsesAPIResponse

async def post_responses_api_compatible(response: Response, request: Request, payload: request_type):
    stream_requested = getattr(payload, 'stream', False)

    # Convert Responses API request to internal ChatRequest format
    chat_request = payload.to_chat_request()
    model_name = getattr(payload, 'model', 'unknown-model')


⚠️ Potential issue | 🔴 Critical

Bug: getattr(payload, 'model', 'unknown-model') returns None when model is None.

Since model is defined on ResponsesRequest as str | None = None, the attribute always exists, so getattr's default ('unknown-model') is never used. When the user omits model, getattr returns None, which then propagates as a None model name to generate_responses_api_streaming and ResponsesAPIResponse.from_chat_response.

Use or to provide the fallback:

Suggested fix
-                model_name = getattr(payload, 'model', 'unknown-model')
+                model_name = payload.model or "unknown-model"
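
A quick illustration of why the getattr default never applies here (hypothetical stand-in class):

```python
class Payload:  # hypothetical stand-in for ResponsesRequest
    model: str | None = None

payload = Payload()
print(getattr(payload, "model", "unknown-model"))  # None: the attribute exists, so the default is ignored
print(payload.model or "unknown-model")            # "unknown-model"
```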
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Before:

from nat.data_models.api_server import ResponsesAPIResponse

async def post_responses_api_compatible(response: Response, request: Request, payload: request_type):
    stream_requested = getattr(payload, 'stream', False)

    # Convert Responses API request to internal ChatRequest format
    chat_request = payload.to_chat_request()
    model_name = getattr(payload, 'model', 'unknown-model')

After:

from nat.data_models.api_server import ResponsesAPIResponse

async def post_responses_api_compatible(response: Response, request: Request, payload: request_type):
    stream_requested = getattr(payload, 'stream', False)

    # Convert Responses API request to internal ChatRequest format
    chat_request = payload.to_chat_request()
    model_name = payload.model or "unknown-model"
🤖 Prompt for AI Agents
In `@src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py` around lines
950 - 957, The code in post_responses_api_compatible uses getattr(payload,
'model', 'unknown-model') but payload.model exists and can be None, so the
default never applies; change how model_name is derived to ensure a fallback
string when payload.model is None (e.g., use payload.model or 'unknown-model' or
getattr(payload, 'model') or 'unknown-model') so that the value passed into
generate_responses_api_streaming and ResponsesAPIResponse.from_chat_response is
never None.

Comment on lines +199 to +205
async def generate_responses_api_streaming(
    payload: typing.Any,
    *,
    session: Session,
    model: str,
    step_adaptor: StepAdaptor = StepAdaptor(StepAdaptorConfig()),
) -> AsyncGenerator[str, None]:


🛠️ Refactor suggestion | 🟠 Major

Remove unused step_adaptor parameter; avoid mutable defaults in signature.

Ruff correctly flags that step_adaptor is never used in the function body (ARG001). Additionally, constructing StepAdaptor(StepAdaptorConfig()) in the default argument is flagged by B008 — mutable objects in defaults are evaluated once at definition time and shared across calls.

Remove the parameter entirely since it's unused. If it's intended for future use, accept None and document the plan.

Suggested fix
 async def generate_responses_api_streaming(
     payload: typing.Any,
     *,
     session: Session,
     model: str,
-    step_adaptor: StepAdaptor = StepAdaptor(StepAdaptorConfig()),
 ) -> AsyncGenerator[str, None]:
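
If the parameter is intended for future use, one possible shape (a sketch, not the final API) is to accept None and construct the default inside the function:

```python
async def generate_responses_api_streaming(
    payload: typing.Any,
    *,
    session: Session,
    model: str,
    step_adaptor: StepAdaptor | None = None,  # reserved for future step filtering
) -> AsyncGenerator[str, None]:
    # Build the default per call so no instance is shared across invocations
    step_adaptor = step_adaptor or StepAdaptor(StepAdaptorConfig())
    ...
```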
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 204-204: Unused function argument: step_adaptor

(ARG001)


[warning] 204-204: Do not perform function call StepAdaptor in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


[warning] 204-204: Do not perform function call StepAdaptorConfig in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

🤖 Prompt for AI Agents
In `@src/nat/front_ends/fastapi/response_helpers.py` around lines 199 - 205, The
function generate_responses_api_streaming currently declares an unused parameter
step_adaptor with a mutable default StepAdaptor(StepAdaptorConfig()), which
triggers ARG001 and B008; remove the step_adaptor parameter from the signature
and any related imports/usages, or if you intend to keep it for future use
change it to an optional parameter (e.g., step_adaptor: StepAdaptor | None =
None) without constructing a default instance in the signature and add a short
comment documenting planned usage; update any callers accordingly and remove
StepAdaptorConfig() construction from the function definition.

Comment on lines +582 to +605
def test_responses_request_model():
    """Test that ResponsesRequest model correctly parses Responses API format."""
    from nat.data_models.api_server import ResponsesRequest

    # Test with simple string input
    request = ResponsesRequest(model="gpt-4o-mini", input="What is the weather?")
    assert request.model == "gpt-4o-mini"
    assert request.input == "What is the weather?"
    assert request.stream is False

    # Test with message-style input
    request = ResponsesRequest(
        model="gpt-4o-mini",
        input=[{
            "role": "user", "content": "Hello"
        }],
        instructions="You are a helpful assistant.",
        tools=[{
            "type": "function", "name": "get_weather", "description": "Get weather"
        }],
    )
    assert isinstance(request.input, list)
    assert request.instructions == "You are a helpful assistant."
    assert request.tools is not None


⚠️ Potential issue | 🟠 Major

Missing test coverage for key PR objectives: optional model and input_text content type.

The PR explicitly introduces: (1) making model optional, (2) supporting {"type": "input_text", "text": "..."} content items, and (3) 50-char ID length compliance. None of these are tested here or elsewhere in the file.

Suggested additional test cases
def test_responses_request_optional_model():
    """Test that ResponsesRequest accepts requests without a model field."""
    from nat.data_models.api_server import ResponsesRequest

    # model should be optional
    request = ResponsesRequest(input="Hello without model")
    assert request.model is None
    assert request.input == "Hello without model"

    # Conversion to ChatRequest should still work
    chat_request = request.to_chat_request()
    assert chat_request.messages[0].content == "Hello without model"


def test_responses_request_input_text_content():
    """Test that ResponsesRequest handles input_text content items (Azure Foundry compatibility)."""
    from nat.data_models.api_server import ResponsesRequest

    request = ResponsesRequest(
        model="gpt-4",
        input=[{
            "role": "user",
            "content": [{"type": "input_text", "text": "hello"}],
        }],
    )
    chat_request = request.to_chat_request()
    # Verify input_text content is correctly converted
    assert len(chat_request.messages) == 1
    assert chat_request.messages[0].role.value == "user"


def test_response_id_length_compliance():
    """Test that response and message IDs comply with 50-character length requirement."""
    from nat.data_models.api_server import ResponsesAPIResponse, ChatResponse, Usage

    chat_response = ChatResponse.from_string("Test", usage=Usage(prompt_tokens=1, completion_tokens=1))
    responses_api_response = ResponsesAPIResponse.from_chat_response(chat_response, model="test")

    assert len(responses_api_response.id) <= 50
    assert len(chat_response.id) <= 50

As per coding guidelines: "Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code."

🤖 Prompt for AI Agents
In `@tests/nat/front_ends/fastapi/test_openai_compatibility.py` around lines 582 -
605, Add three tests to cover the PR changes: (1)
test_responses_request_optional_model — instantiate ResponsesRequest without a
model (ResponsesRequest(input="...")), assert request.model is None,
request.input preserved, and that request.to_chat_request() produces a
ChatRequest whose first message content equals the original input; (2)
test_responses_request_input_text_content — create a ResponsesRequest with input
as a message whose content is [{"type":"input_text","text":"hello"}], call
to_chat_request() and assert the resulting ChatRequest has one message with role
"user" and the text preserved/converted into the message content; (3)
test_response_id_length_compliance — create a ChatResponse via
ChatResponse.from_string(..., usage=Usage(...)) and convert to
ResponsesAPIResponse with ResponsesAPIResponse.from_chat_response(...,
model="test"), then assert len(chat_response.id) <= 50 and
len(responses_api_response.id) <= 50. Use the existing imports from
nat.data_models.api_server and reference ResponsesRequest.to_chat_request,
ResponsesAPIResponse.from_chat_response, ChatResponse.from_string, and Usage.

Comment on lines +669 to +679
def test_responses_api_response_from_chat_response():
    """Test conversion from ChatResponse to ResponsesAPIResponse."""
    from nat.data_models.api_server import ResponsesAPIResponse

    chat_response = ChatResponse.from_string("Test response", usage=Usage(prompt_tokens=10, completion_tokens=5))
    responses_api_response = ResponsesAPIResponse.from_chat_response(chat_response, model="test-model")

    assert responses_api_response.object == "response"
    assert responses_api_response.model == "test-model"
    assert len(responses_api_response.output) == 1
    assert responses_api_response.output[0].content[0].text == "Test response"


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find Usage model definition
rg -n "class Usage" --type=py -A 20 -g '!**/test/**'

Repository: NVIDIA/NeMo-Agent-Toolkit

Length of output: 4831


🏁 Script executed:

#!/bin/bash
# Check for validators or computed fields related to total_tokens
rg -n "total_tokens" --type=py -g '!**/test/**' -B 2 -A 2

Repository: NVIDIA/NeMo-Agent-Toolkit

Length of output: 50381


Add total_tokens to Usage initialization for consistency with line 156.

Line 673 creates Usage(prompt_tokens=10, completion_tokens=5) without total_tokens, while line 156 in the same file provides it: Usage(prompt_tokens=1, completion_tokens=1, total_tokens=2). Although total_tokens is optional (defaults to None), the established pattern in this test file and throughout the codebase is to always provide all three values explicitly. Provide total_tokens=15 to match line 156's approach and maintain consistency.

🤖 Prompt for AI Agents
In `@tests/nat/front_ends/fastapi/test_openai_compatibility.py` around lines 669 -
679, The test test_responses_api_response_from_chat_response constructs a Usage
with only prompt_tokens and completion_tokens; update the Usage initialization
(used with ChatResponse.from_string) to include total_tokens=15 for consistency
with other tests (e.g., the Usage on line 156) so the call becomes
Usage(prompt_tokens=10, completion_tokens=5, total_tokens=15) before creating
ResponsesAPIResponse.from_chat_response; no other changes needed to ChatResponse
or ResponsesAPIResponse.

Comment on lines +855 to +857
},
)
assert response.status_code == 422 # Validation error - missing 'input' and 'model'


⚠️ Potential issue | 🟡 Minor

Incorrect comment: model is now optional per this PR.

The comment states # Validation error - missing 'input' and 'model', but this PR makes model optional. The 422 here is due to the missing input field only.

Suggested fix
-        assert response.status_code == 422  # Validation error - missing 'input' and 'model'
+        assert response.status_code == 422  # Validation error - missing 'input'
🤖 Prompt for AI Agents
In `@tests/nat/front_ends/fastapi/test_openai_compatibility.py` around lines 855 -
857, Update the incorrect comment on the assertion in test_openai_compatibility
(the assertion that checks response.status_code == 422) to reflect that only the
'input' field is missing now (since 'model' is optional in this PR); locate the
assertion using the symbol response.status_code and replace the comment "#
Validation error - missing 'input' and 'model'" with a concise comment like "#
Validation error - missing 'input'".

@bbednarski9 bbednarski9 added the DO NOT MERGE PR should not be merged; see PR for details label Feb 9, 2026
@bbednarski9
Contributor

This PR does not yet address the issue of stateful agents. This API proxy would lead to some "false advertising". We do not plan to merge this until the above has been addressed.

@antonslutskyms antonslutskyms force-pushed the as/responses-api-frontend-fix-ext branch 3 times, most recently from 310824e to 24a566b on February 10, 2026 14:22

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/nat/front_ends/fastapi/response_helpers.py`:
- Around line 393-396: The except block currently yields the raw exception
string to the client (yield _error_event(str(setup_error), "server_error")),
potentially leaking internal details; instead log the full exception with
_logger.exception("...") and send a sanitized message to the client (e.g., yield
_error_event("Internal server error", "server_error") or a mapped safe message),
ensuring you do not expose setup_error contents in the SSE payload while
preserving detailed logs locally; update the except block around _error_event
and keep _logger.exception as-is.
- Around line 235-236: response_id and message_id are being truncated to 55/56
chars and thus exceed the 50-char requirement; change them to use a single
uuid_module.uuid4().hex each and slice so that the total length including the
prefix stays <=50 (e.g. keep "resp_" prefix and take the first 45 hex chars for
response_id, keep "msg_" prefix and take the first 46 hex chars for message_id),
updating the assignments that set response_id and message_id (which currently
call uuid_module.uuid4().hex twice).
🧹 Nitpick comments (3)
src/nat/front_ends/fastapi/response_helpers.py (3)

226-233: Move imports and logger to module level.

json, logging, uuid, and the nat.data_models.api_server imports are standard/stable and should live at the top of the file. Recreating the logger on every call is a minor inefficiency (logging.getLogger does a dict lookup each time).

Proposed fix

Add to the top-level imports:

import json
import logging
import uuid as uuid_module

from nat.data_models.api_server import ChatResponse
from nat.data_models.api_server import ChatResponseChunk

Then at module level:

_logger = logging.getLogger(__name__)

And remove lines 226-233 from the function body.


329-336: error_occurred flag is redundant — the except block already returns.

Line 333 does return, so execution never reaches line 336. The error_occurred variable and the if not error_occurred guard can be removed; the completion events will only execute if no exception was raised.

Proposed simplification
         except Exception as workflow_error:
-            error_occurred = True
             _logger.exception("Error during Responses API streaming workflow execution")
             yield _error_event(str(workflow_error), "workflow_error")
             return  # Stop streaming after error

-        # Only emit completion events if no error occurred
-        if not error_occurred:
-            # Event: response.output_text.done
-            yield _sse_event("response.output_text.done", {
+        # Event: response.output_text.done
+        yield _sse_event("response.output_text.done", {

(and un-indent the rest of the completion events block)


263-391: Significant duplication in SSE event payloads — consider a small builder/helper.

The response object structure (id, object, status, model, output) is repeated across response.created, response.failed, and response.done events. Extracting a helper like _build_response_envelope(status, output=None) would reduce duplication and make it easier to keep the schema consistent.
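
A minimal sketch of such a helper, assuming the envelope fields listed above (the exact field set in the PR may differ):

```python
def _build_response_envelope(response_id: str, model: str, status: str,
                             output: list[dict] | None = None) -> dict:
    """Shared response payload for response.created / response.failed / response.done events."""
    return {
        "id": response_id,
        "object": "response",
        "status": status,
        "model": model,
        "output": output or [],
    }
```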

Comment on lines 235 to 236
response_id = f"resp_{uuid_module.uuid4().hex}{uuid_module.uuid4().hex}"[:55]
message_id = f"msg_{uuid_module.uuid4().hex}{uuid_module.uuid4().hex}"[:56]


⚠️ Potential issue | 🔴 Critical

ID lengths exceed the stated 50-character requirement.

The PR objective states IDs must comply with a 50-character length limit, but response_id is truncated to 55 characters and message_id to 56 characters. This will violate the length constraint the PR is explicitly trying to satisfy.

Proposed fix
-    response_id = f"resp_{uuid_module.uuid4().hex}{uuid_module.uuid4().hex}"[:55]
-    message_id = f"msg_{uuid_module.uuid4().hex}{uuid_module.uuid4().hex}"[:56]
+    response_id = f"resp_{uuid_module.uuid4().hex}"[:50]
+    message_id = f"msg_{uuid_module.uuid4().hex}"[:50]

A single uuid4().hex already provides 32 hex characters, which is more than enough entropy; with the resp_ prefix the resulting ID is 37 characters and with msg_ it is 36, so both stay comfortably under the 50-character limit.
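
A quick check of the resulting lengths under the single-UUID scheme:

```python
import uuid

response_id = f"resp_{uuid.uuid4().hex}"[:50]
message_id = f"msg_{uuid.uuid4().hex}"[:50]
print(len(response_id), len(message_id))  # 37 36: a 32-char hex plus prefix is already under 50, so the slice is only a safety net
```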

🤖 Prompt for AI Agents
In `@src/nat/front_ends/fastapi/response_helpers.py` around lines 235 - 236,
response_id and message_id are being truncated to 55/56 chars and thus exceed
the 50-char requirement; change them to use a single uuid_module.uuid4().hex
each and slice so that the total length including the prefix stays <=50 (e.g.
keep "resp_" prefix and take the first 45 hex chars for response_id, keep "msg_"
prefix and take the first 46 hex chars for message_id), updating the assignments
that set response_id and message_id (which currently call
uuid_module.uuid4().hex twice).

Comment on lines +393 to +396
    except Exception as setup_error:
        # Catch any errors during initial event emission (before workflow starts)
        _logger.exception("Error during Responses API streaming setup")
        yield _error_event(str(setup_error), "server_error")


⚠️ Potential issue | 🟡 Minor

Outer except may yield into an already-consumed generator.

If the initial SSE events (lines 265-298) raise, this handler catches and yields an error event. However, note that str(setup_error) could leak internal details to the client. Consider sanitizing the error message before sending it in the SSE payload.
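
A sketch of the sanitized handler, reusing the _logger and _error_event helpers from the surrounding code (a fragment, not a drop-in patch):

```python
    try:
        ...  # emit the initial response.created / output_item events
    except Exception:
        # Keep the full traceback in the server log...
        _logger.exception("Error during Responses API streaming setup")
        # ...and send only a generic, non-sensitive message to the client
        yield _error_event("Internal server error", "server_error")
```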

🤖 Prompt for AI Agents
In `@src/nat/front_ends/fastapi/response_helpers.py` around lines 393 - 396, The
except block currently yields the raw exception string to the client (yield
_error_event(str(setup_error), "server_error")), potentially leaking internal
details; instead log the full exception with _logger.exception("...") and send a
sanitized message to the client (e.g., yield _error_event("Internal server
error", "server_error") or a mapped safe message), ensuring you do not expose
setup_error contents in the SSE payload while preserving detailed logs locally;
update the except block around _error_event and keep _logger.exception as-is.

@antonslutskyms antonslutskyms force-pushed the as/responses-api-frontend-fix-ext branch from 24a566b to d2e6f58 on February 10, 2026 14:27

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@examples/front_ends/responses_api_endpoint/README.md`:
- Around line 59-66: Update the endpoint table entry for the `/generate` row to
replace the acronym "NAT default" with the full phrase "NeMo Agent Toolkit
default" (or "NeMo Agent Toolkit") so the Format column spells out NeMo Agent
Toolkit instead of using NAT; locate the `/generate` table row in README.md and
edit its Format cell accordingly.

In `@src/nat/data_models/api_server.py`:
- Around line 895-901: Delete the unused ResponsesInputItem Pydantic model
(class ResponsesInputItem) since ResponsesRequest.input already uses the needed
type; remove the class definition and any now-unused imports (e.g., BaseModel,
Field, UserMessageContentRoleType if only referenced by that class) and run
tests/type-checks to ensure no remaining references to ResponsesInputItem exist.
🧹 Nitpick comments (2)
src/nat/front_ends/fastapi/response_helpers.py (1)

226-233: Move imports and logger to module level.

json, logging, uuid, ChatResponse, and ChatResponseChunk are imported inside the function body and the logger is created on every call. These should be at module level for clarity and to avoid repeated work.

Suggested fix

Move to the top of the file with the other imports:

import json
import uuid as uuid_module

from nat.data_models.api_server import ChatResponse
from nat.data_models.api_server import ChatResponseChunk

And add a module-level logger:

_logger = logging.getLogger(__name__)
src/nat/data_models/api_server.py (1)

985-1047: Tool conversion logic handles the key formats well.

The flat-to-nested conversion for Responses API → Chat Completions tools is correct. Logging warnings for unsupported tool types (code_interpreter, file_search, etc.) and excluding them is appropriate.

One minor note: _convert_tools instantiates a logger on every call (Lines 998-1000). Consider using a module-level logger.
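
For reference, the two tool shapes being bridged look roughly like this (the Responses API uses a flat function definition, while Chat Completions nests it under a `function` key):

```python
# Responses API style (flat), as sent in ResponsesRequest.tools
responses_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Get weather",
    "parameters": {"type": "object", "properties": {}},
}

# Chat Completions style (nested), what _convert_tools is expected to emit
chat_completions_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather",
        "parameters": {"type": "object", "properties": {}},
    },
}
```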

Comment on lines +59 to +66
The server will start on port 8088 with the following endpoints:

| Endpoint | Format | Description |
|----------|--------|-------------|
| `/generate` | NAT default | Standard workflow endpoint |
| `/chat` | Chat Completions | OpenAI Chat Completions format |
| `/chat/stream` | Chat Completions | Streaming Chat Completions |
| `/v1/responses` | Responses API | OpenAI Responses API format |


⚠️ Potential issue | 🟡 Minor

Replace "NAT" acronym with "NeMo Agent Toolkit" in the endpoint table.

Line 63 uses "NAT default" as the format description. Per coding guidelines, spell out "NeMo Agent Toolkit" in Markdown documentation.

Suggested fix
-| `/generate` | NAT default | Standard workflow endpoint |
+| `/generate` | NeMo Agent Toolkit default | Standard workflow endpoint |

As per coding guidelines: "Documentation in Markdown files should not use NAT as an acronym, always spell out NeMo Agent Toolkit."

🤖 Prompt for AI Agents
In `@examples/front_ends/responses_api_endpoint/README.md` around lines 59 - 66,
Update the endpoint table entry for the `/generate` row to replace the acronym
"NAT default" with the full phrase "NeMo Agent Toolkit default" (or "NeMo Agent
Toolkit") so the Format column spells out NeMo Agent Toolkit instead of using
NAT; locate the `/generate` table row in README.md and edit its Format cell
accordingly.

Comment on lines +895 to +901
class ResponsesInputItem(BaseModel):
    """Input item for Responses API (message format)."""

    type: typing.Literal["message"] = "message"
    role: UserMessageContentRoleType = UserMessageContentRoleType.USER
    content: str | list[dict[str, typing.Any]] = Field(description="Content of the message")



⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n "ResponsesInputItem" --type=py

Repository: NVIDIA/NeMo-Agent-Toolkit

Length of output: 143


🏁 Script executed:

rg -n "ResponsesRequest" --type=py -A 10 | head -50

Repository: NVIDIA/NeMo-Agent-Toolkit

Length of output: 5273


🏁 Script executed:

sed -n '903,950p' src/nat/data_models/api_server.py

Repository: NVIDIA/NeMo-Agent-Toolkit

Length of output: 2868


Remove unused ResponsesInputItem model.

ResponsesInputItem is defined but not referenced anywhere in the codebase. The ResponsesRequest.input field uses str | list[dict[str, typing.Any]] instead, making this model unnecessary (YAGNI).

🤖 Prompt for AI Agents
In `@src/nat/data_models/api_server.py` around lines 895 - 901, Delete the unused
ResponsesInputItem Pydantic model (class ResponsesInputItem) since
ResponsesRequest.input already uses the needed type; remove the class definition
and any now-unused imports (e.g., BaseModel, Field, UserMessageContentRoleType
if only referenced by that class) and run tests/type-checks to ensure no
remaining references to ResponsesInputItem exist.

@willkill07 willkill07 added feature request New feature or request non-breaking Non-breaking change Under Review PR is under review and should not be marked stale labels Feb 10, 2026
Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
src/nat/data_models/api_server.py (2)

949-984: to_chat_request silently discards non-dict items and unknown roles.

When self.input is a list, non-dict items are silently skipped (the if isinstance(item, dict) guard on line 963), and invalid role strings fall back to USER without logging. For a compatibility layer, this is defensively reasonable, but a debug/warning log on these fallback paths would aid troubleshooting integration issues.
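
A sketch of what those fallback logs might look like inside the list-handling branch (variable and helper names assumed from the description above):

```python
for item in self.input:
    if not isinstance(item, dict):
        logger.warning("Skipping non-dict Responses API input item: %r", item)
        continue
    try:
        role = UserMessageContentRoleType(item.get("role", "user"))
    except ValueError:
        logger.warning("Unknown role %r on input item; falling back to 'user'", item.get("role"))
        role = UserMessageContentRoleType.USER
```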

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nat/data_models/api_server.py` around lines 949 - 984, The
to_chat_request method currently drops non-dict items in self.input and silently
maps unknown role strings to UserMessageContentRoleType.USER; update to emit
warning/debug logs when these fallbacks occur so callers can trace malformed
inputs: inside to_chat_request (function name) log a warning when an item in
self.input is not a dict and is skipped, and log a warning when constructing
role = UserMessageContentRoleType(role_str) falls back to USER due to ValueError
(include the offending role_str and the original item content); use the module
or class logger (or Python logging) consistently so Message creation and
downstream processing show context for these warnings.

998-1001: Avoid re-creating logger on every call to _convert_tools.

_logging.getLogger(__name__) is called inside the method body each time. Move the import and logger to module level — the module already imports logging-adjacent utilities and the logger would be reusable across methods.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nat/data_models/api_server.py` around lines 998 - 1001, The
_convert_tools function currently imports logging and calls
_logging.getLogger(__name__) on every invocation; move the import and logger
creation to module scope by adding a single module-level "import logging" (or
reuse existing logging import) and define _logger = logging.getLogger(__name__)
once at top-level, then remove the inline "import logging as _logging" and
"_logger = _logging.getLogger(__name__)" from inside _convert_tools and use the
module-level _logger instead.
src/nat/front_ends/fastapi/response_helpers.py (2)

226-233: Move imports and logger to module level.

json, logging, secrets, ChatResponse, and ChatResponseChunk are unconditionally used whenever this function runs. Placing them inside the function body adds per-call overhead and deviates from the module-level import convention used elsewhere in this file (e.g., lines 16–27).

Suggested fix

Move these to the top of the file alongside existing imports:

 import asyncio
+import json
+import logging
+import secrets
 import typing
 from collections.abc import AsyncGenerator
 
+from nat.data_models.api_server import ChatResponse
+from nat.data_models.api_server import ChatResponseChunk
 from nat.data_models.api_server import ResponseIntermediateStep

And replace the in-function logger with a module-level one:

+_logger = logging.getLogger(__name__)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nat/front_ends/fastapi/response_helpers.py` around lines 226 - 233, Move
the in-function imports and logger to module level: remove the local imports of
json, logging, secrets and the data model imports ChatResponse and
ChatResponseChunk from inside the function and add them to the top of the module
with the other imports; also create a module-level logger named _logger =
logging.getLogger(__name__) and replace the function's local logger usage with
this module-level _logger so the function uses the top-level json, secrets,
ChatResponse, ChatResponseChunk, and _logger symbols.

300-328: Non-streaming path silently swallows responses with empty/None content.

When the workflow is non-streaming (line 317–328), if result.choices is empty or result.choices[0].message is None, no delta event is emitted, and accumulated_text stays "". The completion events will then report an empty output with no indication that something may have gone wrong. Consider logging a warning when the non-streaming result yields no content, so silent failures don't go unnoticed.
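
One possible shape for that warning in the non-streaming branch (a sketch; attribute names follow the snippet referenced above):

```python
result = await runner.result(to_type=ChatResponse)
message = result.choices[0].message if result.choices else None
if message is None or not message.content:
    _logger.warning("Non-streaming Responses API result contained no message content")
accumulated_text = (message.content or "") if message else ""
```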

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nat/front_ends/fastapi/response_helpers.py` around lines 300 - 328, In
the non-streaming branch inside the async with session.run(payload) block (where
you await runner.result(to_type=ChatResponse) and set accumulated_text), add a
warning log when result.choices is empty or result.choices[0].message is None so
we don't silently emit an empty output; specifically, detect the missing/None
content after obtaining result (the same place you currently set
accumulated_text) and call the module’s logger (or processLogger/response logger
used elsewhere) to warn with context including output_index and content_index
and any available metadata from result, ensuring you still emit an empty delta
only if appropriate and not swallow the condition silently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/nat/data_models/api_server.py`:
- Around line 1009-1013: The loop that builds converted from self.tools
currently appends non-dict tool objects as-is, violating the method's return
type list[dict[str, typing.Any]] | None; update the loop that constructs
converted so that when not isinstance(tool, dict) you either (a) normalize the
object to a dict (e.g., use tool.__dict__ or dataclasses.asdict for dataclass
instances, or call a to_dict()/as_dict() method if present) before appending, or
(b) skip the unsupported tool and emit a warning/log entry; ensure the branch
references the same symbols (self.tools, converted, tool) and that the final
returned list contains only dict[str, typing.Any].

In `@src/nat/front_ends/fastapi/response_helpers.py`:
- Around line 235-236: The code generates response IDs with the nonstandard
"rsp_" prefix; update the string prefixes to "resp_" where response_id and any
similar generator use f"rsp_" so they match the OpenAI Responses API convention
(e.g., change response_id = f"rsp_{secrets.token_hex(30)}"[:64] to use "resp_");
also update the matching occurrence and its docstring in the API server data
model (the docstring that references the Responses API and any variable named
response_id or resp-like generator) so the created IDs and documentation
consistently use "resp_".

---

Duplicate comments:
In `@src/nat/data_models/api_server.py`:
- Around line 1105-1110: The expressions that build the ResponsesUsage object
suffer from operator precedence: e.g.
input_tokens=chat_response.usage.prompt_tokens or 0 if chat_response.usage else
0 is parsed incorrectly; update the construction in the usage assignment so each
conditional is parenthesized, e.g. set
input_tokens=(chat_response.usage.prompt_tokens or 0) if chat_response.usage
else 0 (and similarly for output_tokens and total_tokens) to ensure the "or 0"
applies to the token value before the if/else; touches the usage variable
assignment that constructs ResponsesUsage from chat_response.usage.
- Around line 896-901: ResponsesInputItem is defined but never used; update the
Responses API types to either remove the unused model or wire it into the
request type—specifically replace the loose type on ResponsesRequest.input with
ResponsesInputItem (or make ResponsesRequest.input: str |
list[ResponsesInputItem] if strings are still allowed) so the new BaseModel is
referenced, and remove ResponsesInputItem if you choose not to use it; adjust
any imports/usages accordingly (look for ResponsesInputItem and
ResponsesRequest.input in this module).
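
A sketch of the second option (keeping plain-string input allowed); the field set here is an assumption based on the message items shown in the PR description, and the real models in this PR carry more fields:

import typing

from pydantic import BaseModel


class ResponsesInputItem(BaseModel):
    # Field names mirror the message-style items in the PR description.
    type: str = "message"
    role: str = "user"
    content: str | list[dict[str, typing.Any]] | None = None


class ResponsesRequest(BaseModel):
    # Plain strings stay allowed; structured items now validate against ResponsesInputItem.
    input: str | list[ResponsesInputItem]
    model: str | None = None
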

In `@src/nat/front_ends/fastapi/response_helpers.py`:
- Around line 329-333: The code is leaking internal exception details to clients
by yielding str(workflow_error) from the except block; instead, change the
yielded client-facing error to a generic, non-sensitive message (e.g., "Internal
error during Responses API workflow") while keeping the full exception in server
logs via _logger.exception; update the yield call to pass the generic message to
_error_event and optionally include a safe error_id or correlation id, and leave
workflow_error only for logging rather than sending to clients.
- Around line 199-205: The generate_responses_api_streaming function currently
declares an unused parameter step_adaptor with a mutable default
StepAdaptor(StepAdaptorConfig()), so remove the unused parameter or accept it as
an optional parameter and construct it inside the function to avoid
mutable-defaults; specifically, change the signature for
generate_responses_api_streaming to take step_adaptor: Optional[StepAdaptor] =
None (or remove it entirely if not used), and then inside the function set
step_adaptor = step_adaptor or StepAdaptor(StepAdaptorConfig()) (or drop all
references if removed), ensuring references to StepAdaptor and StepAdaptorConfig
are updated accordingly to eliminate the Ruff ARG001 and B008 warnings.
- Around line 393-396: The outer except block leaks raw error text to clients by
yielding _error_event(str(setup_error), "server_error"); instead, keep detailed
info only in logs and send a sanitized message or an error id to the client.
Replace the yield call in the except Exception as setup_error handler so it
yields a generic/sanitized message (e.g. "internal server error" or a generated
error_id) via _error_event while retaining _logger.exception("Error during
Responses API streaming setup") to record full details; ensure references to
_error_event, setup_error and _logger.exception are updated accordingly.
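
Taken together, the response_helpers.py items above amount to something like the following sketch. The _error_event stub, the SSE payload shapes, and the commented-out StepAdaptor default are assumptions; only the names come from the findings:

import logging
import secrets
from collections.abc import AsyncGenerator

_logger = logging.getLogger(__name__)


def _error_event(message: str, code: str) -> str:
    # Stand-in for the module's real SSE error helper.
    return f'event: error\ndata: {{"code": "{code}", "message": "{message}"}}\n\n'


async def generate_responses_api_streaming(
        payload: dict,
        step_adaptor=None,  # optional, no mutable default; e.g. StepAdaptor | None = None
) -> AsyncGenerator[str, None]:
    # Build the default inside the body: step_adaptor = step_adaptor or StepAdaptor(StepAdaptorConfig())
    try:
        yield "event: response.created\ndata: {}\n\n"  # placeholder for the real event stream
    except Exception:
        # Full details stay in the server log; the client only sees a generic message
        # plus a correlation id it can report back.
        error_id = f"err_{secrets.token_hex(8)}"
        _logger.exception("Error during Responses API streaming setup (error_id=%s)", error_id)
        yield _error_event(f"Internal error during Responses API workflow ({error_id})", "server_error")
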

---

Nitpick comments:
In `@src/nat/data_models/api_server.py`:
- Around line 949-984: The to_chat_request method currently drops non-dict items
in self.input and silently maps unknown role strings to
UserMessageContentRoleType.USER; update to emit warning/debug logs when these
fallbacks occur so callers can trace malformed inputs: inside to_chat_request
(function name) log a warning when an item in self.input is not a dict and is
skipped, and log a warning when constructing role =
UserMessageContentRoleType(role_str) falls back to USER due to ValueError
(include the offending role_str and the original item content); use the module
or class logger (or Python logging) consistently so Message creation and
downstream processing show context for these warnings.
- Around line 998-1001: The _convert_tools function currently imports logging
and calls _logging.getLogger(__name__) on every invocation; move the import and
logger creation to module scope by adding a single module-level "import logging"
(or reuse existing logging import) and define _logger =
logging.getLogger(__name__) once at top-level, then remove the inline "import
logging as _logging" and "_logger = _logging.getLogger(__name__)" from inside
_convert_tools and use the module-level _logger instead.
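
A self-contained sketch of the warnings asked for in the to_chat_request nitpick above; the enum here is a stand-in for UserMessageContentRoleType in api_server.py, and the helper name is hypothetical:

import enum
import logging
import typing

_logger = logging.getLogger(__name__)


class UserMessageContentRoleType(str, enum.Enum):
    # Stand-in with only the members the sketch needs.
    USER = "user"
    ASSISTANT = "assistant"


def _coerce_input_items(items: list[typing.Any]) -> list[tuple[UserMessageContentRoleType, typing.Any]]:
    coerced = []
    for item in items:
        if not isinstance(item, dict):
            _logger.warning("Skipping non-dict Responses API input item: %r", item)
            continue
        role_str = item.get("role", "user")
        try:
            role = UserMessageContentRoleType(role_str)
        except ValueError:
            _logger.warning("Unknown role %r in input item %r; falling back to USER", role_str, item)
            role = UserMessageContentRoleType.USER
        coerced.append((role, item.get("content")))
    return coerced
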

In `@src/nat/front_ends/fastapi/response_helpers.py`:
- Around line 226-233: Move the in-function imports and logger to module level:
remove the local imports of json, logging, secrets and the data model imports
ChatResponse and ChatResponseChunk from inside the function and add them to the
top of the module with the other imports; also create a module-level logger
named _logger = logging.getLogger(__name__) and replace the function's local
logger usage with this module-level _logger so the function uses the top-level
json, secrets, ChatResponse, ChatResponseChunk, and _logger symbols.
- Around line 300-328: In the non-streaming branch inside the async with
session.run(payload) block (where you await runner.result(to_type=ChatResponse)
and set accumulated_text), add a warning log when result.choices is empty or
result.choices[0].message is None so we don't silently emit an empty output;
specifically, detect the missing/None content after obtaining result (the same
place you currently set accumulated_text) and call the module’s logger (or
processLogger/response logger used elsewhere) to warn with context including
output_index and content_index and any available metadata from result, ensuring
you still emit an empty delta only if appropriate and not swallow the condition
silently.

Comment on lines +1009 to +1013
    converted = []
    for tool in self.tools:
        if not isinstance(tool, dict):
            converted.append(tool)
            continue

⚠️ Potential issue | 🟡 Minor

Non-dict tool appended without conversion violates return type.

Line 1012 appends tool as-is when not isinstance(tool, dict), but the method's return type is list[dict[str, typing.Any]] | None. If a non-dict tool somehow appears in the input, this silently produces a type-incorrect result. Either validate/skip with a warning (consistent with unsupported types) or convert to dict.

Suggested fix
             if not isinstance(tool, dict):
-                converted.append(tool)
+                _logger.warning("Unexpected non-dict tool entry encountered, skipping: %s", type(tool))
                 continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nat/data_models/api_server.py` around lines 1009 - 1013, The loop that
builds converted from self.tools currently appends non-dict tool objects as-is,
violating the method's return type list[dict[str, typing.Any]] | None; update
the loop that constructs converted so that when not isinstance(tool, dict) you
either (a) normalize the object to a dict (e.g., use tool.__dict__ or
dataclasses.asdict for dataclass instances, or call a to_dict()/as_dict() method
if present) before appending, or (b) skip the unsupported tool and emit a
warning/log entry; ensure the branch references the same symbols (self.tools,
converted, tool) and that the final returned list contains only dict[str,
typing.Any].
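
If option (a) is preferred over the skip-with-warning diff above, a normalization helper could look roughly like this (the to_dict/as_dict probing is an assumption about what tool objects might expose):

import dataclasses
import logging
import typing

_logger = logging.getLogger(__name__)


def _normalize_tool(tool: typing.Any) -> dict[str, typing.Any] | None:
    if isinstance(tool, dict):
        return tool
    if dataclasses.is_dataclass(tool) and not isinstance(tool, type):
        return dataclasses.asdict(tool)
    for attr in ("to_dict", "as_dict"):
        method = getattr(tool, attr, None)
        if callable(method):
            return method()
    _logger.warning("Dropping unsupported tool entry of type %s", type(tool))
    return None
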

Comment on lines +235 to +236
response_id = f"rsp_{secrets.token_hex(30)}"[:64]
message_id = f"msg_{secrets.token_hex(30)}"[:64]

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find and inspect the response_helpers.py file at lines 235-236
fd response_helpers.py

Repository: NVIDIA/NeMo-Agent-Toolkit

Length of output: 115


🏁 Script executed:

# Find and inspect the api_server.py file
fd api_server.py

Repository: NVIDIA/NeMo-Agent-Toolkit

Length of output: 146


🌐 Web query:

OpenAI Responses API response ID prefix format convention

💡 Result:

In the OpenAI Responses API, the top-level Response object’s id uses the prefix:

  • resp_... (e.g., resp_67ccd2bed1ec8190...) [1][2]

The part after resp_ is an opaque identifier (you shouldn’t assume any structure beyond it being a unique ID string). [1][2]


🏁 Script executed:

# Read response_helpers.py around lines 235-236
sed -n '230,240p' src/nat/front_ends/fastapi/response_helpers.py

Repository: NVIDIA/NeMo-Agent-Toolkit

Length of output: 432


🏁 Script executed:

# Read api_server.py around line 1083
sed -n '1078,1088p' src/nat/data_models/api_server.py

Repository: NVIDIA/NeMo-Agent-Toolkit

Length of output: 578


🏁 Script executed:

# Search for all occurrences of rsp_ and resp_ prefixes in the codebase
rg "rsp_|resp_" --type py -n

Repository: NVIDIA/NeMo-Agent-Toolkit

Length of output: 929


Change rsp_ prefix to resp_ to match OpenAI Responses API convention.

OpenAI's Responses API uses the resp_ prefix for response IDs (e.g., resp_67ccd2bed1ec8190...). Update both locations:

  • src/nat/front_ends/fastapi/response_helpers.py line 235: f"rsp_" → f"resp_"
  • src/nat/data_models/api_server.py line 1083: f"rsp_" → f"resp_" (also update the docstring, which correctly references the OpenAI Responses API)

This mismatch could cause compatibility issues with OpenAI clients or platforms expecting the standard prefix format.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nat/front_ends/fastapi/response_helpers.py` around lines 235 - 236, The
code generates response IDs with the nonstandard "rsp_" prefix; update the
string prefixes to "resp_" where response_id and any similar generator use
f"rsp_" so they match the OpenAI Responses API convention (e.g., change
response_id = f"rsp_{secrets.token_hex(30)}"[:64] to use "resp_"); also update
the matching occurrence and its docstring in the API server data model (the
docstring that references the Responses API and any variable named response_id
or resp-like generator) so the created IDs and documentation consistently use
"resp_".


Labels

DO NOT MERGE: PR should not be merged; see PR for details
feature request: New feature or request
non-breaking: Non-breaking change
Under Review: PR is under review and should not be marked stale
