Responses API fixes to comply with Microsoft Foundry Hosted Agents #1573
antonslutskyms wants to merge 13 commits into NVIDIA:develop from
Conversation
Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
Walkthrough

Adds OpenAI Responses API support to the FastAPI front end: new Responses API data models and converters; endpoint format detection/config (auto, chat_completions, responses); a /v1/responses route with SSE streaming; Responses↔Chat conversion (including tool-format translation); InputTextContent handling; example config/docs; and extensive tests.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client
    participant FastAPI as FastAPI Front-End
    participant Detector as Format Detector
    participant Converter as Responses→Chat Converter
    participant Workflow as NeMo Workflow
    participant Builder as Responses Builder

    Client->>FastAPI: POST /v1/responses (ResponsesRequest)
    FastAPI->>Detector: _determine_api_format(endpoint, path)
    Detector-->>FastAPI: "responses"
    FastAPI->>Converter: to_chat_request(ResponsesRequest)
    Converter-->>FastAPI: ChatRequest
    FastAPI->>Workflow: execute(ChatRequest)
    Workflow-->>FastAPI: ChatResponse / stream chunks
    FastAPI->>Builder: ResponsesAPIResponse.from_chat_response / stream events
    Builder-->>FastAPI: SSE events or JSON payload
    FastAPI-->>Client: SSE stream or JSON response
    rect rgba(100,150,200,0.5)
        Note over FastAPI,Builder: Streaming emits: response.created → output_item.added → content_part.added → output_text.delta → ... → response.done
    end
```
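For orientation, a minimal client sketch against the new route. The port matches the example README quoted later in this review, and the payload fields mirror the PR's tests, but the exact response JSON shape is an assumption:

```python
import requests

# Hypothetical smoke test for the new /v1/responses route; assumes the
# example server from this PR is running locally on port 8088.
payload = {
    "model": "gpt-4o-mini",           # optional as of this PR
    "input": "What is the weather?",  # string or list of message dicts
    "stream": False,
}
resp = requests.post("http://localhost:8088/v1/responses", json=payload, timeout=30)
resp.raise_for_status()
body = resp.json()
print(body["object"])                           # expected: "response"
print(body["output"][0]["content"][0]["text"])  # assistant text
```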
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 9
🤖 Fix all issues with AI agents
In `@examples/front_ends/responses_api_endpoint/README.md`:
- Around line 197-201: Update the README Limitations section to replace the
acronym "NAT" with the full name "NeMo Agent Toolkit" where it appears
(specifically the bullet "- **No Built-in Tools**: OpenAI built-in tools like
`code_interpreter` are not executed by NAT; use the `responses_api_agent`
workflow type for that functionality"), ensuring the sentence now reads that
those tools are not executed by the NeMo Agent Toolkit and preserving the rest
of the text and formatting.
- Around line 118-133: Update the fenced code block that shows the SSE events by
adding the language specifier "text" after the opening backticks (i.e., change
``` to ```text) so the block is explicitly marked as text; locate the SSE
example block containing lines like "event: response.created" and "data:
{\"type\": \"response.created\", ...}" and apply the change to the opening
fence.
- Line 22: Replace any occurrences of the lowercase/abbreviated name with the
correct official name: change "NeMo Agent toolkit" and any "NAT" usages in the
README to "NeMo Agent Toolkit" (also update the occurrence referenced at the
second mention currently on the file). Ensure capitalization exactly matches
"NeMo Agent Toolkit" everywhere in the document.
In `@src/nat/data_models/api_server.py`:
- Around line 1104-1109: The usage conversion has an operator-precedence bug:
guard accesses to chat_response.usage with the conditional by parenthesizing
each expression; change the ResponsesUsage construction so each field uses the
form (chat_response.usage.prompt_tokens or 0) if chat_response.usage else 0,
(chat_response.usage.completion_tokens or 0) if chat_response.usage else 0, and
(chat_response.usage.total_tokens or 0) if chat_response.usage else 0 to ensure
attribute access is only attempted when chat_response.usage is truthy.
In `@src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py`:
- Around line 950-957: The code in post_responses_api_compatible uses
getattr(payload, 'model', 'unknown-model') but payload.model exists and can be
None, so the default never applies; change how model_name is derived to ensure a
fallback string when payload.model is None (e.g., use payload.model or
'unknown-model' or getattr(payload, 'model') or 'unknown-model') so that the
value passed into generate_responses_api_streaming and
ResponsesAPIResponse.from_chat_response is never None.
In `@src/nat/front_ends/fastapi/response_helpers.py`:
- Around line 199-205: The function generate_responses_api_streaming currently
declares an unused parameter step_adaptor with a mutable default
StepAdaptor(StepAdaptorConfig()), which triggers ARG001 and B008; remove the
step_adaptor parameter from the signature and any related imports/usages, or if
you intend to keep it for future use change it to an optional parameter (e.g.,
step_adaptor: StepAdaptor | None = None) without constructing a default instance
in the signature and add a short comment documenting planned usage; update any
callers accordingly and remove StepAdaptorConfig() construction from the
function definition.
In `@tests/nat/front_ends/fastapi/test_openai_compatibility.py`:
- Around line 582-605: Add three tests to cover the PR changes: (1)
test_responses_request_optional_model — instantiate ResponsesRequest without a
model (ResponsesRequest(input="...")), assert request.model is None,
request.input preserved, and that request.to_chat_request() produces a
ChatRequest whose first message content equals the original input; (2)
test_responses_request_input_text_content — create a ResponsesRequest with input
as a message whose content is [{"type":"input_text","text":"hello"}], call
to_chat_request() and assert the resulting ChatRequest has one message with role
"user" and the text preserved/converted into the message content; (3)
test_response_id_length_compliance — create a ChatResponse via
ChatResponse.from_string(..., usage=Usage(...)) and convert to
ResponsesAPIResponse with ResponsesAPIResponse.from_chat_response(...,
model="test"), then assert len(chat_response.id) <= 50 and
len(responses_api_response.id) <= 50. Use the existing imports from
nat.data_models.api_server and reference ResponsesRequest.to_chat_request,
ResponsesAPIResponse.from_chat_response, ChatResponse.from_string, and Usage.
- Around line 855-857: Update the incorrect comment on the assertion in
test_openai_compatibility (the assertion that checks response.status_code ==
422) to reflect that only the 'input' field is missing now (since 'model' is
optional in this PR); locate the assertion using the symbol response.status_code
and replace the comment "# Validation error - missing 'input' and 'model'" with
a concise comment like "# Validation error - missing 'input'".
- Around line 669-679: The test test_responses_api_response_from_chat_response
constructs a Usage with only prompt_tokens and completion_tokens; update the
Usage initialization (used with ChatResponse.from_string) to include
total_tokens=15 for consistency with other tests (e.g., the Usage on line 156)
so the call becomes Usage(prompt_tokens=10, completion_tokens=5,
total_tokens=15) before creating ResponsesAPIResponse.from_chat_response; no
other changes needed to ChatResponse or ResponsesAPIResponse.
🧹 Nitpick comments (5)
src/nat/front_ends/fastapi/response_helpers.py (1)
226-233: Move standard-library and local imports to module level.

`json`, `logging`, and `uuid` are imported inside the function body. `ChatResponse` and `ChatResponseChunk` are already used elsewhere in this module (referenced in relevant snippets). Hoisting them to the top removes per-call import overhead and aligns with PEP 8 conventions.

Also, `_logger` on Line 233 shadows the pattern used elsewhere in the codebase (module-level `logger`). Consider reusing a module-level logger instead of creating one per call.

src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (1)
321-358: Move `import re` to module level.

`re` is imported inside `_determine_api_format` on each call. As a standard-library module with no side effects, it should be at the top of the file per PEP 8. Also, the regex pattern at Line 351 is recompiled every call — consider compiling it once at module level.

Suggested fix (module-level constant)

```python
# At module level:
import re

_RESPONSES_PATH_PATTERN = re.compile(r'^(/api)?(/v\d+)?/responses$')
```

```diff
-    import re
-    # ...
-    responses_pattern = re.compile(r'^(/api)?(/v\d+)?/responses$')
-
-    if responses_pattern.match(path_lower):
+    if _RESPONSES_PATH_PATTERN.match(path_lower):
```
998-1000: Move `logging` import to module level.

`logging` is imported inside `_convert_tools` and a new logger is created on each call. The module already uses `import logging`-style patterns elsewhere (the `logging` import isn't at the top of this file's visible portion, but `_logger` should be a module-level singleton).

tests/nat/front_ends/fastapi/test_openai_compatibility.py (2)
582-584: Consolidate inline imports to module level for consistency.

`ResponsesRequest`, `ResponsesAPIResponse`, `ResponsesOutputContent`, `ResponsesOutputItem`, and `FastApiFrontEndPluginWorker` are repeatedly imported inside individual test functions. Meanwhile, equivalent models (`ChatRequest`, `ChatResponse`, etc.) and `Config`/`GeneralConfig`/`FastApiFrontEndConfig` are already at module level. Moving these inline imports to the top would reduce duplication and align with PEP 8.

Additionally, `Config` and `GeneralConfig` are re-imported inline at lines 1049 and 1071 despite already being module-level imports.
682-693: Extract repeated config setup into pytest fixtures.

At least 7 async test functions repeat a near-identical pattern: create `FastApiFrontEndConfig`, set `openai_api_v1_path`, build `Config` with `EchoFunctionConfig`/`StreamingEchoFunctionConfig`, then `build_nat_client`. Extracting this into fixtures would reduce duplication significantly.

Example fixture

```python
@pytest.fixture(name="responses_api_echo_client")
async def fixture_responses_api_echo_client():
    front_end_config = FastApiFrontEndConfig()
    front_end_config.workflow.openai_api_v1_path = "/v1/responses"
    front_end_config.workflow.openai_api_path = "/chat"
    config = Config(
        general=GeneralConfig(front_end=front_end_config),
        workflow=EchoFunctionConfig(use_openai_api=True),
    )
    async with build_nat_client(config) as client:
        yield client
```

As per coding guidelines, any frequently repeated code should be extracted into pytest fixtures.
| **Complexity:** 🟢 Beginner |

This example demonstrates how to configure a NeMo Agent toolkit FastAPI frontend to accept requests in the [OpenAI Responses API format](https://platform.openai.com/docs/api-reference/responses).
Spell out "NeMo Agent Toolkit" with correct capitalization.

Per coding guidelines, documentation in Markdown files should not use NAT as an acronym and should always spell out "NeMo Agent Toolkit." Also, "toolkit" should be capitalized to "Toolkit" to match the official name. This also applies to Line 37.

Suggested fix

```diff
-This example demonstrates how to configure a NeMo Agent toolkit FastAPI frontend to accept requests in the [OpenAI Responses API format](https://platform.openai.com/docs/api-reference/responses).
+This example demonstrates how to configure a NeMo Agent Toolkit FastAPI frontend to accept requests in the [OpenAI Responses API format](https://platform.openai.com/docs/api-reference/responses).
```

As per coding guidelines: "Documentation in Markdown files should not use NAT as an acronym, always spell out NeMo Agent Toolkit."
```
event: response.created
data: {"type": "response.created", "response": {"id": "resp_...", "status": "in_progress"}}

event: response.output_item.added
data: {"type": "response.output_item.added", ...}

event: response.output_text.delta
data: {"type": "response.output_text.delta", "delta": "The current"}

event: response.output_text.delta
data: {"type": "response.output_text.delta", "delta": " time is..."}

event: response.done
data: {"type": "response.done", "response": {"status": "completed", ...}}
```
Add a language specifier to the fenced code block.

The SSE events example block is missing a language identifier. Since this is a text-based event stream format, use text as the language.

Suggested fix

````diff
-```
+```text
 event: response.created
````
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
[warning] 118-118: Fenced code blocks should have a language specified (MD040, fenced-code-language)
## Limitations

- **No Stateful Backend**: `previous_response_id` is accepted but ignored
- **No Built-in Tools**: OpenAI built-in tools like `code_interpreter` are not executed by NAT; use the `responses_api_agent` workflow type for that functionality
- **Tool Format Conversion**: Responses API tool definitions are converted to Chat Completions format internally
Replace "NAT" acronym with "NeMo Agent Toolkit" in documentation.
Lines 200 uses "NAT" as an acronym. The coding guidelines require spelling out "NeMo Agent Toolkit" in Markdown documentation.
Suggested fix
-- **No Built-in Tools**: OpenAI built-in tools like `code_interpreter` are not executed by NAT; use the `responses_api_agent` workflow type for that functionality
+- **No Built-in Tools**: OpenAI built-in tools like `code_interpreter` are not executed by NeMo Agent Toolkit; use the `responses_api_agent` workflow type for that functionalityAs per coding guidelines: "Documentation in Markdown files should not use NAT as an acronym, always spell out NeMo Agent Toolkit."
```python
# Convert usage if available
usage = ResponsesUsage(
    input_tokens=chat_response.usage.prompt_tokens or 0 if chat_response.usage else 0,
    output_tokens=chat_response.usage.completion_tokens or 0 if chat_response.usage else 0,
    total_tokens=chat_response.usage.total_tokens or 0 if chat_response.usage else 0,
)
```
Operator precedence bug in usage conversion — add parentheses.

Due to Python's precedence rules, `A or B if C else D` is parsed as `A or (B if C else D)`, not `(A or B) if C else D`. This means `chat_response.usage.prompt_tokens` is evaluated unconditionally — if `chat_response.usage` were ever None, this would raise AttributeError before the guard kicks in.

`ChatResponse.usage` is currently required (non-optional), so this won't crash today, but the intent is clearly to guard the attribute access with the conditional. Adding parentheses makes the logic correct and future-proof.
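A standalone check of the parse, independent of the PR's models:

```python
# Python parses "A or B if C else D" as "A or (B if C else D)",
# so A is evaluated unconditionally, even when C is falsy.
usage = None  # stands in for chat_response.usage

def tokens():
    # Stands in for chat_response.usage.prompt_tokens; raises if the
    # guard does not short-circuit first.
    raise AttributeError("usage is None")

# Unparenthesized form: tokens() runs before the guard is consulted.
try:
    value = tokens() or 0 if usage else 0
except AttributeError as exc:
    print(f"unguarded access raised: {exc}")

# Parenthesized form: the guard is checked first, so tokens() never runs.
value = (tokens() or 0) if usage else 0
print(value)  # 0
```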
Suggested fix

```diff
 usage = ResponsesUsage(
-    input_tokens=chat_response.usage.prompt_tokens or 0 if chat_response.usage else 0,
-    output_tokens=chat_response.usage.completion_tokens or 0 if chat_response.usage else 0,
-    total_tokens=chat_response.usage.total_tokens or 0 if chat_response.usage else 0,
+    input_tokens=(chat_response.usage.prompt_tokens or 0) if chat_response.usage else 0,
+    output_tokens=(chat_response.usage.completion_tokens or 0) if chat_response.usage else 0,
+    total_tokens=(chat_response.usage.total_tokens or 0) if chat_response.usage else 0,
 )
```
```python
from nat.data_models.api_server import ResponsesAPIResponse

async def post_responses_api_compatible(response: Response, request: Request, payload: request_type):
    stream_requested = getattr(payload, 'stream', False)

    # Convert Responses API request to internal ChatRequest format
    chat_request = payload.to_chat_request()
    model_name = getattr(payload, 'model', 'unknown-model')
```
Bug: `getattr(payload, 'model', 'unknown-model')` returns None when model is None.

Since `model` is defined on `ResponsesRequest` as `str | None = None`, the attribute always exists, so getattr's default (`'unknown-model'`) is never used. When the user omits `model`, getattr returns None, which then propagates as a None model name to `generate_responses_api_streaming` and `ResponsesAPIResponse.from_chat_response`.

Use `or` to provide the fallback:

Suggested fix

```diff
-    model_name = getattr(payload, 'model', 'unknown-model')
+    model_name = payload.model or "unknown-model"
```
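A self-contained illustration; the class here is a stand-in for `ResponsesRequest`, not the real model:

```python
class PayloadSketch:
    """Stand-in for ResponsesRequest: the attribute exists but is None."""

    def __init__(self, model=None):
        self.model = model

payload = PayloadSketch()

# getattr's default only applies when the attribute is missing entirely;
# here it exists (as None), so 'unknown-model' is never returned.
print(getattr(payload, "model", "unknown-model"))  # None

# "or" treats None as falsy and falls through to the fallback.
print(payload.model or "unknown-model")            # unknown-model
```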
```python
async def generate_responses_api_streaming(
    payload: typing.Any,
    *,
    session: Session,
    model: str,
    step_adaptor: StepAdaptor = StepAdaptor(StepAdaptorConfig()),
) -> AsyncGenerator[str, None]:
```
🛠️ Refactor suggestion | 🟠 Major
Remove unused `step_adaptor` parameter; avoid mutable defaults in signature.

Ruff correctly flags that `step_adaptor` is never used in the function body (ARG001). Additionally, constructing `StepAdaptor(StepAdaptorConfig())` in the default argument is flagged by B008 — mutable objects in defaults are evaluated once at definition time and shared across calls.

Remove the parameter entirely since it's unused. If it's intended for future use, accept None and document the plan.
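Why call-expression defaults are a hazard, as a self-contained sketch (the class is a placeholder for `StepAdaptor`):

```python
class AdaptorSketch:
    """Placeholder for StepAdaptor; carries per-instance state."""

    def __init__(self):
        self.seen = []

def handle(event, adaptor=AdaptorSketch()):
    # The default AdaptorSketch() is constructed once, when handle() is
    # defined, and then shared by every call that omits "adaptor".
    adaptor.seen.append(event)
    return len(adaptor.seen)

print(handle("a"))  # 1
print(handle("b"))  # 2 -- state leaked across calls via the shared default

# The usual remedy: default to None and construct inside the body.
def handle_safe(event, adaptor=None):
    adaptor = adaptor if adaptor is not None else AdaptorSketch()
    adaptor.seen.append(event)
    return len(adaptor.seen)

print(handle_safe("a"))  # 1
print(handle_safe("b"))  # 1 -- fresh instance each call
```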
Suggested fix

```diff
 async def generate_responses_api_streaming(
     payload: typing.Any,
     *,
     session: Session,
     model: str,
-    step_adaptor: StepAdaptor = StepAdaptor(StepAdaptorConfig()),
 ) -> AsyncGenerator[str, None]:
```

🧰 Tools
🪛 Ruff (0.14.14)

[warning] 204-204: Unused function argument: step_adaptor (ARG001)

[warning] 204-204: Do not perform function call StepAdaptor in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable (B008)

[warning] 204-204: Do not perform function call StepAdaptorConfig in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable (B008)
```python
def test_responses_request_model():
    """Test that ResponsesRequest model correctly parses Responses API format."""
    from nat.data_models.api_server import ResponsesRequest

    # Test with simple string input
    request = ResponsesRequest(model="gpt-4o-mini", input="What is the weather?")
    assert request.model == "gpt-4o-mini"
    assert request.input == "What is the weather?"
    assert request.stream is False

    # Test with message-style input
    request = ResponsesRequest(
        model="gpt-4o-mini",
        input=[{
            "role": "user", "content": "Hello"
        }],
        instructions="You are a helpful assistant.",
        tools=[{
            "type": "function", "name": "get_weather", "description": "Get weather"
        }],
    )
    assert isinstance(request.input, list)
    assert request.instructions == "You are a helpful assistant."
    assert request.tools is not None
```
Missing test coverage for key PR objectives: optional model and input_text content type.
The PR explicitly introduces: (1) making model optional, (2) supporting {"type": "input_text", "text": "..."} content items, and (3) 50-char ID length compliance. None of these are tested here or elsewhere in the file.
Suggested additional test cases

```python
def test_responses_request_optional_model():
    """Test that ResponsesRequest accepts requests without a model field."""
    from nat.data_models.api_server import ResponsesRequest

    # model should be optional
    request = ResponsesRequest(input="Hello without model")
    assert request.model is None
    assert request.input == "Hello without model"

    # Conversion to ChatRequest should still work
    chat_request = request.to_chat_request()
    assert chat_request.messages[0].content == "Hello without model"


def test_responses_request_input_text_content():
    """Test that ResponsesRequest handles input_text content items (Azure Foundry compatibility)."""
    from nat.data_models.api_server import ResponsesRequest

    request = ResponsesRequest(
        model="gpt-4",
        input=[{
            "role": "user",
            "content": [{"type": "input_text", "text": "hello"}],
        }],
    )
    chat_request = request.to_chat_request()

    # Verify input_text content is correctly converted
    assert len(chat_request.messages) == 1
    assert chat_request.messages[0].role.value == "user"


def test_response_id_length_compliance():
    """Test that response and message IDs comply with 50-character length requirement."""
    from nat.data_models.api_server import ResponsesAPIResponse, ChatResponse, Usage

    chat_response = ChatResponse.from_string("Test", usage=Usage(prompt_tokens=1, completion_tokens=1))
    responses_api_response = ResponsesAPIResponse.from_chat_response(chat_response, model="test")
    assert len(responses_api_response.id) <= 50
    assert len(chat_response.id) <= 50
```

As per coding guidelines, ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code.
```python
def test_responses_api_response_from_chat_response():
    """Test conversion from ChatResponse to ResponsesAPIResponse."""
    from nat.data_models.api_server import ResponsesAPIResponse

    chat_response = ChatResponse.from_string("Test response", usage=Usage(prompt_tokens=10, completion_tokens=5))
    responses_api_response = ResponsesAPIResponse.from_chat_response(chat_response, model="test-model")

    assert responses_api_response.object == "response"
    assert responses_api_response.model == "test-model"
    assert len(responses_api_response.output) == 1
    assert responses_api_response.output[0].content[0].text == "Test response"
```
🧩 Analysis chain

🏁 Script executed:

```bash
# Find Usage model definition
rg -n "class Usage" --type=py -A 20 -g '!**/test/**'
```

Repository: NVIDIA/NeMo-Agent-Toolkit. Length of output: 4831

🏁 Script executed:

```bash
# Check for validators or computed fields related to total_tokens
rg -n "total_tokens" --type=py -g '!**/test/**' -B 2 -A 2
```

Repository: NVIDIA/NeMo-Agent-Toolkit. Length of output: 50381
Add total_tokens to Usage initialization for consistency with line 156.
Line 673 creates `Usage(prompt_tokens=10, completion_tokens=5)` without `total_tokens`, while line 156 in the same file provides it: `Usage(prompt_tokens=1, completion_tokens=1, total_tokens=2)`. Although `total_tokens` is optional (defaults to None), the established pattern in this test file and throughout the codebase is to always provide all three values explicitly. Provide `total_tokens=15` to match line 156's approach and maintain consistency.
```python
    },
)
assert response.status_code == 422  # Validation error - missing 'input' and 'model'
```
Incorrect comment: model is now optional per this PR.

The comment states `# Validation error - missing 'input' and 'model'`, but this PR makes `model` optional. The 422 here is due to the missing `input` field only.

Suggested fix

```diff
-    assert response.status_code == 422  # Validation error - missing 'input' and 'model'
+    assert response.status_code == 422  # Validation error - missing 'input'
```
+ assert response.status_code == 422 # Validation error - missing 'input'🤖 Prompt for AI Agents
In `@tests/nat/front_ends/fastapi/test_openai_compatibility.py` around lines 855 -
857, Update the incorrect comment on the assertion in test_openai_compatibility
(the assertion that checks response.status_code == 422) to reflect that only the
'input' field is missing now (since 'model' is optional in this PR); locate the
assertion using the symbol response.status_code and replace the comment "#
Validation error - missing 'input' and 'model'" with a concise comment like "#
Validation error - missing 'input'".
This PR does not yet address the issue of stateful agents. This API proxy would lead to some "false advertising". We do not plan to merge this until the above has been addressed.
Force-pushed from 310824e to 24a566b
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/nat/front_ends/fastapi/response_helpers.py`:
- Around line 393-396: The except block currently yields the raw exception
string to the client (yield _error_event(str(setup_error), "server_error")),
potentially leaking internal details; instead log the full exception with
_logger.exception("...") and send a sanitized message to the client (e.g., yield
_error_event("Internal server error", "server_error") or a mapped safe message),
ensuring you do not expose setup_error contents in the SSE payload while
preserving detailed logs locally; update the except block around _error_event
and keep _logger.exception as-is.
- Around line 235-236: response_id and message_id are being truncated to 55/56
chars and thus exceed the 50-char requirement; change them to use a single
uuid_module.uuid4().hex each and slice so that the total length including the
prefix stays <=50 (e.g. keep "resp_" prefix and take the first 45 hex chars for
response_id, keep "msg_" prefix and take the first 46 hex chars for message_id),
updating the assignments that set response_id and message_id (which currently
call uuid_module.uuid4().hex twice).
🧹 Nitpick comments (3)
src/nat/front_ends/fastapi/response_helpers.py (3)
226-233: Move imports and logger to module level.

`json`, `logging`, `uuid`, and the `nat.data_models.api_server` imports are standard/stable and should live at the top of the file. Recreating the logger on every call is a minor inefficiency (`logging.getLogger` does a dict lookup each time).

Proposed fix

Add to the top-level imports:

```python
import json
import logging
import uuid as uuid_module

from nat.data_models.api_server import ChatResponse
from nat.data_models.api_server import ChatResponseChunk
```

Then at module level:

```python
_logger = logging.getLogger(__name__)
```

And remove lines 226-233 from the function body.
329-336: `error_occurred` flag is redundant — the `except` block already returns.

Line 333 does `return`, so execution never reaches line 336. The `error_occurred` variable and the `if not error_occurred` guard can be removed; the completion events will only execute if no exception was raised.

Proposed simplification

```diff
 except Exception as workflow_error:
-    error_occurred = True
     _logger.exception("Error during Responses API streaming workflow execution")
     yield _error_event(str(workflow_error), "workflow_error")
     return  # Stop streaming after error

-# Only emit completion events if no error occurred
-if not error_occurred:
-    # Event: response.output_text.done
-    yield _sse_event("response.output_text.done", {
+# Event: response.output_text.done
+yield _sse_event("response.output_text.done", {
```

(and un-indent the rest of the completion events block)
263-391: Significant duplication in SSE event payloads — consider a small builder/helper.

The response object structure (`id`, `object`, `status`, `model`, `output`) is repeated across `response.created`, `response.failed`, and `response.done` events. Extracting a helper like `_build_response_envelope(status, output=None)` would reduce duplication and make it easier to keep the schema consistent.
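One possible shape for the suggested helper; the field set is inferred from the envelope described above, and `_sse_event_sketch` is a stand-in for the module's real SSE helper:

```python
import json

def _build_response_envelope(response_id: str, model: str, status: str,
                             output: list | None = None) -> dict:
    """Hypothetical helper: one place to define the response shape shared by
    response.created / response.failed / response.done events."""
    return {
        "id": response_id,
        "object": "response",
        "status": status,
        "model": model,
        "output": output or [],
    }

def _sse_event_sketch(event: str, data: dict) -> str:
    # Typical SSE framing: event line, data line, blank line.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

print(_sse_event_sketch(
    "response.created",
    {"type": "response.created",
     "response": _build_response_envelope("resp_abc", "test-model", "in_progress")},
))
```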
| response_id = f"resp_{uuid_module.uuid4().hex}{uuid_module.uuid4().hex}"[:55] | ||
| message_id = f"msg_{uuid_module.uuid4().hex}{uuid_module.uuid4().hex}"[:56] |
ID lengths exceed the stated 50-character requirement.

The PR objective states IDs must comply with a 50-character length limit, but `response_id` is truncated to 55 characters and `message_id` to 56 characters. This will violate the length constraint the PR is explicitly trying to satisfy.

Proposed fix

```diff
-    response_id = f"resp_{uuid_module.uuid4().hex}{uuid_module.uuid4().hex}"[:55]
-    message_id = f"msg_{uuid_module.uuid4().hex}{uuid_module.uuid4().hex}"[:56]
+    response_id = f"resp_{uuid_module.uuid4().hex}"[:50]
+    message_id = f"msg_{uuid_module.uuid4().hex}"[:50]
```

A single `uuid4().hex` already provides 32 hex characters, which is more than enough entropy. With a 5-char prefix (`resp_`) truncated to 50 you get 45 random hex chars; with `msg_` you get 46.
```python
except Exception as setup_error:
    # Catch any errors during initial event emission (before workflow starts)
    _logger.exception("Error during Responses API streaming setup")
    yield _error_event(str(setup_error), "server_error")
```
Outer except may yield into an already-consumed generator.

If the initial SSE events (lines 265-298) raise, this handler catches and yields an error event. However, note that `str(setup_error)` could leak internal details to the client. Consider sanitizing the error message before sending it in the SSE payload.
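A runnable sketch of the sanitized pattern; `_error_event` here is a stand-in with an assumed signature, not the PR's actual helper:

```python
import json
import logging

_logger = logging.getLogger(__name__)

def _error_event(message: str, code: str) -> str:
    # Stand-in for the PR's SSE error helper.
    return f"event: error\ndata: {json.dumps({'message': message, 'code': code})}\n\n"

def stream_with_sanitized_errors(events):
    """Yield SSE frames; on failure, log the details but emit a generic error."""
    try:
        yield from events()
    except Exception:
        # Full traceback stays in the server logs only.
        _logger.exception("Error during Responses API streaming setup")
        # Clients get a fixed, non-sensitive message instead of str(exc).
        yield _error_event("Internal server error", "server_error")

def boom():
    raise RuntimeError("secret connection string in traceback")
    yield  # makes this a generator function

for frame in stream_with_sanitized_errors(boom):
    print(frame)
```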
Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
Force-pushed from 24a566b to d2e6f58
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@examples/front_ends/responses_api_endpoint/README.md`:
- Around line 59-66: Update the endpoint table entry for the `/generate` row to
replace the acronym "NAT default" with the full phrase "NeMo Agent Toolkit
default" (or "NeMo Agent Toolkit") so the Format column spells out NeMo Agent
Toolkit instead of using NAT; locate the `/generate` table row in README.md and
edit its Format cell accordingly.
In `@src/nat/data_models/api_server.py`:
- Around line 895-901: Delete the unused ResponsesInputItem Pydantic model
(class ResponsesInputItem) since ResponsesRequest.input already uses the needed
type; remove the class definition and any now-unused imports (e.g., BaseModel,
Field, UserMessageContentRoleType if only referenced by that class) and run
tests/type-checks to ensure no remaining references to ResponsesInputItem exist.
🧹 Nitpick comments (2)
src/nat/front_ends/fastapi/response_helpers.py (1)
226-233: Move imports and logger to module level.

`json`, `logging`, `uuid`, `ChatResponse`, and `ChatResponseChunk` are imported inside the function body and the logger is created on every call. These should be at module level for clarity and to avoid repeated work.

Suggested fix

Move to the top of the file with the other imports:

```python
import json
import uuid as uuid_module

from nat.data_models.api_server import ChatResponse
from nat.data_models.api_server import ChatResponseChunk
```

And add a module-level logger:

```python
_logger = logging.getLogger(__name__)
```

src/nat/data_models/api_server.py (1)
985-1047: Tool conversion logic handles the key formats well.

The flat-to-nested conversion for Responses API → Chat Completions tools is correct. Logging warnings for unsupported tool types (`code_interpreter`, `file_search`, etc.) and excluding them is appropriate.

One minor note: `_convert_tools` instantiates a logger on every call (Lines 998-1000). Consider using a module-level logger.
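For reference, the flat-to-nested shape in question, as a standalone sketch; the field handling is inferred from the tests and review notes, not copied from `_convert_tools`:

```python
import logging

logger = logging.getLogger(__name__)

_UNSUPPORTED = {"code_interpreter", "file_search", "web_search"}

def convert_tools_sketch(tools: list[dict]) -> list[dict]:
    """Responses API declares function tools flat; Chat Completions nests
    the same fields under a "function" key."""
    converted = []
    for tool in tools:
        tool_type = tool.get("type")
        if tool_type == "function":
            converted.append({
                "type": "function",
                "function": {
                    "name": tool.get("name"),
                    "description": tool.get("description"),
                    "parameters": tool.get("parameters", {}),
                },
            })
        elif tool_type in _UNSUPPORTED:
            logger.warning("Unsupported built-in tool type excluded: %s", tool_type)
    return converted

print(convert_tools_sketch([
    {"type": "function", "name": "get_weather", "description": "Get weather"},
    {"type": "code_interpreter"},
]))
```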
The server will start on port 8088 with the following endpoints:

| Endpoint | Format | Description |
|----------|--------|-------------|
| `/generate` | NAT default | Standard workflow endpoint |
| `/chat` | Chat Completions | OpenAI Chat Completions format |
| `/chat/stream` | Chat Completions | Streaming Chat Completions |
| `/v1/responses` | Responses API | OpenAI Responses API format |
Replace "NAT" acronym with "NeMo Agent Toolkit" in the endpoint table.
Line 63 uses "NAT default" as the format description. Per coding guidelines, spell out "NeMo Agent Toolkit" in Markdown documentation.
Suggested fix
-| `/generate` | NAT default | Standard workflow endpoint |
+| `/generate` | NeMo Agent Toolkit default | Standard workflow endpoint |As per coding guidelines: "Documentation in Markdown files should not use NAT as an acronym, always spell out NeMo Agent Toolkit."
```python
class ResponsesInputItem(BaseModel):
    """Input item for Responses API (message format)."""

    type: typing.Literal["message"] = "message"
    role: UserMessageContentRoleType = UserMessageContentRoleType.USER
    content: str | list[dict[str, typing.Any]] = Field(description="Content of the message")
```
🧩 Analysis chain

🏁 Script executed:

```bash
rg -n "ResponsesInputItem" --type=py
```

Repository: NVIDIA/NeMo-Agent-Toolkit. Length of output: 143

🏁 Script executed:

```bash
rg -n "ResponsesRequest" --type=py -A 10 | head -50
```

Repository: NVIDIA/NeMo-Agent-Toolkit. Length of output: 5273

🏁 Script executed:

```bash
sed -n '903,950p' src/nat/data_models/api_server.py
```

Repository: NVIDIA/NeMo-Agent-Toolkit. Length of output: 2868
Remove unused ResponsesInputItem model.
`ResponsesInputItem` is defined but not referenced anywhere in the codebase. The `ResponsesRequest.input` field uses `str | list[dict[str, typing.Any]]` instead, making this model unnecessary (YAGNI).
Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
Actionable comments posted: 2
🧹 Nitpick comments (4)
src/nat/data_models/api_server.py (2)
949-984: `to_chat_request` silently discards non-dict items and unknown roles.

When `self.input` is a list, non-dict items are silently skipped (the `if isinstance(item, dict)` guard on line 963), and invalid role strings fall back to `USER` without logging. For a compatibility layer, this is defensively reasonable, but a debug/warning log on these fallback paths would aid troubleshooting integration issues.
998-1001: Avoid re-creating logger on every call to `_convert_tools`.

`_logging.getLogger(__name__)` is called inside the method body each time. Move the import and logger to module level — the module already imports `logging`-adjacent utilities and the logger would be reusable across methods.

src/nat/front_ends/fastapi/response_helpers.py (2)
226-233: Move imports and logger to module level.

`json`, `logging`, `secrets`, `ChatResponse`, and `ChatResponseChunk` are unconditionally used whenever this function runs. Placing them inside the function body adds per-call overhead and deviates from the module-level import convention used elsewhere in this file (e.g., lines 16–27).

Suggested fix

Move these to the top of the file alongside existing imports:

```diff
 import asyncio
+import json
+import logging
+import secrets
 import typing
 from collections.abc import AsyncGenerator

+from nat.data_models.api_server import ChatResponse
+from nat.data_models.api_server import ChatResponseChunk
 from nat.data_models.api_server import ResponseIntermediateStep
```

And replace the in-function logger with a module-level one:

```diff
+_logger = logging.getLogger(__name__)
```
300-328: Non-streaming path silently swallows responses with empty/None `content`.

When the workflow is non-streaming (lines 317–328), if `result.choices` is empty or `result.choices[0].message` is None, no delta event is emitted, and `accumulated_text` stays `""`. The completion events will then report an empty output with no indication that something may have gone wrong. Consider logging a warning when the non-streaming result yields no content, so silent failures don't go unnoticed.
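A sketch of the suggested guard; the result/choices/message shapes are assumptions modeled on `ChatResponse`:

```python
import logging

logger = logging.getLogger(__name__)

def extract_text_or_warn(result, output_index: int = 0, content_index: int = 0) -> str:
    """Pull message text from a ChatResponse-like result, warning instead of
    silently returning "" when the workflow produced no content."""
    choices = getattr(result, "choices", None) or []
    message = choices[0].message if choices else None
    content = getattr(message, "content", None) if message else None
    if not content:
        logger.warning(
            "Non-streaming workflow result had no content "
            "(output_index=%s, content_index=%s)", output_index, content_index)
        return ""
    return content

# Minimal stand-ins for the real response models:
class _Msg:
    content = None

class _Choice:
    message = _Msg()

class _Result:
    choices = [_Choice()]

print(repr(extract_text_or_warn(_Result())))  # '' (plus a warning in the logs)
```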
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/nat/data_models/api_server.py`:
- Around line 1009-1013: The loop that builds converted from self.tools
currently appends non-dict tool objects as-is, violating the method's return
type list[dict[str, typing.Any]] | None; update the loop that constructs
converted so that when not isinstance(tool, dict) you either (a) normalize the
object to a dict (e.g., use tool.__dict__ or dataclasses.asdict for dataclass
instances, or call a to_dict()/as_dict() method if present) before appending, or
(b) skip the unsupported tool and emit a warning/log entry; ensure the branch
references the same symbols (self.tools, converted, tool) and that the final
returned list contains only dict[str, typing.Any].
In `@src/nat/front_ends/fastapi/response_helpers.py`:
- Around line 235-236: The code generates response IDs with the nonstandard
"rsp_" prefix; update the string prefixes to "resp_" where response_id and any
similar generator use f"rsp_" so they match the OpenAI Responses API convention
(e.g., change response_id = f"rsp_{secrets.token_hex(30)}"[:64] to use "resp_");
also update the matching occurrence and its docstring in the API server data
model (the docstring that references the Responses API and any variable named
response_id or resp-like generator) so the created IDs and documentation
consistently use "resp_".
---
Duplicate comments:
In `@src/nat/data_models/api_server.py`:
- Around line 1105-1110: The expressions that build the ResponsesUsage object
suffer from operator precedence: e.g.
input_tokens=chat_response.usage.prompt_tokens or 0 if chat_response.usage else
0 is parsed incorrectly; update the construction in the usage assignment so each
conditional is parenthesized, e.g. set
input_tokens=(chat_response.usage.prompt_tokens or 0) if chat_response.usage
else 0 (and similarly for output_tokens and total_tokens) to ensure the "or 0"
applies to the token value before the if/else; touches the usage variable
assignment that constructs ResponsesUsage from chat_response.usage.
- Around line 896-901: ResponsesInputItem is defined but never used; update the
Responses API types to either remove the unused model or wire it into the
request type—specifically replace the loose type on ResponsesRequest.input with
ResponsesInputItem (or make ResponsesRequest.input: str |
list[ResponsesInputItem] if strings are still allowed) so the new BaseModel is
referenced, and remove ResponsesInputItem if you choose not to use it; adjust
any imports/usages accordingly (look for ResponsesInputItem and
ResponsesRequest.input in this module).
In `@src/nat/front_ends/fastapi/response_helpers.py`:
- Around line 329-333: The code is leaking internal exception details to clients
by yielding str(workflow_error) from the except block; instead, change the
yielded client-facing error to a generic, non-sensitive message (e.g., "Internal
error during Responses API workflow") while keeping the full exception in server
logs via _logger.exception; update the yield call to pass the generic message to
_error_event and optionally include a safe error_id or correlation id, and leave
workflow_error only for logging rather than sending to clients.
- Around line 199-205: The generate_responses_api_streaming function currently
declares an unused parameter step_adaptor with a mutable default
StepAdaptor(StepAdaptorConfig()), so remove the unused parameter or accept it as
an optional parameter and construct it inside the function to avoid
mutable-defaults; specifically, change the signature for
generate_responses_api_streaming to take step_adaptor: Optional[StepAdaptor] =
None (or remove it entirely if not used), and then inside the function set
step_adaptor = step_adaptor or StepAdaptor(StepAdaptorConfig()) (or drop all
references if removed), ensuring references to StepAdaptor and StepAdaptorConfig
are updated accordingly to eliminate the Ruff ARG001 and B008 warnings.
- Around line 393-396: The outer except block leaks raw error text to clients by
yielding _error_event(str(setup_error), "server_error"); instead, keep detailed
info only in logs and send a sanitized message or an error id to the client.
Replace the yield call in the except Exception as setup_error handler so it
yields a generic/sanitized message (e.g. "internal server error" or a generated
error_id) via _error_event while retaining _logger.exception("Error during
Responses API streaming setup") to record full details; ensure references to
_error_event, setup_error and _logger.exception are updated accordingly.
---
Nitpick comments:
In `@src/nat/data_models/api_server.py`:
- Around line 949-984: The to_chat_request method currently drops non-dict items
in self.input and silently maps unknown role strings to
UserMessageContentRoleType.USER; update to emit warning/debug logs when these
fallbacks occur so callers can trace malformed inputs: inside to_chat_request
(function name) log a warning when an item in self.input is not a dict and is
skipped, and log a warning when constructing role =
UserMessageContentRoleType(role_str) falls back to USER due to ValueError
(include the offending role_str and the original item content); use the module
or class logger (or Python logging) consistently so Message creation and
downstream processing show context for these warnings.
- Around line 998-1001: The _convert_tools function currently imports logging
and calls _logging.getLogger(__name__) on every invocation; move the import and
logger creation to module scope by adding a single module-level "import logging"
(or reuse existing logging import) and define _logger =
logging.getLogger(__name__) once at top-level, then remove the inline "import
logging as _logging" and "_logger = _logging.getLogger(__name__)" from inside
_convert_tools and use the module-level _logger instead.
In `@src/nat/front_ends/fastapi/response_helpers.py`:
- Around line 226-233: Move the in-function imports and logger to module level:
remove the local imports of json, logging, secrets and the data model imports
ChatResponse and ChatResponseChunk from inside the function and add them to the
top of the module with the other imports; also create a module-level logger
named _logger = logging.getLogger(__name__) and replace the function's local
logger usage with this module-level _logger so the function uses the top-level
json, secrets, ChatResponse, ChatResponseChunk, and _logger symbols.
- Around line 300-328: In the non-streaming branch inside the async with
session.run(payload) block (where you await runner.result(to_type=ChatResponse)
and set accumulated_text), add a warning log when result.choices is empty or
result.choices[0].message is None so we don't silently emit an empty output;
specifically, detect the missing/None content after obtaining result (the same
place you currently set accumulated_text) and call the module’s logger (or
processLogger/response logger used elsewhere) to warn with context including
output_index and content_index and any available metadata from result, ensuring
you still emit an empty delta only if appropriate and not swallow the condition
silently.
```python
converted = []
for tool in self.tools:
    if not isinstance(tool, dict):
        converted.append(tool)
        continue
```
Non-dict tool appended without conversion violates return type.

Line 1012 appends `tool` as-is when `not isinstance(tool, dict)`, but the method's return type is `list[dict[str, typing.Any]] | None`. If a non-dict tool somehow appears in the input, this silently produces a type-incorrect result. Either validate/skip with a warning (consistent with unsupported types) or convert to dict.

Suggested fix

```diff
 if not isinstance(tool, dict):
-    converted.append(tool)
+    _logger.warning("Unexpected non-dict tool entry encountered, skipping: %s", type(tool))
     continue
```
continue🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/nat/data_models/api_server.py` around lines 1009 - 1013, The loop that
builds converted from self.tools currently appends non-dict tool objects as-is,
violating the method's return type list[dict[str, typing.Any]] | None; update
the loop that constructs converted so that when not isinstance(tool, dict) you
either (a) normalize the object to a dict (e.g., use tool.__dict__ or
dataclasses.asdict for dataclass instances, or call a to_dict()/as_dict() method
if present) before appending, or (b) skip the unsupported tool and emit a
warning/log entry; ensure the branch references the same symbols (self.tools,
converted, tool) and that the final returned list contains only dict[str,
typing.Any].
| response_id = f"rsp_{secrets.token_hex(30)}"[:64] | ||
| message_id = f"msg_{secrets.token_hex(30)}"[:64] |
🧩 Analysis chain

🏁 Script executed:

```bash
# Find and inspect the response_helpers.py file at lines 235-236
fd response_helpers.py
```

Repository: NVIDIA/NeMo-Agent-Toolkit. Length of output: 115

🏁 Script executed:

```bash
# Find and inspect the api_server.py file
fd api_server.py
```

Repository: NVIDIA/NeMo-Agent-Toolkit. Length of output: 146

🌐 Web query: OpenAI Responses API response ID prefix format convention

💡 Result:

In the OpenAI Responses API, the top-level Response object's id uses the prefix `resp_...` (e.g., `resp_67ccd2bed1ec8190...`) [1][2]. The part after `resp_` is an opaque identifier (you shouldn't assume any structure beyond it being a unique ID string). [1][2]

🏁 Script executed:

```bash
# Read response_helpers.py around lines 235-236
sed -n '230,240p' src/nat/front_ends/fastapi/response_helpers.py
```

Repository: NVIDIA/NeMo-Agent-Toolkit. Length of output: 432

🏁 Script executed:

```bash
# Read api_server.py around line 1083
sed -n '1078,1088p' src/nat/data_models/api_server.py
```

Repository: NVIDIA/NeMo-Agent-Toolkit. Length of output: 578

🏁 Script executed:

```bash
# Search for all occurrences of rsp_ and resp_ prefixes in the codebase
rg "rsp_|resp_" --type py -n
```

Repository: NVIDIA/NeMo-Agent-Toolkit. Length of output: 929
Change `rsp_` prefix to `resp_` to match OpenAI Responses API convention.

OpenAI's Responses API uses the `resp_` prefix for response IDs (e.g., `resp_67ccd2bed1ec8190...`). Update both locations:

- `src/nat/front_ends/fastapi/response_helpers.py` line 235: `f"rsp_"` → `f"resp_"`
- `src/nat/data_models/api_server.py` line 1083: `f"rsp_"` → `f"resp_"` (also update the docstring, which correctly references the OpenAI Responses API)

This mismatch could cause compatibility issues with OpenAI clients or platforms expecting the standard prefix format.
Description
Closes
By submitting this PR I confirm:
The following items are addressed in this PR:
Summary by CodeRabbit
New Features
Documentation
Tests