Python: Enhance Azure AI Search Citations with Document URLs in Foundry V2 #4028

Open
giles17 wants to merge 4 commits into microsoft:main from giles17:ai_search_v2_citations

giles17 commented Feb 18, 2026

Summary

Enriches Azure AI Search url_citation annotations with per-document REST API URLs (get_url) for the Foundry V2 (Responses API) path. Previously, url_citation annotations only contained the search service base URL, making it hard for users to identify which specific document was referenced.

This is the V2 counterpart to PR #2066, which solved the same problem for Foundry V1 (Assistants API).

Resolves #2496

Problem

When using Azure AI Search as a tool with the Responses API, the url_citation annotation in the assistant's response only contains the search service base URL (e.g., https://search.example.com/), not the document-specific URL. The actual per-document URLs exist in the azure_ai_search_call_output response items under output.get_urls[].
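
To make the mismatch concrete, here is a minimal sketch of the two pieces of data involved, using plain dictionaries. The item layout, index name, document URLs, and the `collect_get_urls` helper are illustrative assumptions, not the actual Responses API payload or the code in this PR.

```python
from typing import Any

# Illustrative data only: the field layout and URLs are assumptions.
output_items: list[dict[str, Any]] = [
    {
        "type": "azure_ai_search_call_output",
        "output": {
            "get_urls": [
                "https://search.example.com/indexes/docs-index/docs/1",
                "https://search.example.com/indexes/docs-index/docs/2",
            ]
        },
    },
    {
        "type": "message",
        "annotations": [
            # The citation itself only carries the service base URL.
            {"type": "url_citation", "url": "https://search.example.com/"}
        ],
    },
]


def collect_get_urls(items: list[dict[str, Any]]) -> list[str]:
    """Gather per-document URLs from every Azure AI Search output item."""
    urls: list[str] = []
    for item in items:
        if item.get("type") != "azure_ai_search_call_output":
            continue
        output = item.get("output") or {}
        urls.extend(output.get("get_urls") or [])
    return urls


print(collect_get_urls(output_items))
```

The citation and the document URLs live in different output items, which is why a post-processing pass is needed to bring them together.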

Solution

Override _inner_get_response in RawAzureAIClient to post-process both streaming and non-streaming responses:

Non-streaming: Wraps the awaitable to extract get_urls from raw_representation.output after the base class parses the response, then enriches citation annotations.
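
The "wrap the awaitable" idea can be sketched roughly as follows. All names here (`with_post_processing`, `fake_base_get_response`, `mark_enriched`) are hypothetical stand-ins, and the response is a plain dictionary rather than the framework's response type.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def with_post_processing(
    pending: Awaitable[dict[str, Any]],
    post_process: Callable[[dict[str, Any]], dict[str, Any]],
) -> dict[str, Any]:
    """Await the base result first, then hand it to a post-processing step."""
    response = await pending
    return post_process(response)


async def fake_base_get_response() -> dict[str, Any]:
    # Stand-in for the base class call; returns an already-parsed response.
    return {"annotations": [{"type": "citation", "url": "https://search.example.com/"}]}


def mark_enriched(response: dict[str, Any]) -> dict[str, Any]:
    # Stand-in for the enrichment step; the real one would attach get_url.
    for ann in response["annotations"]:
        ann.setdefault("additional_properties", {})["enriched"] = True
    return response


result = asyncio.run(with_post_processing(fake_base_get_response(), mark_enriched))
print(result)
```

Because the wrapper only runs after the base call completes, the base class parsing stays untouched and the enrichment is purely additive.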

Streaming: Registers a transform hook on ResponseStream that:

  1. Captures get_urls from azure_ai_search_call_output items (via response.output_item.added and response.output_item.done events — the data is only fully populated in the done event)
  2. Handles url_citation annotations (which the base class doesn't handle in streaming) by creating proper Content objects with enriched Annotation entries
  3. Uses closure-local state instead of instance state, so concurrent streams don't interfere
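
The closure-local state described in the steps above can be sketched like this. Events are modeled as plain dictionaries, and `make_search_url_hook`, the `"annotation"` event type, and the `captured_get_urls` key are hypothetical names for illustration; the actual hook signature on ResponseStream may differ.

```python
from typing import Any, Callable

Event = dict[str, Any]


def make_search_url_hook() -> Callable[[Event], Event]:
    """Create a per-stream hook whose state lives only in this closure."""
    get_urls: list[str] = []  # private to this stream, never stored on the client

    def hook(event: Event) -> Event:
        kind = event.get("type")
        if kind in ("response.output_item.added", "response.output_item.done"):
            item = event.get("item") or {}
            if item.get("type") == "azure_ai_search_call_output":
                urls = (item.get("output") or {}).get("get_urls") or []
                if urls:  # only the "done" event is expected to carry data
                    get_urls[:] = urls
        elif kind == "annotation":  # hypothetical event type for this sketch
            # Attach a snapshot of whatever this stream has captured so far.
            return {**event, "captured_get_urls": list(get_urls)}
        return event

    return hook


stream_a, stream_b = make_search_url_hook(), make_search_url_hook()
done_event = {
    "type": "response.output_item.done",
    "item": {
        "type": "azure_ai_search_call_output",
        "output": {"get_urls": ["https://search.example.com/indexes/docs-index/docs/1"]},
    },
}
stream_a(done_event)
print(stream_a({"type": "annotation"})["captured_get_urls"])
print(stream_b({"type": "annotation"})["captured_get_urls"])
```

Only `stream_a` sees the captured URL; `stream_b` has its own empty list, which is the property that keeps concurrent streams from interfering.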

How users access citations

# Non-streaming
result = await agent.run(query)
for msg in result.messages:
    for content in msg.contents:
        for ann in (content.annotations or []):
            if ann.get("type") == "citation":
                doc_url = ann.get("additional_properties", {}).get("get_url")
                print(f"Document URL: {doc_url}")

# Streaming
async for chunk in agent.run(query, stream=True):
    for content in (chunk.contents or []):
        for ann in (content.annotations or []):
            if ann.get("type") == "citation":
                doc_url = ann.get("additional_properties", {}).get("get_url")
                print(f"Document URL: {doc_url}")

Changes

  • packages/azure-ai/agent_framework_azure_ai/_client.py — Added _inner_get_response override, helper methods (_extract_azure_search_urls, _get_search_doc_url, _enrich_annotations_with_search_urls, _build_url_citation_content)
  • packages/azure-ai/tests/test_azure_ai_client.py — Added 14 unit tests covering helpers, non-streaming enrichment, streaming hook registration, streaming URL capture and annotation enrichment
  • samples/02-agents/providers/azure_ai/azure_ai_with_azure_ai_search.py — Updated sample demonstrating citation extraction for both streaming and non-streaming
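
For readers wondering how a citation might be matched to one of the captured URLs, here is one plausible shape for a lookup and build step. It assumes citation titles follow a `doc_N` convention where `N` indexes into `get_urls`; that convention, and the names `lookup_doc_url` and `build_citation`, are assumptions for this sketch and are not confirmed by this PR.

```python
import re
from typing import Any, Optional

# Assumed title convention: "doc_0", "doc_1", ... (not confirmed by the PR).
_DOC_TITLE = re.compile(r"^doc_(\d+)$")


def lookup_doc_url(title: str, get_urls: list[str]) -> Optional[str]:
    """Map a 'doc_N' title to get_urls[N]; return None if it cannot be resolved."""
    match = _DOC_TITLE.match(title)
    if match is None:
        return None
    index = int(match.group(1))
    return get_urls[index] if index < len(get_urls) else None


def build_citation(raw: dict[str, Any], get_urls: list[str]) -> dict[str, Any]:
    """Always build a citation; add get_url only when a document URL is known."""
    citation: dict[str, Any] = {
        "type": "citation",
        "title": str(raw.get("title") or ""),
        "url": str(raw.get("url") or ""),
    }
    doc_url = lookup_doc_url(citation["title"], get_urls)
    if doc_url is not None:
        citation["additional_properties"] = {"get_url": doc_url}
    return citation


urls = ["https://search.example.com/indexes/docs-index/docs/1"]
raw_annotation = {"title": "doc_0", "url": "https://search.example.com/"}
print(build_citation(raw_annotation, urls))
print(build_citation(raw_annotation, []))  # still a citation, just without get_url
```

Returning a citation even when no URL can be resolved means callers should treat `get_url` as optional, which is what the `.get("additional_properties", {}).get("get_url")` access pattern in the usage snippets already does.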

…ry V2 (Responses API)

Override _parse_response_from_openai and _parse_chunk_from_openai in
RawAzureAIClient to extract get_urls from azure_ai_search_call_output
items and enrich url_citation annotations with document-specific URLs.

- Non-streaming: first pass collects get_urls, post-processes annotations
- Streaming: captures search output state, enriches url_citation events
  (also handles url_citation annotation type not handled by base class)
- Updated V2 sample to demonstrate citation URL extraction
- Added 14 unit tests covering extraction, enrichment, and edge cases

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 18, 2026 04:33
giles17 added the python label Feb 18, 2026
giles17 changed the title from "Python: Enhance Azure AI Search Citations with Document URLs in Foundry V2 (Responses API)" to "Python: Enhance Azure AI Search Citations with Document URLs in Foundry V2" Feb 18, 2026
giles17 marked this pull request as draft February 18, 2026 04:35

markwallace-microsoft commented Feb 18, 2026

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| packages/azure-ai/agent_framework_azure_ai/_client.py | 380 | 42 | 88% | 386, 388, 431, 439–451, 464, 524, 539–544, 587, 622, 624, 689, 692, 694, 795, 830, 870, 1078, 1081, 1084–1085, 1087–1090, 1133 |
| TOTAL | 21336 | 3308 | 84% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 4199 | 239 💤 | 0 ❌ | 0 🔥 | 1m 19s ⏱️ |

Copilot AI left a comment

Pull request overview

Enhances the Azure AI (Foundry V2 / Responses API) integration to surface Azure AI Search per-document REST URLs (get_url) on citation annotations, for both non-streaming and streaming responses.

Changes:

  • Added extraction/mapping/enrichment utilities in the Azure AI Responses client to attach get_url to citation annotations.
  • Overrode non-streaming and streaming parsing to capture Azure AI Search get_urls and enrich citations.
  • Updated the Azure AI Search sample and added a dedicated set of unit tests for this behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| python/samples/02-agents/providers/azure_ai/azure_ai_with_azure_ai_search.py | Demonstrates reading enriched citation additional_properties.get_url for non-streaming and streaming runs. |
| python/packages/azure-ai/agent_framework_azure_ai/_client.py | Implements extraction of get_urls from azure_ai_search_call_output items and citation enrichment in response parsing (including streaming). |
| python/packages/azure-ai/tests/test_azure_ai_client.py | Adds unit tests for URL extraction, mapping, enrichment, and streaming behavior/state handling. |

giles17 and others added 3 commits February 19, 2026 11:37
…sponse

- Remove all direct openai/pydantic imports from _client.py
- Override _inner_get_response instead of _parse_response_from_openai/_parse_chunk_from_openai
- Use closure-local state for streaming instead of instance-level _streaming_search_get_urls
- Add _build_url_citation_content helper for streaming url_citation handling
- Fix mypy errors by using str(value or '') for Annotation TypedDict fields
- Fix docstring to say 'citation' instead of 'url_citation'
- Update tests to match new approach

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The azure_ai_search_call_output item only has populated output data
(including get_urls) in the response.output_item.done event, not in
the response.output_item.added event. Also removed the search_get_urls
guard on url_citation handling so annotations are always produced even
if get_urls haven't been captured yet.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
giles17 marked this pull request as ready for review February 19, 2026 21:15

Development

Successfully merging this pull request may close these issues.

Python: How to access citation data in the Azure AI Search Python sample
