Pull request overview
This PR introduces an experimental Realtime Client / Session abstraction for Microsoft.Extensions.AI, including middleware-style session pipelines (logging, OpenTelemetry, function invocation) and an initial OpenAI realtime provider, while refactoring function-invocation logic to be shared across chat and realtime flows.
Changes:
- Add IRealtimeClient/IRealtimeSession abstractions plus realtime message/option types (audio, transcription, response items, errors, etc.).
- Add RealtimeSessionBuilder pipeline + middleware implementations (LoggingRealtimeSession, OpenTelemetryRealtimeSession, FunctionInvokingRealtimeSession).
- Refactor shared function invocation into reusable internal components (FunctionInvocationProcessor, helpers, logger), used by both chat and realtime.
Reviewed changes
Copilot reviewed 62 out of 63 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| test/Libraries/Microsoft.Extensions.AI.Tests/Realtime/RealtimeSessionExtensionsTests.cs | Unit tests for IRealtimeSession.GetService<T>() extension behavior. |
| test/Libraries/Microsoft.Extensions.AI.Tests/Realtime/RealtimeSessionBuilderTests.cs | Unit tests for RealtimeSessionBuilder pipeline behavior and ordering. |
| test/Libraries/Microsoft.Extensions.AI.Tests/Realtime/LoggingRealtimeSessionTests.cs | Unit tests validating logging middleware behavior across methods and log levels. |
| test/Libraries/Microsoft.Extensions.AI.Tests/Realtime/FunctionInvokingRealtimeSessionTests.cs | Unit tests for function invocation behavior in realtime streaming. |
| test/Libraries/Microsoft.Extensions.AI.Tests/Realtime/DelegatingRealtimeSessionTests.cs | Unit tests for base delegating session behavior (delegation, disposal, services). |
| test/Libraries/Microsoft.Extensions.AI.Tests/Microsoft.Extensions.AI.Tests.csproj | Includes shared TestRealtimeSession in test compilation. |
| test/Libraries/Microsoft.Extensions.AI.OpenAI.Tests/OpenAIRealtimeSessionTests.cs | Unit tests for OpenAI realtime session basic behaviors and guardrails. |
| test/Libraries/Microsoft.Extensions.AI.OpenAI.Tests/OpenAIRealtimeClientTests.cs | Unit tests for OpenAI realtime client creation and service exposure. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/TestRealtimeSession.cs | Test double for IRealtimeSession with callback hooks. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Realtime/RealtimeSessionOptionsTests.cs | Tests for RealtimeSessionOptions and related option types. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Realtime/RealtimeServerMessageTests.cs | Tests for server message types and their property roundtrips. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Realtime/RealtimeContentItemTests.cs | Tests for RealtimeContentItem construction and mutation. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Realtime/RealtimeClientMessageTests.cs | Tests for client message types and their properties. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Realtime/RealtimeAudioFormatTests.cs | Tests for RealtimeAudioFormat behavior. |
| src/Libraries/Microsoft.Extensions.AI/Realtime/RealtimeSessionExtensions.cs | Adds GetService<T>() extension for IRealtimeSession. |
| src/Libraries/Microsoft.Extensions.AI/Realtime/RealtimeSessionBuilderRealtimeSessionExtensions.cs | Adds AsBuilder() extension for sessions. |
| src/Libraries/Microsoft.Extensions.AI/Realtime/RealtimeSessionBuilder.cs | Implements session middleware/pipeline builder. |
| src/Libraries/Microsoft.Extensions.AI/Realtime/OpenTelemetryRealtimeSessionBuilderExtensions.cs | Builder extension to add OpenTelemetry middleware to a realtime session. |
| src/Libraries/Microsoft.Extensions.AI/Realtime/LoggingRealtimeSessionBuilderExtensions.cs | Builder extension to add logging middleware to a realtime session. |
| src/Libraries/Microsoft.Extensions.AI/Realtime/LoggingRealtimeSession.cs | Delegating session middleware that logs calls and streaming messages. |
| src/Libraries/Microsoft.Extensions.AI/Realtime/FunctionInvokingRealtimeSessionBuilderExtensions.cs | Builder extension to add function invocation middleware. |
| src/Libraries/Microsoft.Extensions.AI/Realtime/FunctionInvokingRealtimeSession.cs | Implements tool/function invocation loop for realtime streaming. |
| src/Libraries/Microsoft.Extensions.AI/Realtime/AnonymousDelegatingRealtimeSession.cs | Anonymous delegate-based middleware for streaming interception. |
| src/Libraries/Microsoft.Extensions.AI/OpenTelemetryConsts.cs | Extends OpenTelemetry constants for realtime and token subcategories. |
| src/Libraries/Microsoft.Extensions.AI/Common/FunctionInvocationStatus.cs | Shared internal status enum for invocation outcomes. |
| src/Libraries/Microsoft.Extensions.AI/Common/FunctionInvocationProcessor.cs | Shared processor implementing serial/parallel invocation with instrumentation. |
| src/Libraries/Microsoft.Extensions.AI/Common/FunctionInvocationLogger.cs | Shared logger messages used by chat and realtime invocation flows. |
| src/Libraries/Microsoft.Extensions.AI/Common/FunctionInvocationHelpers.cs | Shared helpers (activity detection, elapsed time, tool map creation). |
| src/Libraries/Microsoft.Extensions.AI/ChatCompletion/FunctionInvokingChatClient.cs | Refactors chat function invocation to use shared processor/helpers/logger. |
| src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIRealtimeClient.cs | Adds OpenAI realtime client implementation that creates/initializes sessions. |
| src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIClientExtensions.cs | Adds AsIRealtimeClient extension for OpenAI client integration. |
| src/Libraries/Microsoft.Extensions.AI.OpenAI/Microsoft.Extensions.AI.OpenAI.csproj | Adds internals visibility for tests and Channels dependency (non-net10). |
| src/Libraries/Microsoft.Extensions.AI.Evaluation.Reporting/CSharp/Microsoft.Extensions.AI.Evaluation.Reporting.csproj | Comment formatting change. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/UsageDetails.cs | Adds realtime-specific token breakdown fields. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Tools/ToolChoiceMode.cs | Adds tool choice mode enum for realtime use. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/VoiceActivityDetection.cs | Adds VAD options type. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/TranscriptionOptions.cs | Adds transcription configuration type. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/ServerVoiceActivityDetection.cs | Adds server VAD settings. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/SemanticVoiceActivityDetection.cs | Adds semantic VAD settings. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeSessionOptions.cs | Adds session configuration options (audio formats, tools, tracing, etc.). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeSessionKind.cs | Adds session kind enum (realtime vs transcription). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerResponseOutputItemMessage.cs | Adds server message for output items. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerResponseCreatedMessage.cs | Adds server message for response lifecycle/usage metadata. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerOutputTextAudioMessage.cs | Adds server message for output text/audio streaming. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerMessageType.cs | Adds server message type enum. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerMessage.cs | Adds base server message type. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerInputAudioTranscriptionMessage.cs | Adds server transcription message type. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerErrorMessage.cs | Adds server error message type. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeContentItem.cs | Adds realtime conversation item wrapper. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeClientResponseCreateMessage.cs | Adds client response request message type (modalities/tools/etc.). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeClientMessage.cs | Adds base client message type. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeClientInputAudioBufferCommitMessage.cs | Adds client message for committing audio input buffer. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeClientInputAudioBufferAppendMessage.cs | Adds client message for appending audio input buffer. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeClientConversationItemCreateMessage.cs | Adds client message for creating a conversation item. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeAudioFormat.cs | Adds audio format specification type. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/NoiseReductionOptions.cs | Adds noise reduction options enum. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/IRealtimeSession.cs | Adds realtime session interface. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/IRealtimeClient.cs | Adds realtime client interface. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/DelegatingRealtimeSession.cs | Adds base delegating session implementation. |
shyamnamboodiripad left a comment:
Signing off on behalf of eval (so that the whitespace change in Reporting.csproj does not block merge)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
The extension method on OpenAIClient was not useful because it completely ignored the OpenAIClient instance - only validating it for null before creating a new OpenAIRealtimeClient with the separately provided apiKey and model parameters. Users can construct OpenAIRealtimeClient directly instead.
- Fix RealtimeSessionExtensions XML doc to reference IRealtimeSession instead of IChatClient
- Replace non-standard <ref name> tags with <see cref> in RealtimeServerMessageType.cs for proper IntelliSense/doc rendering
- Fix ResponseDone doc summary to say 'completed' instead of 'created'
- Add missing Throw.IfNull(updates) in LoggingRealtimeSession.GetStreamingResponseAsync for consistency with other sessions
- Split RealtimeServerMessageType enum: add ResponseOutputItemDone and ResponseOutputItemAdded to distinguish per-item events (response.output_item.done, conversation.item.done) from whole-response events (response.done, response.created)
- Fix function result serialization: use JsonSerializer.Serialize() instead of ToString() to properly serialize complex objects
- Fix OTel streaming duration: start stopwatch at method entry instead of immediately before recording, so duration histogram measures actual streaming time
- URL-encode model name in WebSocket URI for defensive safety
- Fix OTel metadata tag ordering: apply user metadata before standard tags so standard OTel attributes take precedence if keys collide
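Two of the fixes in this commit message (JSON serialization of function results and URL-encoding of the model name) can be illustrated with a small sketch. This is not the PR's code: SerializationDemo, its method names, the sample dictionary, and the "model=" query format are all made up for illustration. Only JsonSerializer.Serialize and Uri.EscapeDataString are real .NET APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper; not part of Microsoft.Extensions.AI.
public static class SerializationDemo
{
    // A function result that is a complex object rather than a plain string.
    public static Dictionary<string, object> SampleResult() =>
        new() { ["city"] = "Seattle", ["tempC"] = 12 };

    // ToString() on a dictionary returns its type name, not its contents.
    public static string ViaToString(object result) => result.ToString() ?? string.Empty;

    // JsonSerializer.Serialize writes the actual data as JSON.
    public static string ViaJson(object result) => JsonSerializer.Serialize(result);

    // Escapes a model name before placing it in a query string.
    // The "model=" format is an assumption, not the URI the PR builds.
    public static string ModelQuery(string model) => "model=" + Uri.EscapeDataString(model);
}
```

With the sample dictionary, ViaToString returns a string beginning with System.Collections.Generic.Dictionary, while ViaJson returns {"city":"Seattle","tempC":12}. ModelQuery("my model/v1") returns model=my%20model%2Fv1.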
/// <summary>
/// Gets the current session options.
/// </summary>
RealtimeSessionOptions? Options { get; }
I'm unclear as to the semantics of this. RealtimeSessionOptions is a mutable class. If I start setting properties on it while the session is active, will that result in immediate changes in behavior?
No. The abstraction includes an Update Session operation that must be called to update the session. I need a type to use when updating the session (it must be a writable object), and I also want to expose the same information to anyone who requests it at any time; in that case, it can be read-only.
The reason is that the middleware layer needs access to the session properties. OpenAI models allow updating the session after it has been created, but I believe not all providers allow that. Therefore, in most scenarios, I expect that once the session is created with the desired configuration, it will not change much afterward.
I am trying to avoid having two types for that. Do you have a better idea for handling that?
Maybe we can define the session options' properties with init accessors instead of setters. That makes the object immutable after construction, which I believe resolves the confusion and gives a clearer design. I'll try that and see how it goes.
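A minimal sketch of the init-only idea from this reply. SessionOptionsSketch is a hypothetical stand-in, not the PR's RealtimeSessionOptions, and its property names and values are invented for illustration. With init accessors the object is configured once, and a later change is expressed as a modified copy that would be handed to the update operation, so a reader holding the original never sees it change.

```csharp
using System;

// Hypothetical stand-in for RealtimeSessionOptions; property names are illustrative only.
public sealed record SessionOptionsSketch
{
    public string? Instructions { get; init; }
    public string? Voice { get; init; }
}

public static class InitOnlyDemo
{
    // Returns the original and the updated copy to show the original is untouched.
    public static (SessionOptionsSketch Original, SessionOptionsSketch Updated) Run()
    {
        var original = new SessionOptionsSketch { Instructions = "Be brief.", Voice = "voice-a" };

        // original.Voice = "voice-b";   // would not compile: init-only property

        // Non-destructive mutation: a copy with one property changed.
        var updated = original with { Voice = "voice-b" };
        return (original, updated);
    }
}
```

After Run(), Original.Voice is still "voice-a", while Updated carries the new voice and the unchanged instructions. Whether the PR ends up using a record or a class with init properties is not stated in this thread.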
/// <remarks>
/// This method allows for the injection of client messages into the session at any time, which can be used to influence the session's behavior or state.
/// </remarks>
Task InjectClientMessageAsync(RealtimeClientMessage message, CancellationToken cancellationToken = default);
I'm not sure about the word "Inject"... is that standard terminology used by the providers? Is this just "Send"? How does this relate to GetStreamingResponseAsync... is this only valid when someone is actively enumerating?
/// <param name="cancellationToken">A token to cancel the operation.</param>
/// <returns>The response messages generated by the session.</returns>
/// <remarks>
/// This method cannot be called multiple times concurrently on the same session instance.
Should the session itself be enumerable?
/// <summary>
/// For far-field microphones.
/// </summary>
FarField
Do any providers have the notion of "Auto"?
/// <summary>
/// Gets or sets the type of audio. For example, "audio/pcm".
/// </summary>
public string Type { get; set; }
Is this mime/media type? I believe we spell that out as MediaType elsewhere, like in DataContent
src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIRealtimeSession.cs
Outdated
await _sendLock.WaitAsync(cancellationToken).ConfigureAwait(false);
lockTaken = true;

await _webSocket.SendAsync(
Oh! This isn't using the OpenAI library's realtime support? Why not?
Somehow I was under the impression that we should not take a dependency on third-party libraries. Looks like I was wrong 🥹. I'll look at that and update. Thanks!
I looked at the OpenAI SDK; it looks like the latest package, 2.8.0, doesn't have the updates for the Realtime model. I see they merged PR openai/openai-dotnet#928 two days ago. I think we need to wait for them to publish a new version before we can consume it. I'll keep an eye on that.
81300f4 to 8ad70f5
…k in OpenAIRealtimeSession
8ad70f5 to fbdc7cb
- Move TranscriptionOptions from Realtime/ to SpeechToText/ folder
- Change experimental flag from AIRealTime to AISpeechToText
- Make properties nullable with parameterless constructor
- Rename Language to SpeechLanguage, Model to ModelId
- Replace SpeechToTextOptions.ModelId and .SpeechLanguage with Transcription property
- Update all consumers and tests
Realtime Client Proposal
HostedMcpServerTool is in place, it has not been validated end-to-end and may change significantly.
Overview
This PR introduces a Realtime Client abstraction layer for Microsoft.Extensions.AI, enabling bidirectional, streaming communication with realtime AI services (e.g., OpenAI's Realtime API). The design follows the same middleware/pipeline patterns established by IChatClient and extends them to realtime sessions over WebSocket connections.
Key changes include:
- Abstractions (IRealtimeClient, IRealtimeSession, DelegatingRealtimeSession) in Microsoft.Extensions.AI.Abstractions
- RealtimeSessionOptions supporting audio formats, voice activity detection, transcription, tools, and tracing
- RealtimeSessionBuilder with built-in support for:
  - Logging (LoggingRealtimeSession)
  - OpenTelemetry (OpenTelemetryRealtimeSession) following GenAI semantic conventions
  - Function invocation (FunctionInvokingRealtimeSession) with automatic tool call resolution
- OpenAI provider (OpenAIRealtimeClient, OpenAIRealtimeSession) using WebSocket connections
- Function invocation logic refactored from FunctionInvokingChatClient into reusable components (FunctionInvocationProcessor, FunctionInvocationHelpers, FunctionInvocationLogger) so both chat and realtime sessions share the same invocation pipeline
Files changed: 63 (11,231 insertions, 319 deletions)
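To make the middleware/pipeline pattern in the overview concrete, here is a rough, self-contained sketch of a builder that wraps an inner session in decorators. Every type here (ISessionSketch, InnerSession, NamedMiddleware, SessionBuilderSketch) is hypothetical and synchronous for brevity; the PR's IRealtimeSession is asynchronous and streaming. The ordering shown (first middleware added ends up outermost) follows ChatClientBuilder; that RealtimeSessionBuilder behaves the same way is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified session used only for this sketch.
public interface ISessionSketch
{
    string Describe();
}

public sealed class InnerSession : ISessionSketch
{
    public string Describe() => "provider";
}

// Decorator that wraps another session, as a delegating middleware would.
public sealed class NamedMiddleware : ISessionSketch
{
    private readonly string _name;
    private readonly ISessionSketch _inner;

    public NamedMiddleware(string name, ISessionSketch inner)
    {
        _name = name;
        _inner = inner;
    }

    public string Describe() => _name + " -> " + _inner.Describe();
}

public sealed class SessionBuilderSketch
{
    private readonly ISessionSketch _inner;
    private readonly List<Func<ISessionSketch, ISessionSketch>> _factories = new();

    public SessionBuilderSketch(ISessionSketch inner) => _inner = inner;

    public SessionBuilderSketch Use(Func<ISessionSketch, ISessionSketch> factory)
    {
        _factories.Add(factory);
        return this;
    }

    public ISessionSketch Build()
    {
        var session = _inner;

        // Apply factories in reverse so the first middleware added is outermost.
        for (int i = _factories.Count - 1; i >= 0; i--)
        {
            session = _factories[i](session);
        }

        return session;
    }
}
```

For example, new SessionBuilderSketch(new InnerSession()).Use(s => new NamedMiddleware("logging", s)).Use(s => new NamedMiddleware("functions", s)).Build().Describe() returns "logging -> functions -> provider", so a call enters the logging layer first and reaches the provider last.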
Supported Realtime Messages
Client Messages (sent to the server)
- RealtimeClientConversationItemCreateMessage
- RealtimeClientInputAudioBufferAppendMessage
- RealtimeClientInputAudioBufferCommitMessage
- RealtimeClientResponseCreateMessage
Server Messages (received from the server)
- RealtimeServerOutputTextAudioMessage
- RealtimeServerInputAudioTranscriptionMessage
- RealtimeServerResponseCreatedMessage
- RealtimeServerResponseOutputItemMessage
- RealtimeServerErrorMessage
Server Message Types (RealtimeServerMessageType enum)
- RawContentOnly
- OutputTextDelta
- OutputTextDone
- OutputAudioDelta
- OutputAudioDone
- OutputAudioTranscriptionDelta
- OutputAudioTranscriptionDone
- InputAudioTranscriptionDelta
- InputAudioTranscriptionCompleted
- InputAudioTranscriptionFailed
- ResponseCreated
- ResponseDone
- Error
- McpCallInProgress
- McpCallCompleted
- McpCallFailed
- McpListToolsInProgress
- McpListToolsCompleted
- McpListToolsFailed
Usage Examples
1. Creating a Realtime Client
2. Creating a Session
3. Enabling Middlewares (Logging, OpenTelemetry, Function Invocation)
Use RealtimeSessionBuilder to compose a middleware pipeline around the session.
4. Configuring the Session
5. Sending Client Messages to the Server
Use a Channel<RealtimeClientMessage> to send messages asynchronously.
6. Listening to Server Messages
Call GetStreamingResponseAsync to consume server messages as they arrive.
7. Ending the Session
Demo Application
A complete application consuming the new realtime interfaces can be found at: RealtimeProposalDemoApp
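Independently of the demo application, usage steps 5 and 6 (sending client messages through a Channel and consuming server messages from GetStreamingResponseAsync) can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: ClientMessageSketch and ServerMessageSketch are hypothetical stand-ins for RealtimeClientMessage and RealtimeServerMessage, and the GetStreamingResponseAsync below is a local echo stub whose shape (an async stream of updates in, an async stream of server messages out) is inferred from the PR text. The Channel and IAsyncEnumerable APIs are standard .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical stand-ins for RealtimeClientMessage / RealtimeServerMessage.
public sealed record ClientMessageSketch(string Text);
public sealed record ServerMessageSketch(string Text);

public static class StreamingDemo
{
    // Stub for a session's streaming method: echoes each client message back.
    public static async IAsyncEnumerable<ServerMessageSketch> GetStreamingResponseAsync(
        IAsyncEnumerable<ClientMessageSketch> updates,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        await foreach (var message in updates.WithCancellation(cancellationToken).ConfigureAwait(false))
        {
            yield return new ServerMessageSketch("echo: " + message.Text);
        }
    }

    public static async Task<List<string>> RunAsync()
    {
        var channel = Channel.CreateUnbounded<ClientMessageSketch>();

        // Producer: write messages, then complete the writer to end the stream.
        await channel.Writer.WriteAsync(new ClientMessageSketch("hello"));
        await channel.Writer.WriteAsync(new ClientMessageSketch("world"));
        channel.Writer.Complete();

        // Consumer: enumerate server messages as they arrive.
        var received = new List<string>();
        await foreach (var reply in GetStreamingResponseAsync(channel.Reader.ReadAllAsync()))
        {
            received.Add(reply.Text);
        }

        return received;
    }
}
```

RunAsync returns the two echoed messages in order. In a real application the producer would typically run concurrently with the consumer (for example, a microphone capture loop writing audio messages while the await foreach loop is active), and completing the channel writer is what lets the enumeration finish.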