Add native Anthropic, OpenAI, and Gemini SDK support #3
Open
anassg-lago wants to merge 5 commits into
- src/lago_agent_sdk/adapters/anthropic_native.py: extract_anthropic_native
- src/lago_agent_sdk/wrappers/anthropic.py: wraps messages.create (sync + async, streaming and non-streaming) and the messages.stream context manager
- Wired into sdk.wrap() dispatch and adapters/__init__.py exports
- anthropic = ["anthropic>=0.30"] optional-dep group
- 19 new unit tests + 3 live integration tests; 256 unit tests pass
- Coverage 80.71%, gate maintained
- 9 captured response fixtures from the real Anthropic API
- README + CHANGELOG updated
The test set max_batch_size == max_buffer_size == 100, which caused the push that brings the buffer to 100 to trigger a wake on the background worker. The worker would take a batch (emptying the buffer) and then race with the remaining 150 pushes to call slow_sender. On CI's slower runners the worker sometimes squeezed in additional batches before slow_sender finally blocked, leaving the buffer with fewer items than the expected sliding window. Setting max_batch_size > max_buffer_size guarantees push() never sets the wake event (buffer can never reach max_batch_size). Combined with a long flush_interval the worker only runs once shutdown() releases the pause in the finally block — fully deterministic. Verified with 5 consecutive runs.
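
The wake condition described above can be modelled in a few lines. This is only a sketch: `SlidingBuffer`, `wake_fired`, and the `>=` threshold check are hypothetical stand-ins, not the SDK's actual buffer code, and the background worker is left out so that only the "does push() ever set the wake event" question is shown.

```python
import threading
from collections import deque


class SlidingBuffer:
    """Hypothetical stand-in for the SDK's event buffer (all names are assumptions)."""

    def __init__(self, max_buffer_size: int, max_batch_size: int) -> None:
        # A bounded deque drops the oldest item on overflow (sliding window).
        self.items = deque(maxlen=max_buffer_size)
        self.max_batch_size = max_batch_size
        self.wake = threading.Event()

    def push(self, item: int) -> None:
        self.items.append(item)
        # Assumed rule: wake the worker only once a full batch is buffered.
        if len(self.items) >= self.max_batch_size:
            self.wake.set()


def wake_fired(max_buffer_size: int, max_batch_size: int, pushes: int = 250) -> bool:
    """Push `pushes` items and report whether the wake event was ever set."""
    buf = SlidingBuffer(max_buffer_size, max_batch_size)
    for i in range(pushes):
        buf.push(i)
    return buf.wake.is_set()


# Old test config: batch size equals buffer size, so the 100th push wakes the worker.
racy = wake_fired(max_buffer_size=100, max_batch_size=100)
# New test config: the buffer can never hold a full batch, so push() never wakes it.
deterministic = wake_fired(max_buffer_size=100, max_batch_size=101)
```

Under these assumptions `racy` is true and `deterministic` is false, which is the property the fix relies on: with no wake from push(), the worker stays idle until shutdown().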
Adapter handles both API shapes with auto-detection:
Chat Completions (client.chat.completions.create):
usage.prompt_tokens -> input
usage.completion_tokens -> output
usage.prompt_tokens_details.cached_tokens -> cache_read
usage.prompt_tokens_details.audio_tokens -> audio_input
usage.completion_tokens_details.reasoning_tokens -> reasoning (o-series)
usage.completion_tokens_details.audio_tokens -> audio_output
count of choices[0].message.tool_calls -> tool_calls
Responses API (client.responses.create):
usage.input_tokens -> input
usage.output_tokens -> output
usage.input_tokens_details.cached_tokens -> cache_read
usage.output_tokens_details.reasoning_tokens -> reasoning
count of output[].type == "function_call" -> tool_calls
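
A rough sketch of the two mappings above, operating on plain dicts for brevity. The function name `extract_openai_usage`, the `_get` helper, and detection by the presence of `prompt_tokens` are assumptions made for illustration; the real adapter may work on SDK response objects and detect the shape differently.

```python
from typing import Any


def _get(obj: Any, path: str, default: int = 0) -> Any:
    """Walk a dotted path through nested dicts; return default on any gap."""
    cur = obj
    for key in path.split("."):
        if not isinstance(cur, dict) or cur.get(key) is None:
            return default
        cur = cur[key]
    return cur


def extract_openai_usage(response: dict) -> dict:
    """Map either OpenAI response shape to canonical usage keys (illustrative)."""
    usage = response.get("usage") or {}
    if "prompt_tokens" in usage:
        # Chat Completions shape
        choices = response.get("choices") or [{}]
        tool_calls = (choices[0].get("message") or {}).get("tool_calls") or []
        return {
            "input": _get(usage, "prompt_tokens"),
            "output": _get(usage, "completion_tokens"),
            "cache_read": _get(usage, "prompt_tokens_details.cached_tokens"),
            "audio_input": _get(usage, "prompt_tokens_details.audio_tokens"),
            "reasoning": _get(usage, "completion_tokens_details.reasoning_tokens"),
            "audio_output": _get(usage, "completion_tokens_details.audio_tokens"),
            "tool_calls": len(tool_calls),
        }
    # Responses API shape
    output_items = response.get("output") or []
    return {
        "input": _get(usage, "input_tokens"),
        "output": _get(usage, "output_tokens"),
        "cache_read": _get(usage, "input_tokens_details.cached_tokens"),
        "reasoning": _get(usage, "output_tokens_details.reasoning_tokens"),
        "tool_calls": sum(
            1 for item in output_items if item.get("type") == "function_call"
        ),
    }
```

Missing detail objects fall back to 0 rather than raising, since older or non-reasoning models may omit them.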
Wrapper covers both methods, sync + streaming, on both OpenAI and AsyncOpenAI.
For Chat Completions streaming, auto-injects stream_options.include_usage=true
when missing so the final chunk carries usage data (without that flag, OpenAI
emits no usage on streamed responses).
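
The injection step might look roughly like this. `ensure_usage_in_stream` is a hypothetical helper name, and the choice to leave an explicit caller-supplied `include_usage` value alone is an assumption based on the "when missing" wording above.

```python
def ensure_usage_in_stream(kwargs: dict) -> dict:
    """Return call kwargs with stream_options.include_usage set for streams.

    Non-streaming calls are returned unchanged. The caller's dict is never
    mutated; a shallow copy is returned when a change is needed.
    """
    if not kwargs.get("stream"):
        return kwargs
    stream_options = dict(kwargs.get("stream_options") or {})
    # Only fill in the flag when the caller did not set it explicitly.
    stream_options.setdefault("include_usage", True)
    return {**kwargs, "stream_options": stream_options}
```

A wrapper would apply this to the keyword arguments before delegating to the original `create` call.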
CanonicalUsage extended with audio_output (mapped to llm_audio_output_tokens)
to capture GPT-4o-audio output usage.
OpenAI is the first provider to actually populate llm_reasoning_tokens
(o-series surfaces reasoning tokens separately; Anthropic/Bedrock fold them
into output_tokens).
Predicted Outputs tokens (accepted/rejected_prediction_tokens) are
intentionally not surfaced -- documented in the adapter docstring as a
v1 gap.
27 new unit tests (18 adapter + 9 wrapper). 5 live integration tests gated
on OPENAI_API_KEY. 10 captured response fixtures from the real OpenAI API.
Total: 283 unit tests passing, ruff + mypy strict clean.
Adapter maps usage_metadata fields to CanonicalUsage:
prompt_token_count -> input
candidates_token_count -> output
cached_content_token_count -> cache_read
thoughts_token_count -> reasoning
prompt_tokens_details[modality=AUDIO].token_count -> audio_input
prompt_tokens_details[modality=IMAGE].token_count -> image_input
candidates_tokens_details[modality=AUDIO].token_count -> audio_output
count of candidates[0].content.parts[].function_call -> tool_calls
Wrapper covers client.models.generate_content + generate_content_stream (sync) and the async variants under client.aio.models. Idempotent via _lago_instrumented sentinel.
Detector now returns 'gemini' (was 'google') for google-genai clients -- matches the naming convention used by other providers (bedrock, anthropic, openai, mistral).
Semantic note vs OpenAI: Gemini's `thoughts_token_count` is ADDITIVE to `candidates_token_count` (verified by math across all 5 fixtures: input + output + reasoning = total). OpenAI's `reasoning_tokens` is a SUBSET of `completion_tokens`. Documented in adapter docstring + README for customers configuring per-metric billing.
Gemini 2.5 emits reasoning tokens by default (no explicit thinking_config needed) -- second provider populating llm_reasoning_tokens.
21 new unit tests (15 adapter + 6 wrapper). 4 live integration tests gated on GEMINI_API_KEY. 5 captured response fixtures (plain, tool use, streaming, thinking, multi-turn).
Total: 304 unit tests passing, ruff + mypy strict clean.
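
To make the additive-vs-subset distinction concrete, here is a small arithmetic sketch. The function names and the token counts used with them are illustrative assumptions, not values from the fixtures or code from the adapter.

```python
def generated_tokens_openai(completion_tokens: int, reasoning_tokens: int) -> dict:
    """OpenAI: reasoning_tokens is already included in completion_tokens."""
    return {
        "visible_output": completion_tokens - reasoning_tokens,
        "reasoning": reasoning_tokens,
        "total_generated": completion_tokens,
    }


def generated_tokens_gemini(candidates_token_count: int, thoughts_token_count: int) -> dict:
    """Gemini: thoughts_token_count is reported on top of candidates_token_count."""
    return {
        "visible_output": candidates_token_count,
        "reasoning": thoughts_token_count,
        "total_generated": candidates_token_count + thoughts_token_count,
    }


# Same reported "output" (500) and "reasoning" (200) numbers, different totals.
openai_view = generated_tokens_openai(completion_tokens=500, reasoning_tokens=200)
gemini_view = generated_tokens_gemini(candidates_token_count=500, thoughts_token_count=200)
```

With these made-up numbers, the OpenAI response represents 500 generated tokens in total while the Gemini response represents 700, so billing output and reasoning as separate metrics would double-count reasoning for OpenAI unless it is subtracted from output first.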
CI runs `ruff format --check` which was failing because earlier dev only ran `ruff check` (linter) locally, not the formatter. Auto-formatting restores whitespace consistency in:
- src/lago_agent_sdk/adapters/gemini_native.py
- src/lago_agent_sdk/wrappers/openai.py
- tests/unit/adapters/fixtures/capture_openai.py
- tests/unit/adapters/test_gemini_native.py
- tests/unit/test_wrapper_gemini.py
No functional changes.
Summary
Adds three new native LLM provider integrations to the SDK, bringing total coverage to five (Bedrock + Mistral + Anthropic + OpenAI + Gemini). Also includes a canonical-schema extension for audio-output tokens and a flaky-test fix.
Commits in this PR
- c23e692: test_repeated_overflow_keeps_window_sliding (race condition with max_batch_size == max_buffer_size)
- 8da7b82: (messages.create sync + stream, messages.stream context manager)
- 0f79c5a
- 6c487ab: (google-genai, sync + stream + async via client.aio.models)
Provider coverage matrix
Key design decisions
- CanonicalUsage extended with audio_output → llm_audio_output_tokens metric code. Populated by GPT-4o-audio output and Gemini TTS responses.
- Detector returns gemini (was google) for google-genai clients, which keeps naming consistent with bedrock/anthropic/openai/mistral.
- Chat Completions streaming auto-injects stream_options.include_usage=True when missing. Without it, OpenAI emits no usage on streamed responses, which would mean silent under-billing.
- OpenAI adapter auto-detects both API shapes (prompt_tokens vs input_tokens).
- OpenAI's reasoning_tokens is a SUBSET of completion_tokens; Gemini's thoughtsTokenCount is ADDITIVE to candidatesTokenCount. Customers configuring per-metric billing must account for this. Documented in adapter docstrings + README.
Tests
Quality gate
- ruff check src tests: clean
- mypy --strict src: 0 issues in 21 source files
- pytest tests/unit: 304/304 pass
Known gaps (intentional, documented)
- Predicted Outputs tokens (accepted_/rejected_prediction_tokens): not surfaced; would risk double-counting against completion_tokens
Test plan
- (audio_output)
- kind == "google" consumers (none today)
- (pytest tests/unit -q)