Add native Anthropic, OpenAI, and Gemini SDK support by anassg-lago · Pull Request #3 · getlago/lago-agent-sdk-python

anassg-lago · 2026-05-29T12:38:44Z

Summary

Adds three new native LLM provider integrations to the SDK, bringing total coverage to five (Bedrock + Mistral + Anthropic + OpenAI + Gemini). It also extends the canonical schema with an audio-output token field and fixes a flaky test.

Commits in this PR

Commit	Summary
`c23e692`	Fix flaky `test_repeated_overflow_keeps_window_sliding` (race condition with `max_batch_size == max_buffer_size`)
`8da7b82`	Native Anthropic SDK support (`messages.create` sync + stream, `messages.stream` context manager)
`0f79c5a`	Native OpenAI SDK support (Chat Completions + Responses API, sync + stream + async)
`6c487ab`	Native Gemini SDK support (`google-genai`, sync + stream + async via `client.aio.models`)

Provider coverage matrix

Provider	Sync	Stream	Async	Tool calls	Reasoning	Cache	Multimodal
Anthropic native	✓	✓	✓	✓	✗ (folded)	✓ (5m + 1h)	✗
OpenAI native	✓	✓	✓	✓	✓ (subset of output)	✓ (auto)	✓ (audio)
Gemini native	✓	✓	✓	✓	✓ (additive to output)	✓ (CachedContent API)	✓ (audio + image, per modality)

Key design decisions

CanonicalUsage extended with audio_output → llm_audio_output_tokens metric code. Populated by GPT-4o-audio output and Gemini TTS responses.
Detector now returns gemini (was google) for google-genai clients — keeps naming consistent with bedrock/anthropic/openai/mistral.
OpenAI streaming auto-injects stream_options.include_usage=True when missing. Without it, OpenAI emits no usage on streamed responses — silent under-billing.
OpenAI adapter auto-detects Chat Completions vs Responses API by usage-field shape (prompt_tokens vs input_tokens).
Reasoning-tokens semantic difference documented: OpenAI's reasoning_tokens is a SUBSET of completion_tokens; Gemini's thoughtsTokenCount is ADDITIVE to candidatesTokenCount. Customers configuring per-metric billing must account for this. Documented in adapter docstrings + README.
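The subset-vs-additive difference matters as soon as output and reasoning are billed as separate metrics. Below is a minimal sketch that normalizes both providers' raw counts into "visible output" plus "reasoning". `TokenSplit`, `split_openai`, and `split_gemini` are hypothetical names used only for illustration, not SDK functions, and the numbers are made up rather than taken from the fixtures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenSplit:
    """Hypothetical normalized view (not an SDK type): visible vs. reasoning output tokens."""

    visible_output: int
    reasoning: int

    @property
    def total_generated(self) -> int:
        # Every token the model produced, whether shown to the user or not.
        return self.visible_output + self.reasoning


def split_openai(completion_tokens: int, reasoning_tokens: int) -> TokenSplit:
    # OpenAI: reasoning_tokens is a SUBSET of completion_tokens, so subtract it.
    if reasoning_tokens > completion_tokens:
        raise ValueError("reasoning_tokens cannot exceed completion_tokens")
    return TokenSplit(completion_tokens - reasoning_tokens, reasoning_tokens)


def split_gemini(candidates_token_count: int, thoughts_token_count: int) -> TokenSplit:
    # Gemini: thoughts_token_count is ADDITIVE to candidates_token_count,
    # so candidates_token_count is already the visible part.
    return TokenSplit(candidates_token_count, thoughts_token_count)


# Illustrative numbers: 100 visible tokens plus 400 reasoning tokens,
# expressed the way each provider would report them.
openai_split = split_openai(completion_tokens=500, reasoning_tokens=400)
gemini_split = split_gemini(candidates_token_count=100, thoughts_token_count=400)
assert openai_split == gemini_split
```

Put differently: for OpenAI, adding llm_reasoning_tokens on top of the raw completion_tokens counts the 400 reasoning tokens twice, while for Gemini, billing only candidates_token_count misses them entirely.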

Tests

237 → 304 unit tests (+67: 19 Anthropic, 18 OpenAI adapter, 9 OpenAI wrapper, 15 Gemini adapter, 6 Gemini wrapper)
Live integration tests (skipped without API keys): 3 Anthropic + 5 OpenAI + 4 Gemini — all pass against the real APIs with a mock Lago HTTP server
24 captured response fixtures from real APIs (9 Anthropic + 10 OpenAI + 5 Gemini)
Coverage maintained ≥ 80%

Quality gate

ruff check src tests — clean
mypy --strict src — 0 issues in 21 source files
pytest tests/unit — 304/304 pass
Live integration tests verified manually against real APIs

Known gaps (intentional, documented)

OpenAI Predicted Outputs tokens (accepted_prediction_tokens / rejected_prediction_tokens): not surfaced, since they would risk double-counting against completion_tokens
Gemini Vertex AI mode: expected to work with the same adapter (same response shape), but not specifically tested
Multimodal image input on OpenAI: Phase 3

Test plan

Review the canonical-schema change (single new field: audio_output)
Verify the detector change doesn't break existing kind == "google" consumers (none today)
Run unit suite locally (pytest tests/unit -q)
Optionally run live integration tests with API keys exported
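For context on the detector rename, here is a hypothetical sketch of client-kind detection based on the client class's module path. The real detector's rules are not shown in this PR: `detect_kind` and the prefix table are assumptions, and Bedrock is left out of the sketch. A consumer that still compares against "google" would need to switch to "gemini".

```python
from typing import Optional, Tuple

# Hypothetical (module prefix, kind) table; not the SDK's actual detection rules.
_PREFIX_TO_KIND: Tuple[Tuple[str, str], ...] = (
    ("google.genai", "gemini"),  # previously reported as "google"
    ("anthropic", "anthropic"),
    ("openai", "openai"),
    ("mistralai", "mistral"),
)


def detect_kind(client: object) -> Optional[str]:
    """Return a provider kind for a client instance, or None if unrecognized."""
    module = type(client).__module__
    for prefix, kind in _PREFIX_TO_KIND:
        # Match the package itself or any submodule, but not look-alike names.
        if module == prefix or module.startswith(prefix + "."):
            return kind
    return None
```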

- src/lago_agent_sdk/adapters/anthropic_native.py: extract_anthropic_native
- src/lago_agent_sdk/wrappers/anthropic.py: wraps messages.create (sync + async, streaming and non-streaming) and the messages.stream context manager
- Wired into sdk.wrap() dispatch and adapters/__init__.py exports
- anthropic = ["anthropic>=0.30"] optional-dependency group
- 19 new unit tests + 3 live integration tests; 256 unit tests pass
- Coverage 80.71% (gate maintained)
- 9 captured response fixtures from the real Anthropic API
- README + CHANGELOG updated
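The coverage matrix lists Anthropic cache support as "5m + 1h". A minimal sketch of how a response's usage could be split that way, assuming plain dicts (for example, a Message's `.model_dump()`) and the usage layout of recent Messages API responses, which carry a `cache_creation` object with per-TTL counts. `AnthropicUsageSketch`, `extract_anthropic_sketch`, and the `cache_write_5m` / `cache_write_1h` field names are hypothetical; this is not the body of extract_anthropic_native.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class AnthropicUsageSketch:
    """Hypothetical stand-in for CanonicalUsage; cache_write_5m / cache_write_1h are assumed names."""

    input: int = 0
    output: int = 0
    cache_read: int = 0
    cache_write_5m: int = 0
    cache_write_1h: int = 0
    tool_calls: int = 0


def extract_anthropic_sketch(message: Dict[str, Any]) -> AnthropicUsageSketch:
    usage = message.get("usage") or {}
    # Per-TTL split of cache writes; responses without it only carry the aggregate.
    per_ttl = usage.get("cache_creation") or {}
    if per_ttl:
        write_5m = per_ttl.get("ephemeral_5m_input_tokens") or 0
        write_1h = per_ttl.get("ephemeral_1h_input_tokens") or 0
    else:
        # Assumption: attribute the aggregate to the default 5-minute TTL.
        write_5m = usage.get("cache_creation_input_tokens") or 0
        write_1h = 0
    content = message.get("content") or []
    return AnthropicUsageSketch(
        input=usage.get("input_tokens") or 0,
        # Extended-thinking tokens are folded into output_tokens; no separate count exists.
        output=usage.get("output_tokens") or 0,
        cache_read=usage.get("cache_read_input_tokens") or 0,
        cache_write_5m=write_5m,
        cache_write_1h=write_1h,
        tool_calls=sum(1 for block in content if block.get("type") == "tool_use"),
    )


example = {
    "content": [
        {"type": "text", "text": "Checking the weather."},
        {"type": "tool_use", "id": "toolu_1", "name": "get_weather", "input": {"city": "Paris"}},
    ],
    "usage": {
        "input_tokens": 50,
        "output_tokens": 40,
        "cache_read_input_tokens": 1000,
        "cache_creation_input_tokens": 300,
        "cache_creation": {"ephemeral_5m_input_tokens": 100, "ephemeral_1h_input_tokens": 200},
    },
}
parsed = extract_anthropic_sketch(example)
```

The fallback branch is a design choice for older payloads: attributing the aggregate to the 5-minute TTL keeps totals intact, at the cost of possibly misclassifying 1-hour writes.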

The test set max_batch_size == max_buffer_size == 100, so the push that brought the buffer to 100 items woke the background worker. The worker would then take a batch (emptying the buffer) and race with the remaining 150 pushes to call slow_sender. On CI's slower runners, the worker sometimes squeezed in additional batches before slow_sender finally blocked, leaving the buffer with fewer items than the expected sliding window. Setting max_batch_size > max_buffer_size guarantees that push() never sets the wake event, because the buffer can never reach max_batch_size. Combined with a long flush_interval, the worker only runs once shutdown() releases the pause in the finally block, which makes the test fully deterministic. Verified with 5 consecutive runs.
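The wake rule can be modeled in a few lines. This is a hypothetical sketch, not the SDK's buffer class: `BufferSketch` and its `wake` event are assumed names, and it leaves out the worker thread, slow_sender, and flush_interval, since the only question here is whether push() ever sets the wake event.

```python
import threading
from collections import deque
from typing import Deque, List


class BufferSketch:
    """Hypothetical model of a bounded event buffer, used only to illustrate the wake rule."""

    def __init__(self, max_buffer_size: int, max_batch_size: int) -> None:
        # deque(maxlen=...) silently drops the oldest item on overflow: a sliding window.
        self._items: Deque[int] = deque(maxlen=max_buffer_size)
        self._max_batch_size = max_batch_size
        self._lock = threading.Lock()
        self.wake = threading.Event()

    def push(self, item: int) -> None:
        with self._lock:
            self._items.append(item)
            # Wake the worker early only once a full batch is available.
            if len(self._items) >= self._max_batch_size:
                self.wake.set()

    def snapshot(self) -> List[int]:
        with self._lock:
            return list(self._items)


racy = BufferSketch(max_buffer_size=100, max_batch_size=100)
fixed = BufferSketch(max_buffer_size=100, max_batch_size=101)
for i in range(250):  # 100 pushes to fill the buffer, then 150 more
    racy.push(i)
    fixed.push(i)
```

With the batch size one above the buffer capacity, the deque never holds more than 100 items, so the `>=` check in push() can never fire and the sliding window is left untouched until shutdown.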

The adapter handles both API shapes, auto-detecting which one it received:

Chat Completions (client.chat.completions.create):
- usage.prompt_tokens -> input
- usage.completion_tokens -> output
- usage.prompt_tokens_details.cached_tokens -> cache_read
- usage.prompt_tokens_details.audio_tokens -> audio_input
- usage.completion_tokens_details.reasoning_tokens -> reasoning (o-series)
- usage.completion_tokens_details.audio_tokens -> audio_output
- count of choices[0].message.tool_calls -> tool_calls

Responses API (client.responses.create):
- usage.input_tokens -> input
- usage.output_tokens -> output
- usage.input_tokens_details.cached_tokens -> cache_read
- usage.output_tokens_details.reasoning_tokens -> reasoning
- count of output[] items with type == "function_call" -> tool_calls

The wrapper covers both methods, sync and streaming, on both OpenAI and AsyncOpenAI. For Chat Completions streaming, it auto-injects stream_options.include_usage=true when missing so that the final chunk carries usage data (without that flag, OpenAI emits no usage on streamed responses).

CanonicalUsage is extended with audio_output (mapped to llm_audio_output_tokens) to capture GPT-4o-audio output usage.

OpenAI is the first provider to actually populate llm_reasoning_tokens (o-series models surface reasoning tokens separately; Anthropic/Bedrock fold them into output_tokens).

Predicted Outputs tokens (accepted/rejected_prediction_tokens) are intentionally not surfaced; this is documented in the adapter docstring as a v1 gap.

27 new unit tests (18 adapter + 9 wrapper). 5 live integration tests gated on OPENAI_API_KEY. 10 captured response fixtures from the real OpenAI API. Total: 283 unit tests passing, ruff + mypy strict clean.
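A minimal sketch of the shape auto-detection and the include_usage injection described above, working on plain dicts (for example, the `.model_dump()` of an OpenAI SDK response object). `extract_openai` and `with_stream_usage` are hypothetical names, not the SDK's functions; the output keys simply reuse the canonical field names from the mapping.

```python
from typing import Any, Dict


def extract_openai(response: Dict[str, Any]) -> Dict[str, int]:
    """Map a Chat Completions or Responses API payload, detected by usage-field shape."""
    usage = response.get("usage") or {}
    if "prompt_tokens" in usage:  # Chat Completions shape
        prompt = usage.get("prompt_tokens_details") or {}
        completion = usage.get("completion_tokens_details") or {}
        message = (response.get("choices") or [{}])[0].get("message") or {}
        return {
            "input": usage.get("prompt_tokens") or 0,
            "output": usage.get("completion_tokens") or 0,
            "cache_read": prompt.get("cached_tokens") or 0,
            "audio_input": prompt.get("audio_tokens") or 0,
            "reasoning": completion.get("reasoning_tokens") or 0,  # subset of output
            "audio_output": completion.get("audio_tokens") or 0,
            "tool_calls": len(message.get("tool_calls") or []),
        }
    if "input_tokens" in usage:  # Responses API shape
        inp = usage.get("input_tokens_details") or {}
        out = usage.get("output_tokens_details") or {}
        items = response.get("output") or []
        return {
            "input": usage.get("input_tokens") or 0,
            "output": usage.get("output_tokens") or 0,
            "cache_read": inp.get("cached_tokens") or 0,
            "reasoning": out.get("reasoning_tokens") or 0,
            "tool_calls": sum(1 for item in items if item.get("type") == "function_call"),
        }
    return {}  # no usage block at all (e.g. an intermediate stream chunk)


def with_stream_usage(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return Chat Completions kwargs that request a final usage chunk when streaming."""
    if not kwargs.get("stream"):
        return kwargs
    options = dict(kwargs.get("stream_options") or {})
    # Inject only when missing, so an explicit caller choice is left alone.
    options.setdefault("include_usage", True)
    return {**kwargs, "stream_options": options}
```

Copying `stream_options` instead of mutating it avoids surprising a caller who reuses the same kwargs dict across requests.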

The adapter maps usage_metadata fields to CanonicalUsage:
- prompt_token_count -> input
- candidates_token_count -> output
- cached_content_token_count -> cache_read
- thoughts_token_count -> reasoning
- prompt_tokens_details[modality=AUDIO].token_count -> audio_input
- prompt_tokens_details[modality=IMAGE].token_count -> image_input
- candidates_tokens_details[modality=AUDIO].token_count -> audio_output
- count of candidates[0].content.parts[].function_call -> tool_calls

The wrapper covers client.models.generate_content and generate_content_stream (sync) and the async variants under client.aio.models. Wrapping is idempotent via a _lago_instrumented sentinel.

The detector now returns 'gemini' (was 'google') for google-genai clients, matching the naming convention used by the other providers (bedrock, anthropic, openai, mistral).

Semantic note vs OpenAI: Gemini's `thoughts_token_count` is ADDITIVE to `candidates_token_count` (verified by arithmetic across all 5 fixtures: input + output + reasoning = total). OpenAI's `reasoning_tokens` is a SUBSET of `completion_tokens`. Documented in the adapter docstring + README for customers configuring per-metric billing.

Gemini 2.5 emits reasoning tokens by default (no explicit thinking_config needed), making it the second provider to populate llm_reasoning_tokens.

21 new unit tests (15 adapter + 6 wrapper). 4 live integration tests gated on GEMINI_API_KEY. 5 captured response fixtures (plain, tool use, streaming, thinking, multi-turn). Total: 304 unit tests passing, ruff + mypy strict clean.
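A minimal sketch of the per-modality mapping and the additivity check, assuming plain dicts such as the output of `response.model_dump(mode="json")` on a google-genai response (so modality enums appear as strings like "AUDIO"). `extract_gemini`, `_by_modality`, and `reasoning_is_additive` are hypothetical helpers, not the adapter's code.

```python
from typing import Any, Dict, List, Optional


def _by_modality(details: Optional[List[Dict[str, Any]]], modality: str) -> int:
    # Sum token_count over the per-modality entries that match, e.g. "AUDIO" or "IMAGE".
    return sum(d.get("token_count") or 0 for d in details or [] if d.get("modality") == modality)


def extract_gemini(response: Dict[str, Any]) -> Dict[str, int]:
    meta = response.get("usage_metadata") or {}
    candidates = response.get("candidates") or [{}]
    parts = (candidates[0].get("content") or {}).get("parts") or []
    return {
        "input": meta.get("prompt_token_count") or 0,
        "output": meta.get("candidates_token_count") or 0,
        "cache_read": meta.get("cached_content_token_count") or 0,
        "reasoning": meta.get("thoughts_token_count") or 0,  # additive to output
        "audio_input": _by_modality(meta.get("prompt_tokens_details"), "AUDIO"),
        "image_input": _by_modality(meta.get("prompt_tokens_details"), "IMAGE"),
        "audio_output": _by_modality(meta.get("candidates_tokens_details"), "AUDIO"),
        # Count only parts that actually carry a function_call.
        "tool_calls": sum(1 for part in parts if part.get("function_call")),
    }


def reasoning_is_additive(response: Dict[str, Any]) -> bool:
    # The fixture check from the commit message: input + output + reasoning == total.
    usage = extract_gemini(response)
    total = (response.get("usage_metadata") or {}).get("total_token_count")
    return total == usage["input"] + usage["output"] + usage["reasoning"]
```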

CI runs `ruff format --check`, which was failing because earlier development only ran `ruff check` (the linter) locally, not the formatter. Auto-formatting restores whitespace consistency in:
- src/lago_agent_sdk/adapters/gemini_native.py
- src/lago_agent_sdk/wrappers/openai.py
- tests/unit/adapters/fixtures/capture_openai.py
- tests/unit/adapters/test_gemini_native.py
- tests/unit/test_wrapper_gemini.py

No functional changes.

anassg-lago added 5 commits May 20, 2026 16:11

anassg-lago wants to merge 5 commits into main from feature/anthropic-openai-gemini-native