getlago · anassg-lago · May 20, 2026 · May 22, 2026 · May 29, 2026 · May 29, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,35 @@ All notable changes to this project will be documented here. Format follows [Kee
 
 ## [Unreleased]
 
+### Added (Gemini)
+- Native `google-genai` SDK support covering `client.models.generate_content` + `generate_content_stream`, sync + async (`client.aio.models`).
+- `extract_gemini_native` adapter maps `usage_metadata`: `prompt_token_count → input`, `candidates_token_count → output`, `cached_content_token_count → cache_read`, `thoughts_token_count → reasoning`, `prompt_tokens_details[modality=AUDIO/IMAGE] → audio_input/image_input`, `candidates_tokens_details[modality=AUDIO] → audio_output`, count of `candidates[0].content.parts[].function_call → tool_calls`.
+- **Gemini 2.5 surfaces reasoning tokens by default** (`thoughts_token_count`) — fires `llm_reasoning_tokens` automatically. Note the semantic difference vs OpenAI: Gemini's reasoning is ADDITIVE to output (`candidates + thoughts = total billable output`); OpenAI's reasoning is a SUBSET of `completion_tokens`. Documented in adapter docstring + README.
+- `gemini` optional dependency group: `pip install 'lago-agent-sdk[gemini]'`.
+- 21 new unit tests (15 adapter + 6 wrapper) and 4 live integration tests (gated on `GEMINI_API_KEY`). Total: 304 unit tests.
+- 5 captured response fixtures from the real Gemini API (plain, tool use, streaming, thinking, multi-turn).
+- **Behavior change:** the client detector now returns `gemini` (previously `google`) for `google-genai` clients.
+
+### Added (OpenAI)
+- Native `openai` SDK support covering both APIs: `chat.completions.create` and `responses.create`, each with sync + streaming. Same coverage on `AsyncOpenAI`.
+- `extract_openai_native` adapter handles both API shapes with auto-detection:
+  - Chat Completions: `prompt_tokens`, `completion_tokens`, `prompt_tokens_details.{cached_tokens, audio_tokens}`, `completion_tokens_details.{reasoning_tokens, audio_tokens}`, count of `choices[0].message.tool_calls`.
+  - Responses API: `input_tokens`, `output_tokens`, `input_tokens_details.cached_tokens`, `output_tokens_details.reasoning_tokens`, count of `output[].type == "function_call"`.
+- **Populates `llm_reasoning_tokens`** for OpenAI o-series models (`o4-mini`, `o1`, etc.), which report reasoning token counts separately (the first provider in this SDK to do so).
+- Auto-injection of `stream_options={"include_usage": True}` when the customer sets `stream=True` without it, so streamed Chat Completions emit usage on the final chunk.
+- `audio_output` field added to `CanonicalUsage` (maps to `llm_audio_output_tokens`), populated by GPT-4o-audio responses.
+- `openai` optional dependency group: `pip install 'lago-agent-sdk[openai]'`.
+- 27 new unit tests (18 adapter + 9 wrapper) and 5 live integration tests (gated on `OPENAI_API_KEY`). Total: 283 unit tests.
+- 10 captured response fixtures from the real OpenAI API (plain chat, tool use, auto-caching, streaming with usage, o-series reasoning, multi-turn, Responses API plain + tool use + reasoning).
+
+### Added (Anthropic)
+- Native `anthropic` SDK support. Wraps `Anthropic.messages.create` (including `stream=True`) and `Anthropic.messages.stream(...)` context manager. Same coverage on `AsyncAnthropic` (sync + async variants).
+- `extract_anthropic_native` adapter with the full Anthropic field map: `input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`, `cache_creation.ephemeral_5m_input_tokens`, `cache_creation.ephemeral_1h_input_tokens`, `content[].type == "tool_use"`.
+- `anthropic` optional dependency group: `pip install 'lago-agent-sdk[anthropic]'`.
+- 19 unit tests (adapter + wrapper) and 3 live integration tests (gated on `ANTHROPIC_API_KEY`).
+- 9 captured response fixtures from the real Anthropic API (plain, tool use, 5m + 1h prompt caching, extended thinking, streaming, multi-turn).
+
+
 ## [0.1.0] — initial release
 
 ### Added

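The Gemini field map in the changelog above can be illustrated with a minimal, dict-only sketch. It assumes the response has already been turned into a plain dict with the SDK's snake_case keys (as a captured fixture or `model_dump()` output would have); `map_gemini_usage` and `billable_output` are hypothetical names for illustration, not the SDK's `extract_gemini_native` API.

```python
from typing import Any


def _as_int(value: Any) -> int:
    """Coerce a possibly missing or malformed count to a non-negative int."""
    try:
        return max(0, int(value or 0))
    except (TypeError, ValueError, OverflowError):
        return 0


def _modality_tokens(details: Any, modality: str) -> int:
    """Sum token_count over the *_tokens_details entries of one modality."""
    if not isinstance(details, list):
        return 0
    return sum(
        _as_int(entry.get("token_count"))
        for entry in details
        if isinstance(entry, dict) and entry.get("modality") == modality
    )


def map_gemini_usage(response: dict[str, Any]) -> dict[str, int]:
    """Hypothetical mirror of the usage_metadata field map listed above."""
    usage = response.get("usage_metadata") or {}
    prompt_details = usage.get("prompt_tokens_details")
    candidates = response.get("candidates") or []
    parts: list[Any] = []
    if candidates and isinstance(candidates[0], dict):
        parts = (candidates[0].get("content") or {}).get("parts") or []
    return {
        "input": _as_int(usage.get("prompt_token_count")),
        "output": _as_int(usage.get("candidates_token_count")),
        "cache_read": _as_int(usage.get("cached_content_token_count")),
        "reasoning": _as_int(usage.get("thoughts_token_count")),
        "audio_input": _modality_tokens(prompt_details, "AUDIO"),
        "image_input": _modality_tokens(prompt_details, "IMAGE"),
        "audio_output": _modality_tokens(usage.get("candidates_tokens_details"), "AUDIO"),
        # Only the first candidate's parts are counted, per the field map.
        "tool_calls": sum(
            1 for part in parts if isinstance(part, dict) and part.get("function_call")
        ),
    }


def billable_output(provider: str, output: int, reasoning: int) -> int:
    """Output-rate tokens under each provider's reasoning semantics."""
    if provider == "gemini":
        # thoughts_token_count is reported on top of candidates_token_count.
        return output + reasoning
    # OpenAI (and providers that fold reasoning into output): already included.
    return output
```

With illustrative numbers, a Gemini response reporting 40 candidate tokens and 300 thought tokens is billed for 340 output-rate tokens, while an OpenAI response reporting 500 completion tokens with 300 reasoning tokens is still 500 in total.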
diff --git a/README.md b/README.md
@@ -29,6 +29,9 @@ pip install lago-agent-sdk
 
 For Bedrock support: `pip install 'lago-agent-sdk[bedrock]'` (adds `boto3`).
 For Mistral support: `pip install 'lago-agent-sdk[mistral]'` (adds `mistralai`).
+For Anthropic native support: `pip install 'lago-agent-sdk[anthropic]'` (adds `anthropic`).
+For OpenAI native support: `pip install 'lago-agent-sdk[openai]'` (adds `openai`).
+For Gemini native support: `pip install 'lago-agent-sdk[gemini]'` (adds `google-genai`).
 
 ## Quickstart — Bedrock
 
@@ -52,6 +55,25 @@ sdk.flush()
 
 The wrapped client behaves identically to the original — same arguments, same return shape, same exceptions. The SDK adds an in-memory queue that batches events to Lago in the background.
 
+## Quickstart — Anthropic
+
+```python
+from anthropic import Anthropic
+from lago_agent_sdk import LagoSDK
+
+sdk = LagoSDK(api_key="...", default_subscription_id="sub_acme")
+client = sdk.wrap(Anthropic(api_key="..."))
+
+resp = client.messages.create(
+    model="claude-sonnet-4-6",
+    max_tokens=200,
+    messages=[{"role": "user", "content": "Hello"}],
+)
+sdk.flush()
+```
+
+Works with `Anthropic` and `AsyncAnthropic`. Both `messages.create(..., stream=True)` and the `messages.stream(...)` context manager are instrumented — usage is captured from the final `message_delta` event in either case.
+
 ## Quickstart — Mistral
 
 ```python
@@ -68,6 +90,49 @@ resp = client.chat.complete(
 sdk.flush()
 ```
 
+## Quickstart — OpenAI
+
+```python
+from openai import OpenAI
+from lago_agent_sdk import LagoSDK
+
+sdk = LagoSDK(api_key="...", default_subscription_id="sub_acme")
+client = sdk.wrap(OpenAI(api_key="..."))
+
+resp = client.chat.completions.create(
+    model="gpt-4o-mini",
+    messages=[{"role": "user", "content": "Hello"}],
+    max_completion_tokens=200,
+)
+sdk.flush()
+```
+
+Works with `OpenAI` and `AsyncOpenAI`. Covers both **Chat Completions** (`client.chat.completions.create`) and the newer **Responses API** (`client.responses.create`), sync + streaming. For streamed Chat Completions, the wrapper auto-injects `stream_options={"include_usage": True}` when you set `stream=True` without it, so the final chunk carries usage data; without it, OpenAI emits no usage on a streamed Chat Completion.
+
+**Reasoning tokens** (`llm_reasoning_tokens`) populate automatically when you call an o-series model (`o4-mini`, `o1`, etc.), read from `completion_tokens_details.reasoning_tokens` (Chat Completions) or `output_tokens_details.reasoning_tokens` (Responses API).
+
+## Quickstart — Gemini
+
+```python
+from google import genai
+from lago_agent_sdk import LagoSDK
+
+sdk = LagoSDK(api_key="...", default_subscription_id="sub_acme")
+client = sdk.wrap(genai.Client(api_key="..."))
+
+resp = client.models.generate_content(
+    model="gemini-2.5-flash",
+    contents="Hello",
+)
+sdk.flush()
+```
+
+Wraps the modern `google-genai` SDK (`from google import genai`). Covers `client.models.generate_content` + `generate_content_stream`, sync + async (via `client.aio.models`).
+
+**Reasoning tokens** populate automatically on Gemini 2.5 models that think by default, such as `gemini-2.5-flash` above, which report `thoughts_token_count`. Note the semantic difference vs OpenAI:
+- **OpenAI:** `reasoning_tokens` is a *subset* of `completion_tokens` (already counted in output)
+- **Gemini:** `thoughts_token_count` is *additive* to `candidates_token_count` (total Google bill = output + reasoning)
+
 ## Multi-tenant — pick a subscription per call
 
 Three ways to set the `external_subscription_id`, in priority order:
@@ -92,28 +157,34 @@ Backed by `contextvars` for safe propagation across `asyncio` tasks.
 |---|---|---|
 | AWS Bedrock | `Converse` (sync + stream) | ✓ |
 | AWS Bedrock | `InvokeModel` (sync + stream), 7 model families | ✓ |
+| Anthropic | native SDK (`messages.create` + `messages.stream`, sync + async) | ✓ |
 | Mistral | native SDK (`chat.complete` + `chat.stream`) | ✓ |
-| OpenAI | native SDK | Phase 2 |
-| Anthropic | native SDK | Phase 2 |
-| Google Gemini | native SDK | Phase 2 |
+| OpenAI | native SDK (`chat.completions.create` + `responses.create`, sync + async + stream) | ✓ |
+| Google Gemini | native SDK (`google-genai`: `models.generate_content` + `generate_content_stream`, sync + async) | ✓ |
 | LiteLLM | callback bridge | Phase 4 |
 
 ## Token dimensions captured
 
-`CanonicalUsage` carries 10 numeric fields. Which ones populate depends on the provider:
-
-| Field | Lago metric code | Bedrock | Mistral native |
-|---|---|---|---|
-| input | `llm_input_tokens` | ✓ | ✓ |
-| output | `llm_output_tokens` | ✓ | ✓ |
-| cache_read | `llm_cached_input_tokens` | ✓ (Anthropic) | ✓ (when cache hits) |
-| cache_write | `llm_cache_creation_tokens` | ✓ (Anthropic) | ✗ |
-| cache_write_5m / 1h | `llm_cache_write_5m/1h_tokens` | ✓ (Anthropic InvokeModel) | ✗ |
-| reasoning | `llm_reasoning_tokens` | ✗ (folded into output) | ✗ (folded into output) |
-| tool_calls | `llm_tool_calls` | ✓ | ✓ |
-| image_input / audio_input | `llm_image/audio_input_tokens` | ✗ | ✗ |
-
-Reasoning, image, and audio fields will populate when Phase 2 native OpenAI ships.
+`CanonicalUsage` carries 11 numeric fields. Which ones populate depends on the provider:
+
+| Field | Lago metric code | Bedrock | Anthropic | Mistral | OpenAI | Gemini |
+|---|---|---|---|---|---|---|
+| input | `llm_input_tokens` | ✓ | ✓ | ✓ | ✓ | ✓ |
+| output | `llm_output_tokens` | ✓ | ✓ | ✓ | ✓ | ✓ |
+| cache_read | `llm_cached_input_tokens` | ✓ (Anthropic) | ✓ | ✓ (when cache hits) | ✓ (auto-cache) | ✓ (implicit or `CachedContent` caching) |
+| cache_write | `llm_cache_creation_tokens` | ✓ (Anthropic) | ✓ | ✗ | ✗ | ✗ |
+| cache_write_5m / 1h | `llm_cache_write_5m/1h_tokens` | ✓ (Anthropic InvokeModel) | ✓ | ✗ | ✗ | ✗ |
+| reasoning | `llm_reasoning_tokens` | ✗ (folded into output) | ✗ (folded into output, even with extended thinking) | ✗ (folded into output) | **✓ (o-series, subset)** | **✓ (Gemini 2.5, additive)** |
+| tool_calls | `llm_tool_calls` | ✓ | ✓ | ✓ | ✓ | ✓ |
+| audio_input | `llm_audio_input_tokens` | ✗ | ✗ | ✗ | ✓ (GPT-4o-audio) | ✓ (multimodal AUDIO) |
+| audio_output | `llm_audio_output_tokens` | ✗ | ✗ | ✗ | ✓ (GPT-4o-audio) | ✓ (multimodal AUDIO) |
+| image_input | `llm_image_input_tokens` | ✗ | ✗ | ✗ | ✗ (Phase 3) | ✓ (multimodal IMAGE) |
+
+**Semantic note on `reasoning`:**
+- **OpenAI's `reasoning_tokens` is a SUBSET of `output`** — already counted in `completion_tokens`.
+- **Gemini's `thoughts_token_count` is ADDITIVE to `output`** — `candidates + thoughts = total billable output`.
+
+OpenAI's Predicted Outputs tokens (`accepted_prediction_tokens`, `rejected_prediction_tokens`) are not surfaced — see the OpenAI adapter docstring for details on this intentional gap.
 
 ## Error policy
 

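The README's OpenAI streaming note can be sketched as two small steps: request usage before the call, then keep the one chunk that carries it. This is a minimal sketch assuming request kwargs and streamed chunks are plain dicts; `with_stream_usage` and `usage_from_chunks` are hypothetical helpers, not this SDK's API. Per OpenAI's documented behavior, with `include_usage` set the final chunk has an empty `choices` list and a populated `usage`, while earlier chunks carry `usage: null`.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import Any


def with_stream_usage(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Return kwargs with usage reporting requested for a streamed call.

    Hypothetical policy: only inject when stream=True and the caller passed
    no stream_options at all; an explicit value is left untouched.
    """
    if kwargs.get("stream") is True and "stream_options" not in kwargs:
        # Build a new dict so the caller's kwargs are not mutated.
        return {**kwargs, "stream_options": {"include_usage": True}}
    return kwargs


def usage_from_chunks(chunks: Iterable[dict[str, Any]]) -> dict[str, Any] | None:
    """Return the usage object from the last chunk that carries one."""
    usage: dict[str, Any] | None = None
    for chunk in chunks:
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
    return usage
```

Leaving a caller-supplied `stream_options` alone is one possible design; whether the real wrapper instead merges `include_usage` into an existing dict is not specified here.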
diff --git a/pyproject.toml b/pyproject.toml
@@ -37,6 +37,15 @@ dev = [
     "mypy>=1.10",
     "types-requests>=2.31",
 ]
+anthropic = [
+    "anthropic>=0.30",
+]
+openai = [
+    "openai>=1.66",  # first release with the Responses API (client.responses)
+]
+gemini = [
+    "google-genai>=1.0",
+]
 
 [project.urls]
 Homepage = "https://www.getlago.com"
@@ -79,5 +88,12 @@ strict = true
 files = ["src/lago_agent_sdk"]
 
 [[tool.mypy.overrides]]
-module = ["boto3.*", "botocore.*", "mistralai.*"]
+module = ["boto3.*", "botocore.*", "mistralai.*", "openai.*", "google.*"]
 ignore_missing_imports = true
+
+[dependency-groups]
+dev = [
+    "anthropic>=0.30",
+    "openai>=1.66",
+    "google-genai>=1.0",
+]
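Each provider SDK is now an optional extra, so a wrapper has to cope with the SDK being absent. A minimal sketch, assuming an availability check at wrap time (`is_installed` and `require_extra` are hypothetical helpers, not part of this package), that turns a missing SDK into an install hint using the extra names added above:

```python
import importlib.util


def is_installed(module: str) -> bool:
    """True if `module` can be found (a dotted name imports its parent first)."""
    try:
        return importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:
        # find_spec imports the parent of a dotted name, so "google.genai"
        # raises here when no "google" package is installed at all.
        return False


def require_extra(module: str, extra: str) -> None:
    """Raise ImportError with a pip hint when an optional SDK is missing."""
    if not is_installed(module):
        raise ImportError(
            f"{module!r} is not installed; run: pip install 'lago-agent-sdk[{extra}]'"
        )
```

For example, `require_extra("google.genai", "gemini")` would point a user without `google-genai` at `pip install 'lago-agent-sdk[gemini]'`.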
diff --git a/src/lago_agent_sdk/adapters/__init__.py b/src/lago_agent_sdk/adapters/__init__.py
@@ -1,10 +1,16 @@
+from .anthropic_native import extract_anthropic_native
 from .bedrock_converse import extract_bedrock_converse
 from .bedrock_invoke import extract_bedrock_invoke, pick_invoke_adapter
+from .gemini_native import extract_gemini_native
 from .mistral_native import extract_mistral_native
+from .openai_native import extract_openai_native
 
 __all__ = [
+    "extract_anthropic_native",
     "extract_bedrock_converse",
     "extract_bedrock_invoke",
     "pick_invoke_adapter",
+    "extract_gemini_native",
     "extract_mistral_native",
+    "extract_openai_native",
 ]
diff --git a/src/lago_agent_sdk/adapters/anthropic_native.py b/src/lago_agent_sdk/adapters/anthropic_native.py
@@ -0,0 +1,91 @@
+"""Anthropic native adapter — verified against real fixtures.
+
+Field mapping:
+  usage.input_tokens                                 → input
+  usage.output_tokens                                → output
+  usage.cache_read_input_tokens                      → cache_read
+  usage.cache_creation_input_tokens                  → cache_write
+  usage.cache_creation.ephemeral_5m_input_tokens     → cache_write_5m
+  usage.cache_creation.ephemeral_1h_input_tokens     → cache_write_1h
+  count of content[].type == "tool_use"              → tool_calls
+
+Not exposed by Anthropic (folded into output_tokens):
+  reasoning_tokens — even with extended thinking enabled
+
+Unknown usage fields (service_tier, inference_geo, server_tool_use, …) land in extras.
+"""
+
+from __future__ import annotations
+
+from typing import Any, cast
+
+from ..canonical import CanonicalUsage
+
+_KNOWN_USAGE_FIELDS = {
+    "input_tokens",
+    "output_tokens",
+    "cache_read_input_tokens",
+    "cache_creation_input_tokens",
+    "cache_creation",
+}
+
+
+def _safe_dict(v: Any) -> dict[str, Any]:
+    return v if isinstance(v, dict) else {}
+
+
+def _safe_int(v: Any) -> int:
+    try:
+        return max(0, int(v or 0))
+    except (TypeError, ValueError, OverflowError):
+        return 0
+
+
+def _to_dict(obj: Any) -> dict[str, Any]:
+    """Best-effort pydantic-or-dict to dict (Anthropic SDK returns pydantic Message objects)."""
+    if isinstance(obj, dict):
+        return obj
+    if hasattr(obj, "model_dump"):
+        try:
+            return cast(dict[str, Any], obj.model_dump())
+        except Exception:  # noqa: BLE001
+            pass
+    return {}
+
+
+def extract_anthropic_native(response: Any, model_id: str = "") -> CanonicalUsage:
+    """Translate an Anthropic native response (Message or dict) → CanonicalUsage.
+
+    Accepts the SDK's pydantic Message object, a dict (e.g. captured fixture),
+    or a synthetic `{"usage": {...}}` blob produced by the streaming wrapper.
+    """
+    resp = _to_dict(response)  # dicts pass through _to_dict unchanged
+
+    usage = _safe_dict(resp.get("usage"))
+    cache_creation = _safe_dict(usage.get("cache_creation"))
+
+    content = resp.get("content")
+    tool_calls = (
+        sum(1 for b in content if isinstance(b, dict) and b.get("type") == "tool_use")
+        if isinstance(content, list)
+        else 0
+    )
+
+    extras: dict[str, Any] = {}
+    for k, v in usage.items():
+        if k not in _KNOWN_USAGE_FIELDS:
+            extras[k] = v
+
+    return CanonicalUsage(
+        input=_safe_int(usage.get("input_tokens")),
+        output=_safe_int(usage.get("output_tokens")),
+        cache_read=_safe_int(usage.get("cache_read_input_tokens")),
+        cache_write=_safe_int(usage.get("cache_creation_input_tokens")),
+        cache_write_5m=_safe_int(cache_creation.get("ephemeral_5m_input_tokens")),
+        cache_write_1h=_safe_int(cache_creation.get("ephemeral_1h_input_tokens")),
+        tool_calls=tool_calls,
+        model=model_id or (resp.get("model") if isinstance(resp.get("model"), str) else "") or "",
+        provider="anthropic",
+        api="native",
+        extras=extras,
+    )
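The adapter docstring mentions a synthetic `{"usage": {...}}` blob produced by the streaming wrapper. A minimal sketch of how such a blob could be assembled from Anthropic stream events, assuming each event is a plain dict in the Messages streaming format: `message_start` carries the initial `message` (model and usage, including input and cache counts), `content_block_start` announces each content block, and `message_delta` carries usage counts that Anthropic documents as cumulative. `synthesize_message` is a hypothetical name; the real wrapper may read only the final `message_delta`, as the README describes.

```python
from collections.abc import Iterable
from typing import Any


def synthesize_message(events: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Fold stream events into a dict shaped like a non-streamed Message."""
    model = ""
    usage: dict[str, Any] = {}
    content: list[dict[str, Any]] = []
    for event in events:
        kind = event.get("type")
        if kind == "message_start":
            message = event.get("message") or {}
            model = message.get("model") or model
            # Input and cache counts arrive here, before any content.
            usage.update(message.get("usage") or {})
        elif kind == "content_block_start":
            block = event.get("content_block")
            if isinstance(block, dict):
                # Keep only the block type; enough for counting tool_use blocks.
                content.append({"type": block.get("type")})
        elif kind == "message_delta":
            # Counts here are cumulative, so they overwrite earlier values;
            # null fields are skipped so they don't erase message_start data.
            delta_usage = event.get("usage") or {}
            usage.update({k: v for k, v in delta_usage.items() if v is not None})
    return {"model": model, "usage": usage, "content": content}
```

The returned dict has the `model`, `usage`, and `content` keys that `extract_anthropic_native` reads, so it could be passed to the adapter directly; a `cache_creation` breakdown (5m/1h) in the event usage would flow through the same `usage.update` calls.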