29 changes: 29 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,35 @@ All notable changes to this project will be documented here. Format follows [Kee

## [Unreleased]

### Added
- Native `google-genai` SDK support covering `client.models.generate_content` + `generate_content_stream`, sync + async (`client.aio.models`).
- `extract_gemini_native` adapter maps `usage_metadata`: `prompt_token_count → input`, `candidates_token_count → output`, `cached_content_token_count → cache_read`, `thoughts_token_count → reasoning`, `prompt_tokens_details[modality=AUDIO/IMAGE] → audio_input/image_input`, `candidates_tokens_details[modality=AUDIO] → audio_output`, count of `candidates[0].content.parts[].function_call → tool_calls`.
- **Gemini 2.5 surfaces reasoning tokens by default** (`thoughts_token_count`) — fires `llm_reasoning_tokens` automatically. Note the semantic difference vs OpenAI: Gemini's reasoning is ADDITIVE to output (`candidates + thoughts = total billable output`); OpenAI's reasoning is a SUBSET of `completion_tokens`. Documented in adapter docstring + README.
- `gemini` optional dependency group: `pip install 'lago-agent-sdk[gemini]'`.
- 21 new unit tests (15 adapter + 6 wrapper) and 4 live integration tests (gated on `GEMINI_API_KEY`). Total: 304 unit tests.
- 5 captured response fixtures from the real Gemini API (plain, tool use, streaming, thinking, multi-turn).
- Detector now returns `gemini` (was `google`) for `google-genai` clients.
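
The `usage_metadata` mapping listed above can be sketched roughly as follows. This is an illustration only, not the adapter's actual code: `map_gemini_usage` is a hypothetical name, the response is assumed to be a plain dict, and each modality entry is assumed to look like `{"modality": ..., "token_count": ...}`.

```python
from typing import Any


def map_gemini_usage(response: dict[str, Any]) -> dict[str, int]:
    """Hypothetical helper: map a dict-shaped Gemini response to canonical counters."""
    usage = response.get("usage_metadata") or {}

    def modality_count(details_key: str, modality: str) -> int:
        # Assumed entry shape: {"modality": "AUDIO", "token_count": 12}.
        return sum(
            int(d.get("token_count") or 0)
            for d in usage.get(details_key) or []
            if d.get("modality") == modality
        )

    # Tool calls are counted on the first candidate only.
    candidates = response.get("candidates") or []
    parts = []
    if candidates:
        parts = (candidates[0].get("content") or {}).get("parts") or []

    return {
        "input": int(usage.get("prompt_token_count") or 0),
        "output": int(usage.get("candidates_token_count") or 0),
        "cache_read": int(usage.get("cached_content_token_count") or 0),
        "reasoning": int(usage.get("thoughts_token_count") or 0),
        "audio_input": modality_count("prompt_tokens_details", "AUDIO"),
        "image_input": modality_count("prompt_tokens_details", "IMAGE"),
        "audio_output": modality_count("candidates_tokens_details", "AUDIO"),
        "tool_calls": sum(1 for p in parts if p.get("function_call")),
    }


# Made-up sample data, not a captured fixture.
sample = {
    "usage_metadata": {
        "prompt_token_count": 120,
        "candidates_token_count": 40,
        "cached_content_token_count": 20,
        "thoughts_token_count": 15,
        "prompt_tokens_details": [
            {"modality": "TEXT", "token_count": 100},
            {"modality": "IMAGE", "token_count": 20},
        ],
    },
    "candidates": [
        {"content": {"parts": [{"text": "hi"}, {"function_call": {"name": "lookup", "args": {}}}]}}
    ],
}
print(map_gemini_usage(sample))
```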

### Added (OpenAI — earlier in this branch)
- Native `openai` SDK support covering both APIs: `chat.completions.create` and `responses.create`, each with sync + streaming. Same coverage on `AsyncOpenAI`.
- `extract_openai_native` adapter handles both API shapes with auto-detection:
  - Chat Completions: `prompt_tokens`, `completion_tokens`, `prompt_tokens_details.{cached_tokens, audio_tokens}`, `completion_tokens_details.{reasoning_tokens, audio_tokens}`, count of `choices[0].message.tool_calls`.
  - Responses API: `input_tokens`, `output_tokens`, `input_tokens_details.cached_tokens`, `output_tokens_details.reasoning_tokens`, count of `output[].type == "function_call"`.
- **First provider to populate `llm_reasoning_tokens`** — OpenAI o-series models (`o4-mini`, `o1`, etc.) surface reasoning token counts separately.
- Auto-injection of `stream_options={"include_usage": True}` when the customer sets `stream=True` without it, so streamed Chat Completions emit usage on the final chunk.
- `audio_output` field added to `CanonicalUsage` (maps to `llm_audio_output_tokens`), populated by GPT-4o-audio responses.
- `openai` optional dependency group: `pip install 'lago-agent-sdk[openai]'`.
- 27 new unit tests (18 adapter + 9 wrapper) and 5 live integration tests (gated on `OPENAI_API_KEY`). Total: 283 unit tests.
- 10 captured response fixtures from the real OpenAI API (plain chat, tool use, auto-caching, streaming with usage, o-series reasoning, multi-turn, Responses API plain + tool use + reasoning).
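
One plausible way to auto-detect the two usage shapes is to key off the field names, since Chat Completions reports `prompt_tokens` while the Responses API reports `input_tokens`. The sketch below assumes plain dict input; `map_openai_usage` is a hypothetical name and the real adapter may detect the shape differently.

```python
from typing import Any


def map_openai_usage(usage: dict[str, Any]) -> dict[str, int]:
    """Hypothetical helper: normalize either OpenAI usage shape to shared keys."""
    if "prompt_tokens" in usage:
        # Chat Completions shape.
        in_details = usage.get("prompt_tokens_details") or {}
        out_details = usage.get("completion_tokens_details") or {}
        return {
            "input": int(usage.get("prompt_tokens") or 0),
            "output": int(usage.get("completion_tokens") or 0),
            "cache_read": int(in_details.get("cached_tokens") or 0),
            "reasoning": int(out_details.get("reasoning_tokens") or 0),
        }
    # Anything else is treated as the Responses API shape (an assumption of this sketch).
    in_details = usage.get("input_tokens_details") or {}
    out_details = usage.get("output_tokens_details") or {}
    return {
        "input": int(usage.get("input_tokens") or 0),
        "output": int(usage.get("output_tokens") or 0),
        "cache_read": int(in_details.get("cached_tokens") or 0),
        "reasoning": int(out_details.get("reasoning_tokens") or 0),
    }


# Made-up numbers for illustration.
chat_usage = {
    "prompt_tokens": 50,
    "completion_tokens": 30,
    "completion_tokens_details": {"reasoning_tokens": 12},
}
responses_usage = {
    "input_tokens": 50,
    "output_tokens": 30,
    "input_tokens_details": {"cached_tokens": 8},
}
print(map_openai_usage(chat_usage))
print(map_openai_usage(responses_usage))
```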

### Previously in unreleased (Anthropic)
- Native `anthropic` SDK support. Wraps `Anthropic.messages.create` (including `stream=True`) and `Anthropic.messages.stream(...)` context manager. Same coverage on `AsyncAnthropic` (sync + async variants).
- `extract_anthropic_native` adapter with the full Anthropic field map: `input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`, `cache_creation.ephemeral_5m_input_tokens`, `cache_creation.ephemeral_1h_input_tokens`, `content[].type == "tool_use"`.
- `anthropic` optional dependency group: `pip install 'lago-agent-sdk[anthropic]'`.
- 19 unit tests (adapter + wrapper) and 3 live integration tests (gated on `ANTHROPIC_API_KEY`).
- 9 captured response fixtures from the real Anthropic API (plain, tool use, 5m + 1h prompt caching, extended thinking, streaming, multi-turn).


## [0.1.0] — initial release

### Added
105 changes: 88 additions & 17 deletions README.md
@@ -29,6 +29,9 @@ pip install lago-agent-sdk

For Bedrock support: `pip install 'lago-agent-sdk[bedrock]'` (adds `boto3`).
For Mistral support: `pip install 'lago-agent-sdk[mistral]'` (adds `mistralai`).
For Anthropic native support: `pip install 'lago-agent-sdk[anthropic]'` (adds `anthropic`).
For OpenAI native support: `pip install 'lago-agent-sdk[openai]'` (adds `openai`).
For Gemini native support: `pip install 'lago-agent-sdk[gemini]'` (adds `google-genai`).

## Quickstart — Bedrock

@@ -52,6 +55,25 @@ sdk.flush()

The wrapped client behaves identically to the original — same arguments, same return shape, same exceptions. The SDK adds an in-memory queue that batches events to Lago in the background.

## Quickstart — Anthropic

```python
from anthropic import Anthropic
from lago_agent_sdk import LagoSDK

sdk = LagoSDK(api_key="...", default_subscription_id="sub_acme")
client = sdk.wrap(Anthropic(api_key="..."))

resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=200,
    messages=[{"role": "user", "content": "Hello"}],
)
sdk.flush()
```

Works with `Anthropic` and `AsyncAnthropic`. Both `messages.create(..., stream=True)` and the `messages.stream(...)` context manager are instrumented — usage is captured from the final `message_delta` event in either case.
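
To illustrate how usage might be recovered from a stream, the sketch below folds dict-shaped stream events into the synthetic `{"usage": {...}}` blob that the adapter accepts. It assumes `message_start` carries the input-side counts under `message.usage` and `message_delta` carries the final `output_tokens`; `usage_from_stream_events` is a hypothetical name, not part of the SDK.

```python
from typing import Any, Iterable


def usage_from_stream_events(events: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical helper: merge usage from stream events into one blob."""
    usage: dict[str, Any] = {}
    for event in events:
        if event.get("type") == "message_start":
            # Input-side counts arrive with the initial message envelope.
            usage.update((event.get("message") or {}).get("usage") or {})
        elif event.get("type") == "message_delta":
            # Later counts overwrite earlier ones; None values are skipped.
            delta_usage = event.get("usage") or {}
            usage.update({k: v for k, v in delta_usage.items() if v is not None})
    return {"usage": usage}


# Made-up event sequence for illustration.
events = [
    {"type": "message_start", "message": {"usage": {"input_tokens": 25, "output_tokens": 1}}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hi"}},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 42}},
    {"type": "message_stop"},
]
print(usage_from_stream_events(events))
```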

## Quickstart — Mistral

```python
@@ -68,6 +90,49 @@ resp = client.chat.complete(
sdk.flush()
```

## Quickstart — OpenAI

```python
from openai import OpenAI
from lago_agent_sdk import LagoSDK

sdk = LagoSDK(api_key="...", default_subscription_id="sub_acme")
client = sdk.wrap(OpenAI(api_key="..."))

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    max_completion_tokens=200,
)
sdk.flush()
```

Works with `OpenAI` and `AsyncOpenAI`. Covers both **Chat Completions** (`client.chat.completions.create`) and the newer **Responses API** (`client.responses.create`), sync + streaming. For streaming, the wrapper auto-injects `stream_options={"include_usage": True}` so the final chunk carries usage data — without it OpenAI emits no usage on streamed responses.
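
The auto-injection described above could look roughly like this. It is a sketch under assumptions: `with_usage_on_stream` is a hypothetical name, and the choice to leave an explicit caller-supplied `include_usage` value untouched is a guess at the intended behavior.

```python
from typing import Any


def with_usage_on_stream(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: return call kwargs with include_usage set for streams."""
    if not kwargs.get("stream"):
        return kwargs  # non-streaming calls already return usage on the response
    # Copy so the caller's dicts are never mutated.
    stream_options = dict(kwargs.get("stream_options") or {})
    # setdefault keeps any value the caller set explicitly.
    stream_options.setdefault("include_usage", True)
    return {**kwargs, "stream_options": stream_options}


call = {"model": "gpt-4o-mini", "messages": [], "stream": True}
print(with_usage_on_stream(call))
```

Returning a new dict rather than editing in place keeps the caller's arguments unchanged, which matters if the same kwargs are reused across calls.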

**Reasoning tokens** (`llm_reasoning_tokens`) populate automatically when you call an o-series model (`o4-mini`, `o1`, etc.). OpenAI was the first provider in this SDK to populate this metric separately.

## Quickstart — Gemini

```python
from google import genai
from lago_agent_sdk import LagoSDK

sdk = LagoSDK(api_key="...", default_subscription_id="sub_acme")
client = sdk.wrap(genai.Client(api_key="..."))

resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Hello",
)
sdk.flush()
```

Wraps the modern `google-genai` SDK (`from google import genai`). Covers `client.models.generate_content` + `generate_content_stream`, sync + async (via `client.aio.models`).

**Reasoning tokens** populate automatically on Gemini 2.5 — the model reasons internally by default and surfaces `thoughts_token_count`. Note the semantic difference vs OpenAI:
- **OpenAI:** `reasoning_tokens` is a *subset* of `completion_tokens` (already counted in output)
- **Gemini:** `thoughts_token_count` is *additive* to `candidates_token_count` (total Google bill = output + reasoning)
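
The practical consequence of this difference shows up when computing billable output from the canonical `output` and `reasoning` values. The helper below is a minimal sketch with a hypothetical name and made-up numbers, not part of the SDK.

```python
def billable_output_tokens(provider: str, output: int, reasoning: int) -> int:
    """Hypothetical helper: total output-side tokens billed by the provider."""
    if provider == "gemini":
        # Gemini: thoughts are reported separately, so add them to output.
        return output + reasoning
    # OpenAI: reasoning is already included in the output count.
    return output


print(billable_output_tokens("openai", output=500, reasoning=300))
print(billable_output_tokens("gemini", output=500, reasoning=300))
```

With the same reported counts, the OpenAI call bills 500 output-side tokens and the Gemini call bills 800.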

## Multi-tenant — pick a subscription per call

Three ways to set the `external_subscription_id`, in priority order:
@@ -92,28 +157,34 @@ Backed by `contextvars` for safe propagation across `asyncio` tasks.
|---|---|---|
| AWS Bedrock | `Converse` (sync + stream) | ✓ |
| AWS Bedrock | `InvokeModel` (sync + stream), 7 model families | ✓ |
| Anthropic | native SDK (`messages.create` + `messages.stream`, sync + async) | ✓ |
| Mistral | native SDK (`chat.complete` + `chat.stream`) | ✓ |
| OpenAI | native SDK | Phase 2 |
| Anthropic | native SDK | Phase 2 |
| Google Gemini | native SDK | Phase 2 |
| OpenAI | native SDK (`chat.completions.create` + `responses.create`, sync + async + stream) | ✓ |
| Google Gemini | native SDK (`google-genai`: `models.generate_content` + `generate_content_stream`, sync + async) | ✓ |
| LiteLLM | callback bridge | Phase 4 |

## Token dimensions captured

`CanonicalUsage` carries 10 numeric fields. Which ones populate depends on the provider:

| Field | Lago metric code | Bedrock | Mistral native |
|---|---|---|---|
| input | `llm_input_tokens` | ✓ | ✓ |
| output | `llm_output_tokens` | ✓ | ✓ |
| cache_read | `llm_cached_input_tokens` | ✓ (Anthropic) | ✓ (when cache hits) |
| cache_write | `llm_cache_creation_tokens` | ✓ (Anthropic) | ✗ |
| cache_write_5m / 1h | `llm_cache_write_5m/1h_tokens` | ✓ (Anthropic InvokeModel) | ✗ |
| reasoning | `llm_reasoning_tokens` | ✗ (folded into output) | ✗ (folded into output) |
| tool_calls | `llm_tool_calls` | ✓ | ✓ |
| image_input / audio_input | `llm_image/audio_input_tokens` | ✗ | ✗ |

Reasoning, image, and audio fields will populate when Phase 2 native OpenAI ships.
`CanonicalUsage` carries 11 numeric fields. Which ones populate depends on the provider:

| Field | Lago metric code | Bedrock | Anthropic | Mistral | OpenAI | Gemini |
|---|---|---|---|---|---|---|
| input | `llm_input_tokens` | ✓ | ✓ | ✓ | ✓ | ✓ |
| output | `llm_output_tokens` | ✓ | ✓ | ✓ | ✓ | ✓ |
| cache_read | `llm_cached_input_tokens` | ✓ (Anthropic) | ✓ | ✓ (when cache hits) | ✓ (auto-cache) | ✓ (CachedContent API) |
| cache_write | `llm_cache_creation_tokens` | ✓ (Anthropic) | ✓ | ✗ | ✗ | ✗ |
| cache_write_5m / 1h | `llm_cache_write_5m/1h_tokens` | ✓ (Anthropic InvokeModel) | ✓ | ✗ | ✗ | ✗ |
| reasoning | `llm_reasoning_tokens` | ✗ (folded into output) | ✗ (folded into output, even with extended thinking) | ✗ (folded into output) | **✓ (o-series, subset)** | **✓ (Gemini 2.5, additive)** |
| tool_calls | `llm_tool_calls` | ✓ | ✓ | ✓ | ✓ | ✓ |
| audio_input | `llm_audio_input_tokens` | ✗ | ✗ | ✗ | ✓ (GPT-4o-audio) | ✓ (multimodal AUDIO) |
| audio_output | `llm_audio_output_tokens` | ✗ | ✗ | ✗ | ✓ (GPT-4o-audio) | ✓ (multimodal AUDIO) |
| image_input | `llm_image_input_tokens` | ✗ | ✗ | ✗ | ✗ (Phase 3) | ✓ (multimodal IMAGE) |

**Semantic note on `reasoning`:**
- **OpenAI's `reasoning_tokens` is a SUBSET of `output`** — already counted in `completion_tokens`.
- **Gemini's `thoughts_token_count` is ADDITIVE to `output`** — `candidates + thoughts = total billable output`.

OpenAI's Predicted Outputs tokens (`accepted_prediction_tokens`, `rejected_prediction_tokens`) are not surfaced — see the OpenAI adapter docstring for details on this intentional gap.

## Error policy

18 changes: 17 additions & 1 deletion pyproject.toml
@@ -37,6 +37,15 @@ dev = [
    "mypy>=1.10",
    "types-requests>=2.31",
]
anthropic = [
    "anthropic>=0.30",
]
openai = [
    "openai>=1.50",
]
gemini = [
    "google-genai>=1.0",
]

[project.urls]
Homepage = "https://www.getlago.com"
@@ -79,5 +88,12 @@ strict = true
files = ["src/lago_agent_sdk"]

[[tool.mypy.overrides]]
module = ["boto3.*", "botocore.*", "mistralai.*"]
module = ["boto3.*", "botocore.*", "mistralai.*", "openai.*", "google.*"]
ignore_missing_imports = true

[dependency-groups]
dev = [
    "anthropic>=0.30",
    "openai>=1.50",
    "google-genai>=1.0",
]
6 changes: 6 additions & 0 deletions src/lago_agent_sdk/adapters/__init__.py
@@ -1,10 +1,16 @@
from .anthropic_native import extract_anthropic_native
from .bedrock_converse import extract_bedrock_converse
from .bedrock_invoke import extract_bedrock_invoke, pick_invoke_adapter
from .gemini_native import extract_gemini_native
from .mistral_native import extract_mistral_native
from .openai_native import extract_openai_native

__all__ = [
    "extract_anthropic_native",
    "extract_bedrock_converse",
    "extract_bedrock_invoke",
    "pick_invoke_adapter",
    "extract_gemini_native",
    "extract_mistral_native",
    "extract_openai_native",
]
91 changes: 91 additions & 0 deletions src/lago_agent_sdk/adapters/anthropic_native.py
@@ -0,0 +1,91 @@
"""Anthropic native adapter — verified against real fixtures.

Field mapping:
    usage.input_tokens                              → input
    usage.output_tokens                             → output
    usage.cache_read_input_tokens                   → cache_read
    usage.cache_creation_input_tokens               → cache_write
    usage.cache_creation.ephemeral_5m_input_tokens  → cache_write_5m
    usage.cache_creation.ephemeral_1h_input_tokens  → cache_write_1h
    count of content[].type == "tool_use"           → tool_calls

Not exposed by Anthropic (folded into output_tokens):
    reasoning_tokens (even with extended thinking enabled)

Unknown usage fields (service_tier, inference_geo, server_tool_use, …) land in extras.
"""

from __future__ import annotations

from typing import Any, cast

from ..canonical import CanonicalUsage

_KNOWN_USAGE_FIELDS = {
    "input_tokens",
    "output_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
    "cache_creation",
}


def _safe_dict(v: Any) -> dict[str, Any]:
    return v if isinstance(v, dict) else {}


def _safe_int(v: Any) -> int:
    try:
        return max(0, int(v or 0))
    except (TypeError, ValueError):
        return 0


def _to_dict(obj: Any) -> dict[str, Any]:
    """Best-effort pydantic-or-dict to dict (Anthropic SDK returns pydantic Message objects)."""
    if isinstance(obj, dict):
        return obj
    if hasattr(obj, "model_dump"):
        try:
            return cast(dict[str, Any], obj.model_dump())
        except Exception:  # noqa: BLE001
            pass
    return {}


def extract_anthropic_native(response: Any, model_id: str = "") -> CanonicalUsage:
    """Translate an Anthropic native response (Message or dict) → CanonicalUsage.

    Accepts the SDK's pydantic Message object, a dict (e.g. captured fixture),
    or a synthetic `{"usage": {...}}` blob produced by the streaming wrapper.
    """
    resp = _to_dict(response) if not isinstance(response, dict) else response

    usage = _safe_dict(resp.get("usage"))
    cache_creation = _safe_dict(usage.get("cache_creation"))

    content = resp.get("content")
    tool_calls = (
        sum(1 for b in content if isinstance(b, dict) and b.get("type") == "tool_use")
        if isinstance(content, list)
        else 0
    )

    extras: dict[str, Any] = {}
    for k, v in usage.items():
        if k not in _KNOWN_USAGE_FIELDS:
            extras[k] = v

    return CanonicalUsage(
        input=_safe_int(usage.get("input_tokens")),
        output=_safe_int(usage.get("output_tokens")),
        cache_read=_safe_int(usage.get("cache_read_input_tokens")),
        cache_write=_safe_int(usage.get("cache_creation_input_tokens")),
        cache_write_5m=_safe_int(cache_creation.get("ephemeral_5m_input_tokens")),
        cache_write_1h=_safe_int(cache_creation.get("ephemeral_1h_input_tokens")),
        tool_calls=tool_calls,
        model=model_id or (resp.get("model") if isinstance(resp.get("model"), str) else "") or "",
        provider="anthropic",
        api="native",
        extras=extras,
    )