17,991 changes: 17,989 additions & 2 deletions .fern/replay.lock

Large diffs are not rendered by default.

27 changes: 27 additions & 0 deletions PYTHON-AGENTKIT-SNAKE-CASE-AUDIT.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# Python AgentKit Snake Case API Audit

Scope: `agora-agents-python` public AgentKit wrappers, docs, and tests.

Search terms:

```bash
rg -n "apiKey|baseUrl|modelId|voiceId|groupId|keyTerm|turnDetection|inputAudioTranscription|greetingMessage|failureMessage|projectId|adcCredentialsString|sampleRate|targetLanguageCode|resourceName|deploymentName" agora-agents-python
```

## Result

No shipped camelCase public Python constructor kwargs were found in source or docs examples. No deprecated alias helper is required for this pass.

| File | Class / symbol | Public arg or example | Current spelling | Desired Python spelling | `to_config()` key | Wire key | Action | Compatibility needed | Test coverage |
|---|---|---|---|---|---|---|---|---|---|
| `src/agora_agent/agentkit/vendors/tts.py` | `GoogleTTS` | constructor arg | `voice_name` | `voice_name` | `params.VoiceSelectionParams` | `params.VoiceSelectionParams` | keep | no | `tests/custom/test_tts_vendors.py` |
| `src/agora_agent/agentkit/vendors/tts.py` | `RimeTTS` | constructor arg | `model_id` | `model_id` | `params.modelId` | `params.modelId` | keep | no | `tests/custom/test_tts_vendors.py` |
| `src/agora_agent/agentkit/vendors/tts.py` | `MurfTTS` | constructor arg | `voice_id` | `voice_id` | `params.voiceId` | `params.voiceId` | keep | no | `tests/custom/test_tts_vendors.py`, `tests/custom/test_request_body.py` |
| `src/agora_agent/types/rime_tts_params.py` | generated model | generated alias | `modelId` | n/a | `model_id` | `modelId` | keep | no | `tests/custom/test_tts_vendors.py` |
| `src/agora_agent/types/murf_tts_params.py` | generated model | generated alias | `voiceId` | n/a | `voice_id` | `voiceId` | keep | no | `tests/custom/test_tts_vendors.py` |
| `tests/custom/test_request_body.py` | wire assertion | payload key | `voiceId` | n/a | `params.voiceId` | `params.voiceId` | keep | no | request-body test |
| `tests/custom/test_tts_vendors.py` | wire assertion | payload key | `modelId`, `voiceId`, `VoiceSelectionParams` | n/a | generated model fields | wire aliases | keep | no | wire serialization test |

## Guardrail Added

`tests/custom/test_docs_snake_case.py` scans Python markdown code fences and fails on common camelCase kwargs such as `apiKey`, `baseUrl`, `modelId`, `voiceId`, `projectId`, and `greetingMessage`. JSON, TypeScript, Go, shell, and YAML examples are skipped so wire payload examples can retain required non-Python keys.
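A guard of this kind can be approximated in a few lines. The sketch below is not the contents of `tests/custom/test_docs_snake_case.py`; the deny-list, the `find_camel_case_kwargs` helper, and the fence handling are assumptions made for illustration. It walks a markdown string, tracks fenced blocks, and reports camelCase names only when the fence language is Python.

```python
import re
from typing import List, Tuple

# Assumed deny-list for illustration; the real test may check more names.
CAMEL_CASE_KWARGS = ("apiKey", "baseUrl", "modelId", "voiceId", "projectId", "greetingMessage")

FENCE = "`" * 3  # built at runtime so this example does not contain a literal fence
FENCE_RE = re.compile(r"^" + FENCE + r"(\w*)\s*$")
KWARG_RE = re.compile(r"\b(" + "|".join(CAMEL_CASE_KWARGS) + r")\b")


def find_camel_case_kwargs(markdown: str) -> List[Tuple[int, str]]:
    """Return (line_number, name) pairs for deny-listed names inside Python fences."""
    hits: List[Tuple[int, str]] = []
    in_fence = False
    fence_lang = ""
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        match = FENCE_RE.match(line.strip())
        if match:
            if in_fence:
                # Closing fence: leave the block.
                in_fence, fence_lang = False, ""
            else:
                # Opening fence: remember its language tag.
                in_fence, fence_lang = True, match.group(1).lower()
            continue
        # JSON, TypeScript, Go, shell, and YAML fences are skipped on purpose.
        if in_fence and fence_lang in ("python", "py"):
            for found in KWARG_RE.finditer(line):
                hits.append((lineno, found.group(1)))
    return hits


sample = "\n".join([
    FENCE + "python",
    "tts = RimeTTS(modelId='x')",
    FENCE,
    FENCE + "json",
    '{"modelId": "x"}',
    FENCE,
])
violations = find_camel_case_kwargs(sample)
```

Only the Python fence is reported here; the JSON fence keeps its wire key without tripping the check.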
2 changes: 1 addition & 1 deletion README.md
@@ -20,7 +20,7 @@ pip install agora-agents
## Quick Start

Start with the `Agent` builder: create a client with app credentials, choose your ASR, LLM, and TTS providers, then start a session. Omit vendor API keys for supported Agora-managed models, or provide keys when you want BYOK.
Set Agora interaction language with `turn_detection.language`; provider-specific STT language values remain under `asr.params`.
Set Agora interaction language with `turn_detection.language`; provider-specific STT language values remain under `asr.params`. Ares uses only the REST `asr.language` value sourced from `turn_detection.language`.

```python
import os
24 changes: 22 additions & 2 deletions changelog.md
@@ -4,6 +4,26 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/).

## [v2.2.0] — 2026-06-05

### Added

- **Expanded provider surface** — Added generated API support for the latest Conversational AI vendors and configuration types, including Dify LLM and Generic Avatar.
- **Interaction language handling** — AgentKit now consistently derives REST `asr.language` from `turn_detection.language` while keeping provider-specific STT language values under `asr.params`.
- **Deepgram keyterm** — Added `keyterm` support on `DeepgramSTT`, serialized as `asr.params.keyterm`.

### Changed

- **MiniMax managed presets** — MiniMax preset-backed TTS now keeps the preset model as an internal hint while sending only supported partial TTS settings such as `voice_setting.voice_id`.
- **Vertex AI LLM routing** — `VertexAILLM` now keeps project and location in the generated endpoint URL instead of duplicating them in `llm.params`.

### Fixed

- **Provider wire keys** — Corrected alias-sensitive TTS payloads so Google TTS emits `VoiceSelectionParams` and `AudioConfig`, Rime TTS emits `modelId`, and Murf TTS preserves `voiceId`.
- **AgentKit request validation** — Start request validation now de-aliases REST-shaped provider dictionaries before constructing generated request models, while still allowing preset and pipeline-backed partial configs.
- **Request body coverage** — Added regression tests for BYOK, preset-backed, mixed preset/BYOK, and pipeline override request shapes across provider configurations.
- **Python docs examples** — Added a docs guard to keep Python examples on snake_case kwargs while allowing documented JSON wire keys.
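The interaction-language rule described in this release can be sketched as a small standalone function. This is an illustration modelled on the `_resolve_asr_config` hunk in this pull request, not the SDK code itself: `resolve_asr_config` takes plain dictionaries instead of the SDK's config models, it skips language validation, and the `en-US` default is taken from the vendor docs.

```python
from typing import Any, Dict, Optional

DEFAULT_TURN_DETECTION_LANGUAGE = "en-US"  # default stated in the vendor docs


def resolve_asr_config(
    stt: Optional[Dict[str, Any]],
    turn_detection: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Build the REST `asr` block; hypothetical stand-in for the SDK helper."""
    asr = dict(stt or {})
    # Any top-level vendor language is dropped; provider values stay in params.
    asr.pop("language", None)
    if not asr:
        # No STT vendor configured: fall back to Ares.
        asr["vendor"] = "ares"
    # REST asr.language always comes from turn_detection.language.
    asr["language"] = (turn_detection or {}).get("language") or DEFAULT_TURN_DETECTION_LANGUAGE
    return asr


deepgram = resolve_asr_config(
    {"vendor": "deepgram", "params": {"language": "en", "keyterm": "Agora"}},
    {"language": "fr-FR"},
)
ares = resolve_asr_config(None, None)
```

In this sketch the provider-specific `params.language` value is left untouched, while the top-level `language` follows `turn_detection.language` or the default.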

## [v2.1.0] — 2026-06-02

### Added
@@ -21,7 +41,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/).
### Fixed

- **Managed-provider validation** — AgentKit validation now distinguishes preset-backed providers from BYOK providers so required provider fields are only required when credentials are caller-supplied.
- **Language placement** — Provider-specific STT language values remain under `asr.params`, while Agora interaction language is emitted separately as `turn_detection.language`.
- **Language placement** — Provider-specific STT language values remain under `asr.params`; the REST `asr.language` field is populated from `turn_detection.language`.

## [v2.0.0] — 2026-05-21

@@ -114,7 +134,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/).

### Fixed

- **`AresSTT`** — Removed redundant `language` key from the `params` dict. Language is now emitted only at the top level. `params` is only included when `additional_params` is provided.
- **`AresSTT`** — Removed redundant `language` key from the `params` dict. Ares only selects the provider; AgentKit populates REST `asr.language` from `turn_detection.language`. `params` is only included when `additional_params` is provided.
- **`OpenAIRealtime` / `VertexAI` (MLLM)** — Agent-level `greeting` and `failure_message` defaults are now correctly applied when missing in MLLM mode. Previously these values were silently dropped.
- **`VertexAI` (MLLM)** — `messages` is emitted at the MLLM top level, matching the generated core SDK contract.

4 changes: 2 additions & 2 deletions compat/agora-agent-server-sdk/pyproject.toml
@@ -3,7 +3,7 @@ name = "agora-agent-server-sdk"

[tool.poetry]
name = "agora-agent-server-sdk"
version = "v2.1.1"
version = "v2.2.0"
description = "Compatibility shim for the renamed agora-agents package."
readme = "README.md"
authors = []
@@ -35,7 +35,7 @@ Repository = 'https://github.com/AgoraIO-Conversational-AI/agent-server-sdk-pyth

[tool.poetry.dependencies]
python = "^3.8"
agora-agents = ">=2.1.1,<3.0.0"
agora-agents = ">=2.2.0,<3.0.0"

[build-system]
requires = ["poetry-core"]
4 changes: 2 additions & 2 deletions docs/concepts/vendors.md
@@ -75,12 +75,12 @@ tts = ElevenLabsTTS(

Used with `agent.with_stt()`.

Use `turn_detection.language` for Agora interaction language; it defaults to `en-US`. STT vendor `language` options are serialized under `asr.params` using each provider's own format.
Use `turn_detection.language` for Agora interaction language; it defaults to `en-US`. STT vendor `language` options are serialized under `asr.params` using each provider's own format. Ares does not take a provider language option; AgentKit uses `turn_detection.language` for REST `asr.language`.

| Class | Provider | Required Parameters |
|---|---|---|
| `SpeechmaticsSTT` | Speechmatics | `api_key`, `language` |
| `DeepgramSTT` | Deepgram | `model` for Agora-managed `nova-2`/`nova-3`; `api_key` for BYOK |
| `DeepgramSTT` | Deepgram | `model` for Agora-managed `nova-2`/`nova-3`; `api_key` for BYOK; `language?`, `keyterm?` |
| `MicrosoftSTT` | Microsoft Azure | `key`, `region`, `language` |
| `OpenAISTT` | OpenAI | `api_key` |
| `GoogleSTT` | Google Cloud | `project_id`, `location`, `adc_credentials_string`, `language` |
4 changes: 2 additions & 2 deletions docs/reference/vendors.md
@@ -318,7 +318,7 @@ The SDK also includes named helpers for the remaining Agora-supported LLM provid

## STT Vendors

Use `turn_detection.language` for Agora interaction language; it defaults to `en-US`. Provider-specific language values remain under `asr.params` and may use a different format.
Use `turn_detection.language` for Agora interaction language; it defaults to `en-US`. Provider-specific language values remain under `asr.params` and may use a different format. AgentKit populates REST `asr.language` from `turn_detection.language`.

### `SpeechmaticsSTT`

@@ -336,6 +336,7 @@ Use `turn_detection.language` for Agora interaction language; it defaults to `en
| `api_key` | `str` | BYOK only | `None` | Deepgram API key. Optional only for Agora-managed `nova-2` and `nova-3`. |
| `model` | `str` | No | `None` | Model (e.g., `nova-2`) |
| `language` | `str` | No | `None` | Language code (e.g., `en-US`) |
| `keyterm` | `str` | No | `None` | Boost specialized terms and brands; serialized as `asr.params.keyterm` |
| `smart_format` | `bool` | No | `None` | Enable smart formatting |
| `punctuation` | `bool` | No | `None` | Enable punctuation |
| `additional_params` | `Dict[str, Any]` | No | `None` | Additional parameters |
@@ -396,7 +397,6 @@ For `nova-2` and `nova-3`, omit `api_key` to use Agora-managed credentials. For

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `language` | `str` | No | `None` | Language code |
| `additional_params` | `Dict[str, Any]` | No | `None` | Additional parameters |

### `SarvamSTT`
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -3,7 +3,7 @@ name = "agora-agents"

[tool.poetry]
name = "agora-agents"
version = "v2.1.1"
version = "v2.2.0"
description = ""
readme = "README.md"
authors = []
40 changes: 22 additions & 18 deletions src/agora_agent/agentkit/agent.py
@@ -76,6 +76,7 @@
from ..agent_management.types.agent_think_agent_management_response import (
    AgentThinkAgentManagementResponse,
)
from ..core.pydantic_utilities import parse_obj_as
from .vendors.base import BaseAvatar, BaseLLM, BaseMLLM, BaseSTT, BaseTTS

# Top-level aliases
@@ -188,6 +189,13 @@ class SessionOptions(typing_extensions.TypedDict, total=False):
    debug: bool
    warn: typing.Callable[[str], None]


def _start_properties_from_mapping(
    properties: typing.Mapping[str, typing.Any],
) -> StartAgentsRequestProperties:
    return parse_obj_as(StartAgentsRequestProperties, dict(properties))


# LLM sub-type aliases
LlmGreetingConfigs = typing.Dict[str, typing.Any]
LlmGreetingConfigsMode = typing.Any
@@ -298,7 +306,7 @@ def _is_turn_detection_language(value: typing.Any) -> bool:

def _validate_turn_detection_language(value: typing.Any) -> TurnDetectionLanguage:
    if not _is_turn_detection_language(value):
        raise ValueError(f"Invalid interaction language: {value}")
        raise ValueError(f"Invalid turn_detection.language: {value}")
    return value  # type: ignore[return-value]


@@ -896,7 +904,7 @@ def to_properties(
if self._failure_message is not None:
mllm_config.setdefault("failure_message", self._failure_message)
base_kwargs["mllm"] = mllm_config
return StartAgentsRequestProperties(**base_kwargs)
return _start_properties_from_mapping(base_kwargs)

if skip_vendor_validation:
warnings.warn(
@@ -919,12 +927,13 @@ def to_properties(
allow_missing_llm = "llm" in allow_missing_categories
allow_missing_tts = "tts" in allow_missing_categories

turn_detection_config = self._resolve_turn_detection_config()
if not skip_asr_validation and (self._stt is not None or not allow_missing_asr):
base_kwargs["asr"] = self._resolve_asr_config()
base_kwargs["turn_detection"] = self._resolve_turn_detection_config()
base_kwargs["asr"] = self._resolve_asr_config(turn_detection_config)
base_kwargs["turn_detection"] = turn_detection_config

if skip_vendor_validation:
return StartAgentsRequestProperties(**base_kwargs)
return _start_properties_from_mapping(base_kwargs)

if self._tts is None and not (skip_tts_validation or allow_missing_tts):
raise ValueError("TTS configuration is required. Use with_tts() to set it.")
@@ -937,39 +946,34 @@ def to_properties(
if self._tts is not None and not skip_tts_validation:
base_kwargs["tts"] = self._tts

return StartAgentsRequestProperties(**base_kwargs)
return _start_properties_from_mapping(base_kwargs)

def _resolve_llm_config(self) -> typing.Dict[str, typing.Any]:
llm_config = dict(self._llm or {})
# Agent-level fields take priority over the vendor's defaults.
# This matches the TS SDK where agent-level values override vendor config.
if self._instructions is not None:
if self._instructions is not None and "system_messages" not in llm_config:
llm_config["system_messages"] = [{"role": "system", "content": self._instructions}]
if self._greeting is not None:
if self._greeting is not None and "greeting_message" not in llm_config:
llm_config["greeting_message"] = self._greeting
if self._greeting_configs is not None:
if self._greeting_configs is not None and "greeting_configs" not in llm_config:
llm_config["greeting_configs"] = _dump_optional_model(self._greeting_configs)
if self._failure_message is not None:
if self._failure_message is not None and "failure_message" not in llm_config:
llm_config["failure_message"] = self._failure_message
if self._max_history is not None:
if self._max_history is not None and "max_history" not in llm_config:
llm_config["max_history"] = self._max_history
return llm_config
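The `... not in llm_config` guards in this hunk mean that an agent-level value only fills a key the LLM config does not already carry. A minimal sketch of that rule follows, assuming plain dictionaries; `fill_missing_llm_fields` and the sample values are hypothetical, and the real method also wraps `instructions` into a `system_messages` list, which is omitted here.

```python
from typing import Any, Dict, Mapping, Optional


def fill_missing_llm_fields(
    llm_config: Mapping[str, Any],
    agent_defaults: Mapping[str, Optional[Any]],
) -> Dict[str, Any]:
    """Copy agent-level values into the LLM config only where the key is absent."""
    merged = dict(llm_config)  # work on a copy so the caller's dict is untouched
    for key, value in agent_defaults.items():
        if value is not None and key not in merged:
            merged[key] = value
    return merged


# Hypothetical inputs: the vendor config already sets a greeting.
vendor_llm = {"url": "https://example.invalid/v1", "greeting_message": "Hi from the vendor config"}
agent_level = {"greeting_message": "Hi from the agent", "max_history": 10, "failure_message": None}
merged = fill_missing_llm_fields(vendor_llm, agent_level)
```

With these inputs the vendor greeting is kept, `max_history` is filled in from the agent, and the unset `failure_message` is not emitted at all.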

def _resolve_asr_config(self) -> typing.Dict[str, typing.Any]:
def _resolve_asr_config(self, turn_detection_config: TurnDetectionConfig) -> typing.Dict[str, typing.Any]:
asr_config = dict(self._stt or {})
asr_config.pop("language", None)
if not asr_config:
asr_config["vendor"] = "ares"
asr_config["language"] = self._field_value(turn_detection_config, "language")
return asr_config

def _resolve_turn_detection_config(self) -> TurnDetectionConfig:
existing_stt_language = self._stt.get("language") if self._stt is not None else None
existing_turn_detection_language = self._field_value(self._turn_detection, "language")
language = (
existing_turn_detection_language
if existing_turn_detection_language is not None
else existing_stt_language
if _is_turn_detection_language(existing_stt_language)
else DEFAULT_TURN_DETECTION_LANGUAGE
)
language = _validate_turn_detection_language(language)
73 changes: 58 additions & 15 deletions src/agora_agent/agentkit/agent_session.py
@@ -15,15 +15,15 @@
AgentThinkAgentManagementResponse as AgentThinkResponse,
)
from ..agents.types.get_turns_agents_response import GetTurnsAgentsResponse
from ..agents.types.start_agents_request_properties import StartAgentsRequestProperties
from .agent import Agent, GetTurnsOptions, SayOptions, ThinkOptions
from .agent import Agent, GetTurnsOptions, SayOptions, ThinkOptions, _start_properties_from_mapping
from .avatar_types import (
is_akool_avatar,
is_anam_avatar,
is_avatar_token_managed,
is_generic_avatar,
is_heygen_avatar,
is_live_avatar_avatar,
is_rtc_avatar,
validate_avatar_config,
validate_tts_sample_rate,
)
@@ -333,22 +333,63 @@ def _build_start_properties(
properties["tts"] = self._dump_model(self._agent.tts)
if self._agent.llm is not None:
llm = dict(self._agent.llm)
if self._agent.instructions is not None:
if self._agent.instructions is not None and "system_messages" not in llm:
llm["system_messages"] = [{"role": "system", "content": self._agent.instructions}]
if self._agent.greeting is not None:
if self._agent.greeting is not None and "greeting_message" not in llm:
llm["greeting_message"] = self._agent.greeting
if self._agent.greeting_configs is not None:
if self._agent.greeting_configs is not None and "greeting_configs" not in llm:
llm["greeting_configs"] = self._dump_model(self._agent.greeting_configs)
if self._agent.failure_message is not None:
if self._agent.failure_message is not None and "failure_message" not in llm:
llm["failure_message"] = self._agent.failure_message
if self._agent.max_history is not None:
if self._agent.max_history is not None and "max_history" not in llm:
llm["max_history"] = self._agent.max_history
properties["llm"] = llm
if self._agent.stt is not None:
properties["asr"] = self._dump_model(self._agent.stt)

return properties

    @staticmethod
    def _request_properties_for_start(
        resolved_properties: typing.Dict[str, typing.Any],
        *,
        resolved_preset: typing.Optional[str],
        pipeline_id: typing.Optional[str],
    ) -> typing.Any:
        try:
            return _start_properties_from_mapping(resolved_properties)
        except Exception as exc:
            if pipeline_id:
                return resolved_properties
            if resolved_preset:
                normalized_preset = normalize_preset_input(resolved_preset)
                if not normalized_preset:
                    raise
                preset_categories = {
                    category
                    for item in normalized_preset.split(",")
                    for category in [get_preset_category(item)]
                    if category is not None
                }
                error_categories = _AgentSessionBase._validation_error_categories(exc)
                if error_categories and error_categories.issubset(preset_categories):
                    return resolved_properties
            raise

    @staticmethod
    def _validation_error_categories(exc: Exception) -> typing.Set[str]:
        errors = getattr(exc, "errors", None)
        if not callable(errors):
            return set()
        categories: typing.Set[str] = set()
        for error in errors():
            loc = error.get("loc") if isinstance(error, dict) else None
            if isinstance(loc, tuple) and loc:
                field = loc[0]
                if field in {"asr", "llm", "tts"}:
                    categories.add(typing.cast(str, field))
        return categories
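To see how the category check decides whether a partial config may still be sent as a raw dictionary, the logic can be exercised without the SDK. The sketch below is an assumption-laden stand-in: `FakeValidationError` imitates a pydantic-style exception that exposes `errors()`, and `error_categories` and `may_send_raw_dict` are hypothetical names that mirror the two helpers in this hunk.

```python
from typing import Any, Dict, List, Set


class FakeValidationError(Exception):
    """Stand-in for a pydantic-style error that exposes errors()."""

    def __init__(self, errors: List[Dict[str, Any]]) -> None:
        super().__init__("validation failed")
        self._errors = errors

    def errors(self) -> List[Dict[str, Any]]:
        return self._errors


def error_categories(exc: Exception) -> Set[str]:
    """Collect the top-level provider categories named in the error locations."""
    errors = getattr(exc, "errors", None)
    if not callable(errors):
        return set()
    found: Set[str] = set()
    for error in errors():
        loc = error.get("loc") if isinstance(error, dict) else None
        if isinstance(loc, tuple) and loc and loc[0] in {"asr", "llm", "tts"}:
            found.add(loc[0])
    return found


def may_send_raw_dict(exc: Exception, preset_categories: Set[str]) -> bool:
    """True when every failing provider category is covered by a preset."""
    categories = error_categories(exc)
    return bool(categories) and categories.issubset(preset_categories)


tts_error = FakeValidationError([{"loc": ("tts", "params", "key"), "msg": "field required"}])
covered = may_send_raw_dict(tts_error, {"tts", "llm"})
uncovered = may_send_raw_dict(tts_error, {"llm"})
```

One property worth noting in this sketch: error locations outside `asr`, `llm`, and `tts` are not counted, so a failure on some other field does not by itself block the fallback as long as at least one provider category failed and all failing provider categories are preset-backed.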

def _vendor_validation_categories(
self,
pipeline_id: typing.Optional[str],
@@ -513,10 +554,11 @@ def start(self) -> str:
"properties": resolved_properties,
})

try:
request_properties: typing.Any = StartAgentsRequestProperties(**resolved_properties)
except Exception:
request_properties = resolved_properties
request_properties = self._request_properties_for_start(
resolved_properties,
resolved_preset=resolved_preset,
pipeline_id=pipeline_id,
)

response = self._client.agents.start(
self._app_id,
@@ -840,10 +882,11 @@ async def start(self) -> str:
"properties": resolved_properties,
})

try:
request_properties: typing.Any = StartAgentsRequestProperties(**resolved_properties)
except Exception:
request_properties = resolved_properties
request_properties = self._request_properties_for_start(
resolved_properties,
resolved_preset=resolved_preset,
pipeline_id=pipeline_id,
)

response = await self._client.agents.start(
self._app_id,