Closed
17,991 changes: 17,989 additions & 2 deletions .fern/replay.lock

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -20,7 +20,7 @@ pip install agora-agents
## Quick Start

Start with the `Agent` builder: create a client with app credentials, choose your ASR, LLM, and TTS providers, then start a session. Omit vendor API keys for supported Agora-managed models, or provide keys when you want BYOK.
-Set Agora interaction language with `turn_detection.language`; provider-specific STT language values remain under `asr.params`.
+Set Agora interaction language with `turn_detection.language`; provider-specific STT language values remain under `asr.params`. Ares uses only the REST `asr.language` value sourced from `turn_detection.language`.

```python
import os
4 changes: 2 additions & 2 deletions changelog.md
@@ -21,7 +21,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/).
### Fixed

- **Managed-provider validation** — AgentKit validation now distinguishes preset-backed providers from BYOK providers so required provider fields are only required when credentials are caller-supplied.
-- **Language placement** — Provider-specific STT language values remain under `asr.params`, while Agora interaction language is emitted separately as `turn_detection.language`.
+- **Language placement** — Provider-specific STT language values remain under `asr.params`; the REST `asr.language` field is populated from `turn_detection.language`.

## [v2.0.0] — 2026-05-21

@@ -114,7 +114,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/).

### Fixed

-- **`AresSTT`** — Removed redundant `language` key from the `params` dict. Language is now emitted only at the top level. `params` is only included when `additional_params` is provided.
+- **`AresSTT`** — Removed redundant `language` key from the `params` dict. Ares only selects the provider; AgentKit populates REST `asr.language` from `turn_detection.language`. `params` is only included when `additional_params` is provided.
- **`OpenAIRealtime` / `VertexAI` (MLLM)** — Agent-level `greeting` and `failure_message` defaults are now correctly applied when missing in MLLM mode. Previously these values were silently dropped.
- **`VertexAI` (MLLM)** — `messages` is emitted at the MLLM top level, matching the generated core SDK contract.

2 changes: 1 addition & 1 deletion docs/concepts/vendors.md
@@ -75,7 +75,7 @@ tts = ElevenLabsTTS(

Used with `agent.with_stt()`.

-Use `turn_detection.language` for Agora interaction language; it defaults to `en-US`. STT vendor `language` options are serialized under `asr.params` using each provider's own format.
+Use `turn_detection.language` for Agora interaction language; it defaults to `en-US`. STT vendor `language` options are serialized under `asr.params` using each provider's own format. Ares does not take a provider language option; AgentKit uses `turn_detection.language` for REST `asr.language`.

| Class | Provider | Required Parameters |
|---|---|---|
4 changes: 2 additions & 2 deletions docs/reference/vendors.md
@@ -318,7 +318,7 @@ The SDK also includes named helpers for the remaining Agora-supported LLM provid

## STT Vendors

-Use `turn_detection.language` for Agora interaction language; it defaults to `en-US`. Provider-specific language values remain under `asr.params` and may use a different format.
+Use `turn_detection.language` for Agora interaction language; it defaults to `en-US`. Provider-specific language values remain under `asr.params` and may use a different format. AgentKit populates REST `asr.language` from `turn_detection.language`.

### `SpeechmaticsSTT`

@@ -336,6 +336,7 @@ Use `turn_detection.language` for Agora interaction language; it defaults to `en
| `api_key` | `str` | BYOK only | `None` | Deepgram API key. Optional only for Agora-managed `nova-2` and `nova-3`. |
| `model` | `str` | No | `None` | Model (e.g., `nova-2`) |
| `language` | `str` | No | `None` | Language code (e.g., `en-US`) |
+| `interaction_language` | `str` | No | `None` | Agora `asr.language` override |
| `smart_format` | `bool` | No | `None` | Enable smart formatting |
| `punctuation` | `bool` | No | `None` | Enable punctuation |
| `additional_params` | `Dict[str, Any]` | No | `None` | Additional parameters |
@@ -396,7 +397,6 @@ For `nova-2` and `nova-3`, omit `api_key` to use Agora-managed credentials. For

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
-| `language` | `str` | No | `None` | Language code |
| `additional_params` | `Dict[str, Any]` | No | `None` | Additional parameters |

### `SarvamSTT`
24 changes: 10 additions & 14 deletions src/agora_agent/agentkit/agent.py
@@ -919,9 +919,10 @@ def to_properties(
allow_missing_llm = "llm" in allow_missing_categories
allow_missing_tts = "tts" in allow_missing_categories

+        turn_detection_config = self._resolve_turn_detection_config()
        if not skip_asr_validation and (self._stt is not None or not allow_missing_asr):
-            base_kwargs["asr"] = self._resolve_asr_config()
-        base_kwargs["turn_detection"] = self._resolve_turn_detection_config()
+            base_kwargs["asr"] = self._resolve_asr_config(turn_detection_config)
+        base_kwargs["turn_detection"] = turn_detection_config

if skip_vendor_validation:
return StartAgentsRequestProperties(**base_kwargs)
@@ -941,35 +942,30 @@ def to_properties(

def _resolve_llm_config(self) -> typing.Dict[str, typing.Any]:
llm_config = dict(self._llm or {})
-        # Agent-level fields take priority over the vendor's defaults.
-        # This matches the TS SDK where agent-level values override vendor config.
-        if self._instructions is not None:
+        if self._instructions is not None and "system_messages" not in llm_config:
            llm_config["system_messages"] = [{"role": "system", "content": self._instructions}]
-        if self._greeting is not None:
+        if self._greeting is not None and "greeting_message" not in llm_config:
            llm_config["greeting_message"] = self._greeting
-        if self._greeting_configs is not None:
+        if self._greeting_configs is not None and "greeting_configs" not in llm_config:
            llm_config["greeting_configs"] = _dump_optional_model(self._greeting_configs)
-        if self._failure_message is not None:
+        if self._failure_message is not None and "failure_message" not in llm_config:
            llm_config["failure_message"] = self._failure_message
-        if self._max_history is not None:
+        if self._max_history is not None and "max_history" not in llm_config:
llm_config["max_history"] = self._max_history
return llm_config

-    def _resolve_asr_config(self) -> typing.Dict[str, typing.Any]:
+    def _resolve_asr_config(self, turn_detection_config: TurnDetectionConfig) -> typing.Dict[str, typing.Any]:
        asr_config = dict(self._stt or {})
-        asr_config.pop("language", None)
        if not asr_config:
            asr_config["vendor"] = "ares"
+        asr_config["language"] = self._field_value(turn_detection_config, "language")
return asr_config

def _resolve_turn_detection_config(self) -> TurnDetectionConfig:
-        existing_stt_language = self._stt.get("language") if self._stt is not None else None
        existing_turn_detection_language = self._field_value(self._turn_detection, "language")
        language = (
            existing_turn_detection_language
            if existing_turn_detection_language is not None
-            else existing_stt_language
-            if _is_turn_detection_language(existing_stt_language)
else DEFAULT_TURN_DETECTION_LANGUAGE
)
language = _validate_turn_detection_language(language)
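
The `agent.py` hunks above touch two precedence rules: agent-level fields are applied only when the LLM config does not already carry the corresponding key, and `asr.language` is taken from the resolved turn-detection language. The standalone sketch below illustrates that behavior under simplifying assumptions: plain dicts stand in for the SDK models, language validation is omitted, and `resolve_llm`, `resolve_asr`, and `DEFAULT_LANGUAGE` are hypothetical names that do not exist in `agora_agent`.

```python
from typing import Any, Dict, Optional

# Default named in the docs hunks of this PR; the constant name is hypothetical.
DEFAULT_LANGUAGE = "en-US"


def resolve_llm(
    vendor_llm: Optional[Dict[str, Any]],
    instructions: Optional[str] = None,
    greeting: Optional[str] = None,
) -> Dict[str, Any]:
    """Apply agent-level values only where the vendor config left a key unset."""
    llm = dict(vendor_llm or {})
    if instructions is not None and "system_messages" not in llm:
        llm["system_messages"] = [{"role": "system", "content": instructions}]
    if greeting is not None and "greeting_message" not in llm:
        llm["greeting_message"] = greeting
    return llm


def resolve_asr(
    stt: Optional[Dict[str, Any]],
    turn_detection_language: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the ASR config; the top-level language follows turn detection."""
    asr = dict(stt or {})
    if not asr:
        asr["vendor"] = "ares"  # fallback vendor when no STT is configured
    # Any top-level language supplied by the STT config is overwritten here;
    # provider-specific values under "params" are left untouched.
    asr["language"] = (
        turn_detection_language
        if turn_detection_language is not None
        else DEFAULT_LANGUAGE
    )
    return asr
```

In this sketch a vendor-supplied `greeting_message` survives an agent-level greeting, and a provider language under `params` stays separate from the top-level `language`.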
11 changes: 6 additions & 5 deletions src/agora_agent/agentkit/agent_session.py
@@ -24,6 +24,7 @@
is_generic_avatar,
is_heygen_avatar,
is_live_avatar_avatar,
+    is_rtc_avatar,
validate_avatar_config,
validate_tts_sample_rate,
)
@@ -333,15 +334,15 @@ def _build_start_properties(
properties["tts"] = self._dump_model(self._agent.tts)
if self._agent.llm is not None:
llm = dict(self._agent.llm)
-            if self._agent.instructions is not None:
+            if self._agent.instructions is not None and "system_messages" not in llm:
                llm["system_messages"] = [{"role": "system", "content": self._agent.instructions}]
-            if self._agent.greeting is not None:
+            if self._agent.greeting is not None and "greeting_message" not in llm:
                llm["greeting_message"] = self._agent.greeting
-            if self._agent.greeting_configs is not None:
+            if self._agent.greeting_configs is not None and "greeting_configs" not in llm:
                llm["greeting_configs"] = self._dump_model(self._agent.greeting_configs)
-            if self._agent.failure_message is not None:
+            if self._agent.failure_message is not None and "failure_message" not in llm:
                llm["failure_message"] = self._agent.failure_message
-            if self._agent.max_history is not None:
+            if self._agent.max_history is not None and "max_history" not in llm:
llm["max_history"] = self._agent.max_history
properties["llm"] = llm
if self._agent.stt is not None:
5 changes: 4 additions & 1 deletion src/agora_agent/agentkit/presets.py
@@ -137,7 +137,9 @@ def infer_tts_preset(tts: typing.Optional[typing.Dict[str, typing.Any]]) -> typi
if vendor == "minimax":
if params.get("key"):
return None
-            return _MINIMAX_MODEL_TO_PRESET.get(_normalize_model_name(params.get("model")) or "")
+            # Model is no longer in params for the preset path; fall back to the top-level hint.
+            model = _normalize_model_name(params.get("model")) or _normalize_model_name(tts.get("_minimax_preset_model")) or ""
+            return _MINIMAX_MODEL_TO_PRESET.get(model)
return None


@@ -184,6 +186,7 @@ def strip_inferred_preset_fields(properties: typing.Dict[str, typing.Any], infer
params["group_id"] = None
params["url"] = None
tts = {k: v for k, v in {**tts, "params": _omit_none(params)}.items() if v is not None}
+        tts.pop("_minimax_preset_model", None)

return {**properties, "asr": asr, "llm": llm, "tts": tts}
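
The `presets.py` change above lets MiniMax preset inference fall back to a top-level `_minimax_preset_model` hint when `params` no longer carries the model. A rough sketch of that lookup order follows. The mapping entry and the normalization rule (trim and lowercase) are assumptions made for illustration, since neither `_MINIMAX_MODEL_TO_PRESET` nor `_normalize_model_name` is shown in this diff; all names below are hypothetical stand-ins.

```python
from typing import Any, Dict, Optional

# Hypothetical mapping; the real table entries are not visible in this diff.
MODEL_TO_PRESET: Dict[str, str] = {"speech-demo": "minimax_demo_preset"}


def normalize_model_name(value: Any) -> Optional[str]:
    """Assumed normalization: trim and lowercase; blanks and non-strings give None."""
    if not isinstance(value, str):
        return None
    cleaned = value.strip().lower()
    return cleaned or None


def infer_minimax_preset(tts: Dict[str, Any]) -> Optional[str]:
    """Return a preset name, or None for BYOK configs and unknown models."""
    params = tts.get("params") or {}
    if params.get("key"):
        return None  # a caller-supplied key means BYOK, so no preset applies
    model = (
        normalize_model_name(params.get("model"))
        or normalize_model_name(tts.get("_minimax_preset_model"))
        or ""
    )
    return MODEL_TO_PRESET.get(model)
```

The model under `params` is consulted first, so the top-level hint only matters when `params` has no usable model.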

1 change: 1 addition & 0 deletions src/agora_agent/agentkit/vendors/avatar.py
@@ -206,3 +206,4 @@ def to_config(self) -> Dict[str, Any]:

enable = self.options.enable if self.options.enable is not None else True
return {"enable": enable, "vendor": "anam", "params": params}
+
10 changes: 8 additions & 2 deletions src/agora_agent/agentkit/vendors/llm.py
@@ -1,4 +1,4 @@
-from typing import Any, Dict, List, Optional
+from typing import Any, Dict, List, Optional, Union

from pydantic import BaseModel, ConfigDict, Field, model_validator

@@ -376,8 +376,14 @@ def to_config(self) -> Dict[str, Any]:
options = _dump_optional_model(self.options)
options.pop("project_id", None)
options.pop("location", None)
+        if not options.get("url"):
+            options["url"] = (
+                f"https://{self.options.location}-aiplatform.googleapis.com/v1/projects/"
+                f"{self.options.project_id}/locations/{self.options.location}/"
+                f"publishers/google/models/{self.options.model}:streamGenerateContent?alt=sse"
+            )
        config = Gemini(**options).to_config()
-        params = dict(config["params"])
+        params = dict(config.get("params") or {})
params["project_id"] = self.options.project_id
params["location"] = self.options.location
config["params"] = params
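
The `llm.py` hunk above derives a default streaming URL from the project, location, and model when no `url` is supplied. The sketch below reproduces only that string construction as a standalone function so the resulting URL shape is easy to inspect. `vertex_stream_url` and `with_default_url` are hypothetical helpers, not SDK functions, and the project, location, and model values in the usage note are placeholders.

```python
from typing import Any, Dict


def vertex_stream_url(project_id: str, location: str, model: str) -> str:
    """Build the URL using the same f-string pieces as the diff."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/"
        f"{project_id}/locations/{location}/"
        f"publishers/google/models/{model}:streamGenerateContent?alt=sse"
    )


def with_default_url(options: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of options, filling "url" only when it is missing or empty."""
    resolved = dict(options)
    if not resolved.get("url"):
        resolved["url"] = vertex_stream_url(
            resolved["project_id"], resolved["location"], resolved["model"]
        )
    return resolved
```

Calling `with_default_url({"project_id": "demo-project", "location": "us-central1", "model": "demo-model"})` fills in a regional `aiplatform.googleapis.com` URL, while an explicitly provided `url` is left unchanged.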
1 change: 1 addition & 0 deletions src/agora_agent/agentkit/vendors/mllm.py
@@ -1,3 +1,4 @@
+import warnings
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, ConfigDict, Field