feat(tts): add gradium tts support#2193
TTS guarder + standalone test records (commit 931cc24)

Addressed both review comments on

TTS guarder:

| Case | Result |
|---|---|
| test_append_input | ✅ passed |
| test_append_input_stress | ✅ passed |
| test_append_input_without_text_input_end | ✅ passed |
| test_append_interrupt | ✅ passed |
| test_basic_audio_setting | ✅ passed |
| test_corner_input | ✅ passed |
| test_dump | ✅ passed |
| test_dump_each_request_id | ✅ passed |
| test_empty_text_request | ✅ passed |
| test_flush | ✅ passed |
| test_interleaved_requests | ✅ passed |
| test_invalid_required_params | ✅ passed |
| test_invalid_text_handling | ✅ passed |
| test_metrics | ✅ passed |
| test_miss_required_params | ✅ passed |
| test_subtitle_alignment | ⏭️ skipped (disabled by default; optional — only for providers that emit word/segment timing, which Gradium does not) |
Standalone suite — task test-extension EXTENSION=...gradium_tts_python
28 passed (includes a new test_output_format_derived_from_sample_rate_only that locks in: a user-supplied output_format is ignored, sample_rate drives the derived pcm_<rate>, and it is not leaked as a vendor passthrough param).
Both runs were executed on the branch code; black format-check is clean.
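For readers skimming the thread, the behavior that test locks in can be sketched roughly as below. This is a minimal illustration, not the extension's actual code: the function name `derive_audio_settings` and the `speed` key are hypothetical, and the real logic lives in the pydantic config model.

```python
def derive_audio_settings(sample_rate: int, params: dict) -> tuple[str, dict]:
    """Derive the wire format from sample_rate only (hypothetical helper).

    Any user-supplied "output_format" is dropped, so it neither overrides
    the derived value nor leaks to the vendor as a passthrough param.
    """
    passthrough = {k: v for k, v in params.items() if k != "output_format"}
    return f"pcm_{sample_rate}", passthrough


# "speed" is an invented passthrough key used only for illustration.
fmt, passthrough = derive_audio_settings(16000, {"output_format": "wav", "speed": 1.1})
```

Here `fmt` comes from the sample rate alone and `passthrough` no longer carries `output_format`.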
Code Review: Gradium TTS extension

Thanks for the thorough work here: the per-request socket model, the single finalize path, and the test suite mirroring the sibling extensions are all solid. A few things worth a look before merge.

Potential bugs

1. For a strictly FIFO base class these line up, but the design is fragile:
What's the queue actually buying you over just using the

2.

3. Clean-close detection relies on

4.

Code quality
Performance
Security
Tests

Good coverage of batching, interleaving, metrics-once, and the state machine. Gaps:

Overall this is close. The
Review: feat(tts): add gradium tts support

Solid, well-structured extension that closely mirrors the sibling TTS extensions. The per-request socket model, request-scoped

Potential bugs
Code quality
Security
Tests

Good coverage across basic audio, flush, state machine, params, metrics, and error handling, plus the gradium-specific interleaving and

The follow-ups you listed (punctuation/list robustness regressions, explicit 48 kHz coverage) are reasonable to defer. Nice work overall.
Update — commits
I don't think Gradium should batch all text segments with the same A few reasons:
Review: feat(tts): add gradium tts support

Solid extension. It follows the sibling-TTS pattern, the per-request streaming session model is clean, and test coverage is good. A few issues worth addressing before merge.

Bugs / correctness
Robustness
Style / minor
Test coverage

Coverage is strong: basic audio, dump, flush, segmented sessions, metrics (incl. once-per-request TTFB), params passthrough, sample-rate derivation, robustness (empty/whitespace/punctuation/long/special), and auth-error classification. Two gaps:

Overall this is close. Issues #1 and #2 are the ones I'd want resolved before merge since they affect real graph configs, not just internals.
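As a point of reference for the once-per-request TTFB metric mentioned under test coverage, the idea can be sketched as follows. The `TtfbTracker` class and its method names are hypothetical and are not taken from the extension. It only shows the pattern of recording the request start once and reporting on the first audio chunk only.

```python
import time


class TtfbTracker:
    """Report time-to-first-byte once per request_id (illustrative sketch)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._started = {}      # request_id -> start timestamp (seconds)
        self._reported = set()  # request_ids whose TTFB was already emitted

    def start(self, request_id):
        # Later segments of the same request must not reset the start time.
        if request_id not in self._started:
            self._started[request_id] = self._clock()

    def on_audio(self, request_id):
        """Return TTFB in ms for the first audio chunk, otherwise None."""
        if request_id in self._reported or request_id not in self._started:
            return None
        self._reported.add(request_id)
        return int((self._clock() - self._started[request_id]) * 1000)
```

Injecting the clock keeps the sketch testable without sleeping. A caller would invoke `start` for every incoming text segment and `on_audio` for every audio chunk, emitting a metric only when the result is not `None`.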
Update — stream segments immediately (commit
Gradium previously accumulated same-request_id segments into a local buffer and only flushed on a sentence/size threshold or text_input_end. Combined with the base class re-queuing interleaved messages, this could drop sections of TTS audio and required a non-standard on_data override.

Forward each segment to the vendor as it arrives over one persistent per-request websocket session (start_session/send_text/end_input plus a concurrent audio reader), matching Gradium's LLM-to-TTS streaming guidance and the other websocket TTS extensions. The base class now owns queuing/ordering, so the on_data override and ingress_messages are removed.

Also in this change:

- parse json_config (a JSON string per the manifest schema) into an object before sending it on the wire, so manifest, config and README agree.
- cancel the background reader task on a send/setup failure so it cannot outlive the request.
- read the websockets-14 close code from exc.rcvd/exc.sent (it is no longer a top-level exc.code) so a clean 1000 close is treated as end-of-stream.

Tests rewritten to the streaming API, with added coverage for immediate forwarding, single-session-per-request, json_config parsing, reader-task cancellation, and punctuation-only input.
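The session shape described in this commit message can be sketched with plain asyncio. This is an illustration under stated assumptions, not the extension's code: `FakeSession` is a hypothetical in-memory stand-in for the vendor websocket, and only the `send_text`/`end_input` names come from the commit message.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for one per-request vendor session."""

    def __init__(self, fail_on=None):
        self.fail_on = fail_on
        self.audio = asyncio.Queue()

    async def send_text(self, text):
        if text == self.fail_on:
            raise ConnectionError("send failed")
        await self.audio.put(text.encode())  # pretend the vendor returns audio

    async def end_input(self):
        await self.audio.put(None)  # end-of-stream marker


async def read_audio(session, out):
    # Concurrent reader: collects audio until the end-of-stream marker.
    while (chunk := await session.audio.get()) is not None:
        out.append(chunk)


async def stream_request(session, segments):
    out = []
    reader = asyncio.create_task(read_audio(session, out))
    try:
        for text in segments:
            await session.send_text(text)  # forwarded as it arrives, no local buffer
        await session.end_input()
        await reader
    finally:
        # On a send/setup failure the reader must not outlive the request.
        if not reader.done():
            reader.cancel()
            await asyncio.gather(reader, return_exceptions=True)
    return out
```

The `finally` block is the part that matters for the second bullet: whichever way the request ends, the reader task is either finished or cancelled and awaited before control returns to the caller.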
89da4f7 to cb1a38c
Review:
Capture the gates that were easy to miss this round: the commitlint body-max-line-length rule (there is no local commit-msg hook, so CI is the first thing to catch a long body line), and a single pre-push checklist covering black, lint, standalone tests, guarder tests and commit messages. Also note the stale .ten/ gotcha: a leftover standalone install makes task check report spurious reformatting and breaks the next install.
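Since there is no local commit-msg hook, a small pre-push check for long body lines might look like the sketch below. The helper name is hypothetical and the limit of 100 is an assumption. Use whatever value the repository's commitlint configuration sets for body-max-line-length.

```python
def long_body_lines(message: str, limit: int = 100) -> list[tuple[int, int]]:
    """Return (line_number, length) for commit-body lines longer than limit.

    The first line is the header and is skipped. The limit is an assumed
    value and should mirror the project's commitlint configuration.
    """
    lines = message.splitlines()
    return [
        (number, len(line))
        for number, line in enumerate(lines[1:], start=2)
        if len(line) > limit
    ]
```

The message text could be fed in from `git log -1 --format=%B`. An empty result means no body line exceeds the limit.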
Review: feat(tts): add gradium tts support

Solid, well-structured extension. It uses the correct websockets 14 API ( A few things worth considering before merge.

Potential issues

1.

    raw_msg = await asyncio.wait_for(self.ws.recv(), timeout=WS_RECV_TIMEOUT)

But the config exposes

2. Passthrough params can clobber reserved setup fields.

    if self.config.voice_id:
        payload["voice_id"] = self.config.voice_id
    ...
    for key, value in self.config.params.items():
        payload[key] = value

A passthrough param named

3. Possible orphaned reader task on request switch without

    self._reader_task = asyncio.create_task(self._read_audio(t.request_id))

In the normal flow the prior request ends via

Minor
Test coverage

Strong. Covers basic audio, flush, dump byte-for-byte comparison, segmented/single-TTFB metrics, params passthrough +

Overall this is in good shape; items 1–3 are the ones I'd want addressed or consciously waived before merge.
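One possible way to address item 2 (passthrough params clobbering reserved setup fields) is to skip reserved keys when merging. This is a sketch only: the reserved-key set, the `"type": "setup"` field and the function name are assumptions for illustration, not the vendor's documented message schema.

```python
# Hypothetical set of keys the extension owns; adjust to the real setup message.
RESERVED_SETUP_KEYS = frozenset({"type", "voice_id", "output_format"})


def build_setup_payload(voice_id, output_format, params):
    """Merge passthrough params without letting them override reserved fields."""
    payload = {"type": "setup", "output_format": output_format}
    if voice_id:
        payload["voice_id"] = voice_id
    dropped = []
    for key, value in params.items():
        if key in RESERVED_SETUP_KEYS:
            dropped.append(key)  # caller can log these as ignored
            continue
        payload[key] = value
    return payload, dropped
```

Returning the dropped keys lets the caller log a warning rather than silently ignoring a misconfigured param.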
Add a pyproject.toml for the gradium_tts_python extension (project metadata plus the websockets runtime dependency, version kept in sync with the manifest) and list it in the manifest package include so it ships with the package. This is prep for switching to the full uv toolchain, which expects each extension to carry its own pyproject.toml.
Review:
Code Review:
Code Review: feat(tts): add gradium tts support

Solid, well-tested extension. The refactor to a per-request streaming session with a background reader task and a single finalize path is clean, the config normalization + api-key redaction is careful, and the websockets>=14 close-code handling ( A few things worth addressing before merge.

Bugs / correctness

1. Example graph uses

2.

3. Stale reader task on back-to-back requests (potential ordering hazard).

Minor
Security

Looks good.

Nice work overall. Items 1–3 are the ones I'd want resolved (1 is a quick fix; 2 and 3 relate to lifecycle correctness under interrupt / rapid-turn scenarios).
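For context on the close-code handling praised above, reading the code from the close frames rather than a top-level attribute can be sketched like this. The helper names are hypothetical. The code is duck-typed against the `rcvd`/`sent` attributes that the websockets `ConnectionClosed` exception carries, and it uses a `SimpleNamespace` stand-in so it runs without the library installed.

```python
from types import SimpleNamespace


def close_code(exc):
    """Return the close code from the received frame, else the sent frame."""
    for frame in (getattr(exc, "rcvd", None), getattr(exc, "sent", None)):
        if frame is not None:
            return frame.code
    return None  # connection dropped without a close frame


def is_clean_close(exc):
    # 1000 is the normal-closure code; treat it as end-of-stream, not an error.
    return close_code(exc) == 1000


# Stand-in for an exception whose peer sent a normal close frame.
example = SimpleNamespace(rcvd=SimpleNamespace(code=1000, reason=""), sent=None)
```

Preferring the received frame is a design choice in this sketch. The extension may weigh the two frames differently.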
Summary
Adds the Gradium TTS extension (gradium_tts_python), a websocket streaming TTS built on AsyncTTS2BaseExtension, plus a voice_assistant_gradium example graph.

- gradium_tts.py: websocket setup → ready → text/end_of_stream → audio/error, with auth-error classification (401/403/1008 → fatal).
- extension.py: streams audio through a single finalize path (_finalize_request → finish_request); request-level TTFB metric (emitted once per request even across multiple vendor segments); audio framing prefers the server-ready sample rate, falling back to config.
- config.py: pydantic model with param normalization, output-format/sample-rate reconciliation, and api-key redaction in logs.
- voice_assistant_gradium predefined graph (deepgram ASR + openai LLM + gradium TTS), manifest dependency, regenerated lockfile.

Tests
Standalone suite mirrors the sibling TTS extensions (basic audio, flush, state machine, robustness, params, metrics, error handling), plus gradium-specific coverage for its text-batching and per-request socket model.
- task test-extension EXTENSION=agents/ten_packages/extension/gradium_tts_python → 27 passed
- task tts-guarder-test EXTENSION=gradium_tts_python CONFIG_DIR=tests/configs → 15 passed, 1 skipped

Follow-ups (non-blocking)
pcm_48000.
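
The auth-error classification listed in the summary (401/403/1008 → fatal) amounts to a small lookup. The sketch below is illustrative only: the function name and the `"fatal"`/`"non_fatal"` labels are hypothetical, and the extension's own error types may differ.

```python
# Codes the summary lists as authentication failures: HTTP 401/403 and
# websocket close code 1008 (policy violation).
FATAL_AUTH_CODES = frozenset({401, 403, 1008})


def classify_error(code):
    """Map a vendor status/close code to a severity label (hypothetical labels)."""
    return "fatal" if code in FATAL_AUTH_CODES else "non_fatal"
```

Keeping the codes in one set makes it easy to extend if the vendor documents further authentication-related codes.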