fix(sarvam-tts): add keepalive ping and stale connection detection to prevent audible silence#4971
Conversation
- Add keepalive_task that sends ping every 10s per Sarvam API spec to prevent idle WS closure - Add stale connection check (ws.closed / ws.close_code) before sending config - Fix pitch range: -20/+20 -> -0.75/+0.75 per Sarvam docs - Fix pace range for bulbul:v2: 0.5-2.0 -> 0.3-3.0 per Sarvam docs - Fix speech_sample_rate sent as string per API schema - Fix default sample rate for bulbul:v3-beta: 22050 -> 24000 - Add missing output_audio_codec config parameter Tested in production IVR system. Eliminates audible silence caused by ClientConnectionResetError: Cannot write to closing transport
| if self._keepalive_task and not self._keepalive_task.done(): | ||
| self._keepalive_task.cancel() | ||
| try: | ||
| await self._keepalive_task | ||
| except asyncio.CancelledError: | ||
| pass | ||
| self._keepalive_task = None |
There was a problem hiding this comment.
π΄ Keepalive task is immediately cancelled by send_task, never sending a ping
The keepalive task is started at line 895 and then send_task (created at line 897) immediately cancels it at lines 789-795 as its very first action β before any I/O occurs. Since keepalive_task begins with await asyncio.sleep(KEEPALIVE_INTERVAL) (10 seconds), it never gets a chance to send even a single ping before being cancelled.
Root Cause and Impact
The execution flow is:
self._keepalive_task = asyncio.create_task(keepalive_task(ws))β task is scheduled (line 895)self._send_task = asyncio.create_task(send_task(ws))β task is scheduled (line 897)await asyncio.gather(*tasks)yields to event loop (line 903)send_taskruns and its first action is to cancelself._keepalive_task(lines 789-795)keepalive_taskreceives the cancellation during its initialasyncio.sleep(10)and exits
The keepalive is supposed to "send {"type": "ping"} every 10 seconds while the WebSocket is idle between agent turns" to prevent Sarvam's server from closing idle connections. However, it only exists during an active _run_ws session (not while the connection is idle in the pool), and even then it's cancelled instantly.
After the session ends and the connection is returned to the pool via the connection() context manager at livekit-plugins/livekit-plugins-sarvam/livekit/plugins/sarvam/tts.py:886, no keepalive is running. The connection sits idle in the pool with no protection against server-side idle timeouts β exactly the scenario the keepalive was supposed to prevent.
Impact: The primary stated fix of this PR (keepalive pings) is non-functional. Connections in the pool will still go stale between agent turns. The stale connection check (lines 801-805) provides partial mitigation by detecting dead connections, but at the cost of a reconnection delay that may still cause perceptible latency.
Prompt for agents
The keepalive mechanism needs to be restructured so that pings are actually sent while the WebSocket connection is idle in the pool between agent turns. The current approach of starting keepalive inside _run_ws and immediately cancelling it in send_task is fundamentally flawed.
Option A (recommended): Move keepalive responsibility into the ConnectionPool layer or start a persistent keepalive task when the connection is returned to the pool (in the close_cb or a wrapper). The keepalive should run while the connection sits idle in the pool and be stopped when the connection is taken out of the pool for a new _run_ws session.
Option B (simpler, partial fix): In send_task in file livekit-plugins/livekit-plugins-sarvam/livekit/plugins/sarvam/tts.py, move the keepalive cancellation (lines 789-795) to AFTER the first text chunk is received from word_stream, rather than at the very start of send_task. This would at least keep the connection alive while waiting for text input. You would also need to ensure thread safety (no interleaving of ping and config messages on the WebSocket). However, this still does not protect the connection while it is idle in the pool between _run_ws calls.
Was this helpful? React with π or π to provide feedback.
Problem
In production IVR systems, the Sarvam TTS plugin causes audible silence in two scenarios:
Both result in ClientConnectionResetError: Cannot write to closing transport, causing a 0.5s silence gap while the plugin retries with a fresh connection.
Fix
Keepalive ping β sends {"type": "ping"} every 10 seconds while the WebSocket is idle between agent turns, as per the official Sarvam WebSocket API spec. This prevents Sarvam server from closing the connection due to inactivity.
Stale connection check β checks ws.closed and ws.close_code before attempting to write. This detects dead pool connections immediately and forces a fresh connection instead of failing mid-write.
Additional fixes based on Sarvam API documentation
Testing
Validated in a production IVR system handling real concurrent calls. The ClientConnectionResetError no longer occurs after applying both fixes.