
fix(sarvam-tts): add keepalive ping and stale connection detection to prevent audible silence#4971

Open
sanskar1104srivastava wants to merge 4 commits into livekit:main from
sanskar1104srivastava:fix/sarvam-tts-ws-stale-connection-keepalive

Conversation

@sanskar1104srivastava

Problem

In production IVR systems, the Sarvam TTS plugin causes audible silence in two scenarios:

  1. Between agent turns: Sarvam server closes idle WebSocket connections, and the pool returns a stale connection on the next turn.
  2. At the start of a new call: the pool reuses a WebSocket connection from a previous ended call that Sarvam has already closed server-side.

Both result in ClientConnectionResetError: Cannot write to closing transport, causing a 0.5s silence gap while the plugin retries with a fresh connection.

Fix

  1. Keepalive ping: sends {"type": "ping"} every 10 seconds while the WebSocket is idle between agent turns, as per the official Sarvam WebSocket API spec. This prevents the Sarvam server from closing the connection due to inactivity.

  2. Stale connection check: checks ws.closed and ws.close_code before attempting to write. This detects dead pool connections immediately and forces a fresh connection instead of failing mid-write.
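
The two mechanisms above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `is_stale` and `keepalive` are hypothetical names, and `ws` is assumed to behave like an aiohttp `ClientWebSocketResponse` (exposing `closed`, `close_code`, and `send_str`).

```python
import asyncio
import json

KEEPALIVE_INTERVAL = 10.0  # seconds, the interval stated in the PR description


def is_stale(ws) -> bool:
    """Return True if the pooled WebSocket should not be written to."""
    # Assumption: ws exposes `closed` and `close_code` like aiohttp's client WebSocket.
    return ws.closed or ws.close_code is not None


async def keepalive(ws, interval: float = KEEPALIVE_INTERVAL) -> None:
    """Send {"type": "ping"} every `interval` seconds until the socket is stale."""
    while not is_stale(ws):
        await asyncio.sleep(interval)
        if is_stale(ws):
            break
        await ws.send_str(json.dumps({"type": "ping"}))
```

A caller would check `is_stale(ws)` before sending the config message and request a fresh connection when it returns True.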

Additional fixes based on Sarvam API documentation

  • pitch validation range corrected from -20/+20 to -0.75/+0.75
  • pace validation range for bulbul:v2 corrected from 0.5-2.0 to 0.3-3.0
  • speech_sample_rate now sent as string, matching the API schema which defines it as a string enum
  • default sample rate for bulbul:v3-beta corrected from 22050 to 24000 Hz
  • missing output_audio_codec config parameter added with full validation
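
As an illustration of the corrected ranges listed above, a validation helper might look like the sketch below. The helper names and the shape of the returned dict are assumptions made for this example; only the numeric ranges, the 24000 Hz default, and the string-typed `speech_sample_rate` come from the description. The `output_audio_codec` values are not listed here, so they are left out.

```python
PITCH_RANGE = (-0.75, 0.75)          # corrected pitch range from the description
PACE_RANGE_BULBUL_V2 = (0.3, 3.0)    # corrected pace range for bulbul:v2
DEFAULT_SAMPLE_RATE_V3_BETA = 24000  # Hz, corrected default for bulbul:v3-beta


def check_range(name: str, value: float, bounds: tuple) -> float:
    """Raise ValueError if `value` falls outside the inclusive `bounds`."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return value


def build_config(pitch: float, pace: float, sample_rate: int) -> dict:
    """Hypothetical helper: validate values and serialize the sample rate as a string."""
    return {
        "pitch": check_range("pitch", pitch, PITCH_RANGE),
        "pace": check_range("pace", pace, PACE_RANGE_BULBUL_V2),
        "speech_sample_rate": str(sample_rate),
    }
```

With these bounds, a pitch of 20 (valid under the old -20/+20 check) is rejected before it reaches the API.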

Testing

Validated in a production IVR system handling real concurrent calls. The ClientConnectionResetError no longer occurs after applying both fixes.

- Add keepalive_task that sends ping every 10s per Sarvam API spec to prevent idle WS closure
- Add stale connection check (ws.closed / ws.close_code) before sending config
- Fix pitch range: -20/+20 -> -0.75/+0.75 per Sarvam docs
- Fix pace range for bulbul:v2: 0.5-2.0 -> 0.3-3.0 per Sarvam docs
- Fix speech_sample_rate sent as string per API schema
- Fix default sample rate for bulbul:v3-beta: 22050 -> 24000
- Add missing output_audio_codec config parameter

Tested in production IVR system. Eliminates audible silence caused by
ClientConnectionResetError: Cannot write to closing transport
@CLAassistant

CLAassistant commented Feb 28, 2026

CLA assistant check
All committers have signed the CLA.

devin-ai-integration[bot]

This comment was marked as resolved.

Contributor

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 1 new potential issue.


Comment on lines +789 to +795
if self._keepalive_task and not self._keepalive_task.done():
    self._keepalive_task.cancel()
    try:
        await self._keepalive_task
    except asyncio.CancelledError:
        pass
    self._keepalive_task = None

🔴 Keepalive task is immediately cancelled by send_task, never sending a ping

The keepalive task is started at line 895 and then send_task (created at line 897) immediately cancels it at lines 789-795 as its very first action, before any I/O occurs. Since keepalive_task begins with await asyncio.sleep(KEEPALIVE_INTERVAL) (10 seconds), it never gets a chance to send even a single ping before being cancelled.

Root Cause and Impact

The execution flow is:

  1. self._keepalive_task = asyncio.create_task(keepalive_task(ws)): task is scheduled (line 895)
  2. self._send_task = asyncio.create_task(send_task(ws)): task is scheduled (line 897)
  3. await asyncio.gather(*tasks) yields to event loop (line 903)
  4. send_task runs and its first action is to cancel self._keepalive_task (lines 789-795)
  5. keepalive_task receives the cancellation during its initial asyncio.sleep(10) and exits
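
The sequence above can be reproduced in isolation. The sketch below is a standalone demo, not the plugin code: it schedules a keepalive coroutine and a sender coroutine in the same order, and the sender cancels the keepalive as its first action. The `events` list and the function bodies are illustrative stand-ins.

```python
import asyncio

KEEPALIVE_INTERVAL = 10.0  # seconds, as in the PR


async def demo() -> list:
    events = []

    async def keepalive_task():
        try:
            await asyncio.sleep(KEEPALIVE_INTERVAL)
            events.append("ping sent")  # never reached in this demo
        except asyncio.CancelledError:
            events.append("keepalive cancelled")
            raise

    async def send_task(keepalive):
        # First action: cancel the keepalive, mirroring lines 789-795.
        keepalive.cancel()
        try:
            await keepalive
        except asyncio.CancelledError:
            pass
        events.append("send_task continues")

    keepalive = asyncio.create_task(keepalive_task())  # scheduled first
    sender = asyncio.create_task(send_task(keepalive))  # scheduled second
    await asyncio.gather(sender)  # yields to the event loop
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The run finishes without waiting 10 seconds, and "ping sent" never appears in the recorded events, which matches the behaviour described in steps 4 and 5.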

The keepalive is supposed to "send {"type": "ping"} every 10 seconds while the WebSocket is idle between agent turns" to prevent Sarvam's server from closing idle connections. However, it only exists during an active _run_ws session (not while the connection is idle in the pool), and even then it's cancelled instantly.

After the session ends and the connection is returned to the pool via the connection() context manager at livekit-plugins/livekit-plugins-sarvam/livekit/plugins/sarvam/tts.py:886, no keepalive is running. The connection sits idle in the pool with no protection against server-side idle timeouts, which is exactly the scenario the keepalive was supposed to prevent.

Impact: The primary stated fix of this PR (keepalive pings) is non-functional. Connections in the pool will still go stale between agent turns. The stale connection check (lines 801-805) provides partial mitigation by detecting dead connections, but at the cost of a reconnection delay that may still cause perceptible latency.

Prompt for agents
The keepalive mechanism needs to be restructured so that pings are actually sent while the WebSocket connection is idle in the pool between agent turns. The current approach of starting keepalive inside _run_ws and immediately cancelling it in send_task is fundamentally flawed.

Option A (recommended): Move keepalive responsibility into the ConnectionPool layer or start a persistent keepalive task when the connection is returned to the pool (in the close_cb or a wrapper). The keepalive should run while the connection sits idle in the pool and be stopped when the connection is taken out of the pool for a new _run_ws session.

Option B (simpler, partial fix): In send_task in file livekit-plugins/livekit-plugins-sarvam/livekit/plugins/sarvam/tts.py, move the keepalive cancellation (lines 789-795) to AFTER the first text chunk is received from word_stream, rather than at the very start of send_task. This would at least keep the connection alive while waiting for text input. You would also need to ensure thread safety (no interleaving of ping and config messages on the WebSocket). However, this still does not protect the connection while it is idle in the pool between _run_ws calls.
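
One possible shape for Option A is sketched below. `IdleKeepalive` is a hypothetical wrapper, not part of the livekit ConnectionPool API: the assumption is that the pool (or a callback around it) calls `start()` when a connection is returned and `stop()` when it is checked out, so pings and session messages never overlap. `ws` is assumed to expose `closed` and `send_str` like an aiohttp client WebSocket.

```python
import asyncio
import json
from contextlib import suppress


class IdleKeepalive:
    """Pings a WebSocket only while it sits idle (hypothetical helper)."""

    def __init__(self, ws, interval: float = 10.0) -> None:
        self._ws = ws
        self._interval = interval
        self._task = None

    def start(self) -> None:
        """Call when the connection is returned to the pool."""
        if self._task is None or self._task.done():
            self._task = asyncio.create_task(self._run())

    async def stop(self) -> None:
        """Call when the connection is checked out, before the session writes."""
        task, self._task = self._task, None
        if task is not None:
            task.cancel()
            with suppress(asyncio.CancelledError):
                await task

    async def _run(self) -> None:
        while not self._ws.closed:
            await asyncio.sleep(self._interval)
            if self._ws.closed:
                break
            await self._ws.send_str(json.dumps({"type": "ping"}))
```

Because `stop()` awaits the cancelled task before returning, no ping can be in flight once the session starts sending its config message. A send failure inside `_run` would surface from `stop()`, which a caller could treat as a signal to discard the connection.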