Adding GnaniAI STT plugin #5769
…in livekit-agents
…nd streaming updates
…gement in STT and TTS
… management in TTS
…n options in TTS
c45bd9a to c6b921b
    if audio_buffer:
        await ws.send(bytes(audio_buffer))

    await ws.close()
🔴 STT _send_audio calls ws.close() instead of sending an application-level end signal, risking dropped final transcriptions
After sending all audio data, _send_audio at line 353 calls await ws.close(), which initiates a WebSocket transport-level close handshake. This runs concurrently with _recv_messages, which is still reading transcription results. If the Gnani server hasn't finished processing the final audio chunks when it receives the close frame, it may acknowledge the close without sending the remaining transcription results, and the user's final words are silently dropped.
Pattern comparison with Deepgram plugin
The established pattern in this codebase is to send an application-level end-of-stream message, not a transport-level close. For example, Deepgram's send_task (livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/stt.py:548-549) sends {"type": "CloseStream"} via ws.send_str(), which tells the server "I'm done sending audio, please send remaining results." The server then sends all pending transcriptions and closes the connection on its side. The Gnani plugin should either use a similar application-level signal (if the Gnani API supports one) or simply return without calling ws.close() and let the async with websockets.connect() as ws: context manager handle the close after _recv_messages finishes naturally.
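A minimal sketch of the application-level pattern described above, written against an in-memory stand-in rather than a real connection so it runs on its own. Everything named here is an assumption for illustration: the `{"type": "end"}` message, the `FakeSTTSocket` class, and the `is_final` field are hypothetical and are not taken from the Gnani API or the plugin code.

```python
import asyncio
import json

# Hypothetical end-of-stream message; the real Gnani protocol may differ or have none.
END_OF_STREAM = json.dumps({"type": "end"})


class FakeSTTSocket:
    """In-memory stand-in for a server connection (illustration only)."""

    def __init__(self) -> None:
        self._outbox: asyncio.Queue = asyncio.Queue()
        self._audio_bytes = 0

    async def send(self, data) -> None:
        if isinstance(data, (bytes, bytearray)):
            self._audio_bytes += len(data)
        elif data == END_OF_STREAM:
            # The server flushes its final result, then closes from its side.
            final = {"is_final": True, "bytes": self._audio_bytes}
            await self._outbox.put(json.dumps(final))
            await self._outbox.put(None)  # None marks the server-side close

    async def recv(self):
        return await self._outbox.get()


async def send_audio(ws, chunks) -> None:
    for chunk in chunks:
        await ws.send(chunk)
    # Signal end of audio at the application level; no ws.close() here.
    await ws.send(END_OF_STREAM)


async def recv_messages(ws) -> list:
    results = []
    while True:
        msg = await ws.recv()
        if msg is None:  # server closed after flushing pending results
            return results
        results.append(json.loads(msg))


async def run(chunks) -> list:
    ws = FakeSTTSocket()
    _, results = await asyncio.gather(send_audio(ws, chunks), recv_messages(ws))
    return results
```

The sender never closes the transport, so the receiver always gets the chance to read the final result before the connection goes away.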
Prompt for agents
In stt.py, the _send_audio method calls `await ws.close()` on line 353 after sending all audio. This initiates a WebSocket close handshake which may cause the server to stop processing before sending final transcription results.
The fix depends on the Gnani API protocol:
1. If the Gnani STT WebSocket API supports an application-level end-of-stream message (e.g. a JSON message like {"type": "end"} or similar), send that instead of ws.close(). Then let _recv_messages finish naturally when the server closes the connection.
2. If no such message exists, simply remove the `await ws.close()` call and let the _send_audio coroutine return. The `async with websockets.connect() as ws:` context manager in _run() will handle the close after both send_task and recv_task complete. But you would also need _recv_messages to have a way to know when to stop (e.g. the server closing the connection, or a timeout).
The key files are:
- livekit-plugins/livekit-plugins-gnani/livekit/plugins/gnani/stt.py: _send_audio method (line 331-353) and _recv_messages method (line 355-414)
- For reference pattern: livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/stt.py: send_task (lines 518-549) which sends an application-level CloseStream message
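For option 2, one possible way to give the receiver a stopping condition is an idle timeout. The sketch below assumes a generic `recv` coroutine function and a `None` sentinel for a closed connection; `drain_with_timeout` and `idle_timeout` are hypothetical names, not part of the plugin or of the websockets library.

```python
import asyncio


async def drain_with_timeout(recv, idle_timeout: float) -> list:
    """Collect messages from recv() until it returns None or stays idle too long."""
    results = []
    while True:
        try:
            msg = await asyncio.wait_for(recv(), timeout=idle_timeout)
        except asyncio.TimeoutError:
            break  # nothing arrived within idle_timeout; assume the server is done
        if msg is None:
            break  # sentinel for a connection closed by the server
        results.append(msg)
    return results


async def demo() -> list:
    # A queue stands in for the connection; it is never closed, so the
    # loop above ends through the idle timeout.
    queue: asyncio.Queue = asyncio.Queue()
    await queue.put("partial")
    await queue.put("final")
    return await drain_with_timeout(queue.get, idle_timeout=0.05)
```

The timeout value is a trade-off: too short risks cutting off a slow final result, too long delays the end of the stream.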
We have integrated Gnani's Speech-to-Text with LiveKit Agents. The service provides low-latency, high-accuracy transcription for Indian languages and accents, supporting English (Indian), Hindi, Tamil, Telugu, and more. It offers secure API key-based authentication and real-time transcription optimized for conversational and voice-based applications.