feat: add smallest ai asr and tts extensions #2200
Draft
harshitajain165 wants to merge 1 commit into
Add two new vendor extensions for Smallest AI:
- smallest_asr_python: real-time speech-to-text via the Pulse live WebSocket API (binary PCM16 in, interim/final transcripts out, finalize control message, reconnect with exponential backoff).
- smallest_tts_python: text-to-speech via the Lightning SSE streaming endpoint (base64 PCM16 chunks decoded on the fly, ~100 ms to first audio, output_format pinned to pcm).

Both include mocked extension-local test suites, test configs, and README docs. Configured via params.api_key or SMALLEST_API_KEY.
What
Adds two new vendor extensions for Smallest AI:
- `smallest_asr_python`: real-time speech-to-text using the Pulse live WebSocket API (`wss://api.smallest.ai/waves/v1/stt/live`). 38 languages, 64 ms time-to-first-transcript.
- `smallest_tts_python`: text-to-speech using the Lightning SSE streaming endpoint (`/waves/v1/tts/live`). ~100 ms to first audio chunk, 12 languages, voice cloning; `lightning_v3.1` and `lightning_v3.1_pro` models.

Implementation notes
ASR (extends `AsyncASRBaseExtension`):
- Interim results are emitted with `final=false`, finals on Pulse's `is_final`.
- `asr_finalize` maps to Pulse's `{"type": "finalize"}` control message (session stays open).
- `word_timestamps=true` by default so final results carry accurate `start_ms` / `duration_ms`.

TTS (extends `AsyncTTS2HttpExtension`):
- SSE events (`data: {"audio": "<base64>"}` / `{"done": true}`) are decoded incrementally; an event line that spans a chunk boundary is buffered until it is complete, so a partial line is never parsed.
- `output_format` pinned to `pcm`: Lightning's raw PCM is already signed 16-bit LE mono, matching the `pcm_frame` contract with no conversion, and skipping the container header keeps time-to-first-audio low.
- `invalid_api_key` classified as `INVALID_KEY_ERROR` (fatal); everything else non-fatal.
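For reviewers, the incremental SSE decoding described in the TTS notes can be sketched roughly as below. This is a minimal standalone illustration, not the code in this PR: the function name `iter_pcm_from_sse` is hypothetical, and it assumes the `{"done": true}` event also arrives on a `data:` line and that events are newline-delimited.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_pcm_from_sse(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield decoded PCM bytes from an SSE byte stream split into arbitrary chunks.

    Hypothetical helper for illustration only; the extension's real
    implementation may differ.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Parse only complete lines; a trailing partial line stays buffered
        # until the rest of it arrives in a later chunk.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            line = line.strip()
            if not line.startswith(b"data:"):
                continue  # blank separators, comments, other SSE fields
            payload = json.loads(line[len(b"data:"):].strip())
            if payload.get("done"):
                return  # assumption: "done" marks the end of the stream
            audio_b64 = payload.get("audio")
            if audio_b64:
                yield base64.b64decode(audio_b64)
```

Because only whole lines reach `json.loads`, the chunk size chosen by the HTTP client does not affect the decoded audio.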
Both authenticate via `params.api_key` / `SMALLEST_API_KEY` and send `X-Source: ten-framework` for API-side attribution. All other `params` keys pass through verbatim (query string for ASR, request body for TTS).
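A rough sketch of the key resolution and passthrough behaviour, for illustration only. Several details are assumptions not stated in this description: the helper names, the `Authorization: Bearer` scheme, `params.api_key` taking precedence over the environment variable, and the `text` field in the TTS body.

```python
import os
from typing import Any, Dict, Mapping, Optional
from urllib.parse import urlencode

ASR_URL = "wss://api.smallest.ai/waves/v1/stt/live"  # from the PR description


def resolve_api_key(
    params: Mapping[str, Any], env: Optional[Mapping[str, str]] = None
) -> str:
    """Return params.api_key, falling back to SMALLEST_API_KEY (order assumed)."""
    env = os.environ if env is None else env
    key = params.get("api_key") or env.get("SMALLEST_API_KEY", "")
    if not key:
        raise ValueError("set params.api_key or SMALLEST_API_KEY")
    return key


def build_headers(api_key: str) -> Dict[str, str]:
    return {
        "Authorization": f"Bearer {api_key}",  # assumption: scheme not stated
        "X-Source": "ten-framework",  # attribution header from the PR
    }


def passthrough(params: Mapping[str, Any]) -> Dict[str, Any]:
    """Every key except api_key is forwarded unchanged."""
    return {k: v for k, v in params.items() if k != "api_key"}


def build_asr_url(params: Mapping[str, Any]) -> str:
    """ASR: passthrough params become the WebSocket query string."""
    query = urlencode(passthrough(params))
    return f"{ASR_URL}?{query}" if query else ASR_URL


def build_tts_body(text: str, params: Mapping[str, Any]) -> Dict[str, Any]:
    """TTS: passthrough params become the JSON request body."""
    body = passthrough(params)
    body["text"] = text  # assumption: request field name
    body["output_format"] = "pcm"  # pinned, overrides any caller value
    return body
```

Keeping `api_key` out of the passthrough set means the secret never ends up in the ASR query string or the TTS body.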
Testing
- ASR: result shape, finalize, dump, metrics, reconnect, vendor error, invalid params.
- TTS: basic/dump/flush, error classification, metrics, params/URL resolution, robustness, state machine.
- `python -m black --check --line-length 80` clean; syntax verified.
- `task asr-guarder-test EXTENSION=smallest_asr_python` (pending; will run before marking ready for review)
- `task tts-guarder-test EXTENSION=smallest_tts_python` (pending)