feat: add smallest ai asr and tts extensions #2200

Draft
harshitajain165 wants to merge 1 commit into TEN-framework:main from harshitajain165:feat/smallest-ai-integration

@harshitajain165

What

Adds two new vendor extensions for Smallest AI:

  • smallest_asr_python: real-time speech-to-text using the Pulse live
    WebSocket API (wss://api.smallest.ai/waves/v1/stt/live). 38 languages,
    64 ms time-to-first-transcript.
  • smallest_tts_python: text-to-speech using the Lightning SSE streaming
    endpoint (/waves/v1/tts/live). ~100 ms to first audio chunk, 12 languages,
    voice cloning; lightning_v3.1 and lightning_v3.1_pro models.

Implementation notes

ASR (extends AsyncASRBaseExtension):

  • Streams raw PCM16 binary frames; interim results are emitted with
    final=false, and a final result is emitted when Pulse sets is_final.
  • asr_finalize maps to Pulse's {"type": "finalize"} control message (session
    stays open).
  • word_timestamps=true by default so final results carry accurate
    start_ms/duration_ms.
  • Reconnect with exponential backoff (5 attempts) before reporting a fatal error.
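
The reconnect behaviour in the last bullet could look roughly like the sketch below. It is not taken from the extension's code: the function name `connect_with_backoff`, the 0.5 s base delay, the 8 s cap, and the choice to retry on `OSError` are all assumptions made for illustration; only the count of 5 attempts comes from the description above.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def connect_with_backoff(
    connect: Callable[[], Awaitable[object]],
    attempts: int = 5,  # from the PR description
    base: float = 0.5,  # assumed base delay in seconds
    cap: float = 8.0,  # assumed upper bound on a single delay
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> object:
    """Call connect() up to `attempts` times, doubling the wait after each failure."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return await connect()
        except OSError as exc:  # ConnectionError is a subclass of OSError
            last_error = exc
            # No point sleeping after the final failed attempt.
            if attempt < attempts - 1:
                await sleep(min(cap, base * 2**attempt))
    # All attempts failed: the caller would report this as a fatal error.
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

Passing `sleep` as a parameter keeps the retry logic testable without real delays, which fits the mocked test suites mentioned under Testing.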

TTS (extends AsyncTTS2HttpExtension):

  • SSE frames (data: {"audio": "<base64>"} / {"done": true}) are decoded
    incrementally; a line that spans a chunk boundary is buffered until it is
    complete, so a partial line is never parsed.
  • output_format pinned to pcm: Lightning's raw PCM is already signed 16-bit
    LE mono, matching the pcm_frame contract with no conversion, and skipping the
    container header keeps time-to-first-audio low.
  • 401/403 and invalid_api_key classified as INVALID_KEY_ERROR (fatal);
    everything else non-fatal.
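
As a rough illustration of the incremental SSE decoding described in the first bullet, the sketch below buffers bytes until a full line is available and only then parses it. The function name `iter_sse_audio` is hypothetical and the code is not taken from the extension; the frame shapes (`{"audio": "<base64>"}` and `{"done": true}`) are the ones given above.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_sse_audio(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield decoded PCM bytes from SSE 'data:' lines, buffering partial lines."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Parse only complete lines; any trailing fragment stays in the buffer
        # until a later chunk supplies the rest of the line.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            line = line.strip()
            if not line.startswith(b"data:"):
                continue  # blank separators, comments, other SSE fields
            payload = json.loads(line[len(b"data:"):].strip())
            if payload.get("done"):
                return  # end-of-stream marker
            audio = payload.get("audio")
            if audio:
                yield base64.b64decode(audio)
```

Because the output is already raw PCM16, each yielded block can be handed on as audio data without a container-parsing step.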

Both authenticate via params.api_key / SMALLEST_API_KEY and send
X-Source: ten-framework for API-side attribution. All other params keys pass
through verbatim (query string for ASR, request body for TTS).
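
A minimal sketch of how the key resolution and ASR query-string pass-through might fit together. The helper names, the precedence of params.api_key over SMALLEST_API_KEY, the `Authorization: Bearer` scheme, and the lowercase rendering of booleans are assumptions for illustration; the description above confirms only the two key sources, the X-Source header, and that other params go into the query string.

```python
import os
from typing import Any, Dict, Mapping, Tuple
from urllib.parse import urlencode

ASR_URL = "wss://api.smallest.ai/waves/v1/stt/live"


def resolve_api_key(params: Mapping[str, Any], env: Mapping[str, str]) -> str:
    """Prefer params.api_key, then SMALLEST_API_KEY (precedence is assumed)."""
    key = params.get("api_key") or env.get("SMALLEST_API_KEY", "")
    if not key:
        raise ValueError("set params.api_key or SMALLEST_API_KEY")
    return key


def build_asr_request(
    params: Mapping[str, Any], env: Mapping[str, str] = os.environ
) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers); every non-key param is copied to the query string."""
    key = resolve_api_key(params, env)
    query = {
        # Render booleans as true/false rather than Python's True/False.
        name: str(value).lower() if isinstance(value, bool) else value
        for name, value in params.items()
        if name != "api_key"
    }
    url = f"{ASR_URL}?{urlencode(query)}" if query else ASR_URL
    headers = {
        "Authorization": f"Bearer {key}",  # assumed auth scheme
        "X-Source": "ten-framework",  # attribution header from the PR text
    }
    return url, headers
```

For TTS the same pass-through dictionary would be merged into the JSON request body instead of being URL-encoded.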

Testing

  • Extension-local mocked test suites included for both (no API key needed):
    ASR — result shape, finalize, dump, metrics, reconnect, vendor error, invalid
    params; TTS — basic/dump/flush, error classification, metrics, params/URL
    resolution, robustness, state machine.
  • python -m black --check --line-length 80 clean; syntax verified.
  • task asr-guarder-test EXTENSION=smallest_asr_python (pending — will run
    before marking ready for review)
  • task tts-guarder-test EXTENSION=smallest_tts_python (pending)
  • End-to-end voice-assistant graph smoke test (pending)

Add two new vendor extensions for Smallest AI:

- smallest_asr_python: real-time speech-to-text via the Pulse live
  WebSocket API (binary PCM16 in, interim/final transcripts out,
  finalize control message, reconnect with exponential backoff).
- smallest_tts_python: text-to-speech via the Lightning SSE streaming
  endpoint (base64 PCM16 chunks decoded on the fly, ~100 ms to first
  audio, output_format pinned to pcm).

Both include mocked extension-local test suites, test configs, and
README docs. Configured via params.api_key or SMALLEST_API_KEY.