Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Add TTS Latency Measurement Tools
This PR adds tools for measuring how quickly Deepgram's TTS service delivers audio—and whether it delivers audio fast enough for smooth, uninterrupted playback.
What's included
stream_tts.pyanalyze_tts_latency.pyREADME.mdWhy two separate scripts?
Customers can send us their raw timing data (the JSON file) so we can analyze their exact results and compare against our own tests. This makes debugging latency issues much easier.
What the metrics mean
TTFB (Time To First Byte)
How long until audio starts playing?
This is the delay between sending text and receiving the first audio. Lower is better—users notice delays over ~200ms.
RTF (Real-Time Factor)
Is audio arriving fast enough?
RTF compares delivery speed to playback speed. If a 10-second audio clip arrives in 5 seconds, that's 2.0x RTF.
Min Cumulative RTF (Streaming Health)
Did the stream ever fall behind?
This is the most important metric for real-world playback. It tracks whether, at any moment during the stream, we had received enough audio to keep playing without interruption.
Jitter
How consistent is the delivery?
Even if audio arrives fast enough on average, inconsistent packet timing can cause problems. Jitter measures this variability—lower is better.
Example output
Example files
Technical details: How calculations are performed
Timestamps collected
session_startconnected_attext_sent_atSpeakmessageflush_sent_atFlushmessagepackets[].received_atflushed_received_atFlushedcontrol message arrivessession_endAll timestamps are UTC ISO 8601 format.
Audio duration calculation
Where
bytes_per_sampleis 2 for linear16, 1 for mulaw/alaw. All audio is mono.Formulas
TTFB:
RTF:
Cumulative RTF (calculated for each packet after the first):
The minimum value across all packets determines streaming health.
Jitter:
Usage