added new tts latency scripts #7

jjmaldonis · 2026-01-26T15:51:25Z

Add TTS Latency Measurement Tools

This PR adds tools for measuring how quickly Deepgram's TTS service delivers audio—and whether it delivers audio fast enough for smooth, uninterrupted playback.

What's included

| File | Description |
| --- | --- |
| `stream_tts.py` | Sends text to Deepgram and records when each piece of audio arrives |
| `analyze_tts_latency.py` | Analyzes the timing data and produces a report |
| `README.md` | Setup and usage instructions |

Why two separate scripts?

Customers can send us their raw timing data (the JSON file) so we can analyze their exact results and compare against our own tests. This makes debugging latency issues much easier.

What the metrics mean

TTFB (Time To First Byte)

How long until audio starts playing?

This is the delay between sending text and receiving the first audio packet. Lower is better; users notice delays over ~200 ms.

TTFB: Time from sending text to first audio (excludes connection setup)
TTFB including network: Time from the very start, including establishing the connection (matters for cold starts)

RTF (Real-Time Factor)

Is audio arriving fast enough?

RTF compares delivery speed to playback speed. If a 10-second audio clip arrives in 5 seconds, that's 2.0x RTF.

> 1.0x: Audio arrives faster than it plays—good!
= 1.0x: Audio arrives exactly as fast as it plays—just barely keeping up
< 1.0x: Audio arrives slower than it plays—playback will stutter

Min Cumulative RTF (Streaming Health)

Did the stream ever fall behind?

This is the most important metric for real-world playback. It tracks whether, at any moment during the stream, we had received enough audio to keep playing without interruption.

≥ 1.0x: Stream always stayed ahead—smooth playback ✓
< 1.0x: Stream fell behind at some point—playback would have stuttered ✗

Jitter

How consistent is the delivery?

Even if audio arrives fast enough on average, inconsistent packet timing can cause problems. Jitter measures this variability—lower is better.

Example output

$ uv run analyze_tts_latency.py -i phrases_internet_troubleshooting.json
======================================================================
DEEPGRAM TTS LATENCY ANALYSIS REPORT
======================================================================
SESSION OVERVIEW
----------------------------------------
  Duration:        8965.78 ms
  Phrases:         5
  Total packets:   455
  Total audio:     18200.00 ms
  Total bytes:     873,600
LATENCY
----------------------------------------
  TTFB:            161.67 ms
  TTFB (incl net): 622.37 ms
  TTLB:            8502.82 ms
  Overall RTF:     2.18x
STREAMING HEALTH
----------------------------------------
  (Min cumulative RTF >= 1.0 means stream never fell behind real-time)
  Min cumulative RTF: 2.12x
  Status:          ✓ Stream kept ahead of real-time
JITTER (Inter-Arrival Time Variability)
----------------------------------------
  Mean IAT:        18.02 ms
  Jitter (σ):      2.21 ms

Example files

Input phrases: phrases_internet_troubleshooting.txt
Output JSON: phrases_internet_troubleshooting.json
Output audio: phrases_internet_troubleshooting.wav

Technical details: How calculations are performed

Timestamps collected

| Timestamp | Description |
| --- | --- |
| `session_start` | Before websocket connection attempt |
| `connected_at` | After websocket connection established |
| `text_sent_at` | Before sending each `Speak` message |
| `flush_sent_at` | Before sending each `Flush` message |
| `packets[].received_at` | When each audio packet arrives |
| `flushed_received_at` | When the `Flushed` control message arrives |
| `session_end` | After connection closes |

All timestamps are UTC, in ISO 8601 format.

Audio duration calculation

audio_duration = (byte_size / bytes_per_sample) / sample_rate

Here bytes_per_sample is 2 for linear16 and 1 for mulaw/alaw, byte_size is in bytes, sample_rate is in Hz, and the result is in seconds. All audio is mono.
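As a rough sketch of the duration formula above (the function name and the encoding lookup table are illustrative, not taken from the scripts in this PR):

```python
# Bytes per sample for each encoding, as stated in the PR description.
BYTES_PER_SAMPLE = {"linear16": 2, "mulaw": 1, "alaw": 1}


def audio_duration_ms(byte_size: int, encoding: str, sample_rate: int) -> float:
    """Duration of mono audio in milliseconds, given its size in bytes."""
    samples = byte_size / BYTES_PER_SAMPLE[encoding]
    return samples / sample_rate * 1000.0


# 1920 bytes of linear16 at an assumed 24 kHz sample rate is 960 samples.
packet_ms = audio_duration_ms(1920, "linear16", 24000)
```

The 24 kHz sample rate here is an assumption for illustration. It happens to be consistent with the example report (873,600 bytes of linear16 at 24 kHz works out to 18200 ms), but the PR description does not state the sample rate used.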

Formulas

TTFB:

TTFB = first_audio_packet.received_at - first_phrase.text_sent_at
TTFB_incl_net = first_audio_packet.received_at - session_start
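A minimal sketch of both TTFB calculations, assuming the JSON holds a top-level `session_start` and a `phrases` list whose entries carry `text_sent_at` and `packets`. That nesting is a guess based on the timestamp names above, not the confirmed file layout, and the sample values are made up:

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp; normalize a trailing 'Z' for Python < 3.11."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def ms_between(start: str, end: str) -> float:
    """Elapsed milliseconds between two ISO 8601 timestamps."""
    return (parse_ts(end) - parse_ts(start)).total_seconds() * 1000.0


def ttfb_metrics(session: dict) -> dict:
    phrases = session["phrases"]  # hypothetical layout
    # First audio packet in the session, searching phrases in order.
    first_packet = next(p for phrase in phrases for p in phrase["packets"])
    received = first_packet["received_at"]
    return {
        "ttfb_ms": ms_between(phrases[0]["text_sent_at"], received),
        "ttfb_incl_net_ms": ms_between(session["session_start"], received),
    }


# Made-up timestamps for illustration only.
example = {
    "session_start": "2026-01-26T15:51:25.000000Z",
    "phrases": [
        {
            "text_sent_at": "2026-01-26T15:51:25.450000Z",
            "packets": [{"received_at": "2026-01-26T15:51:25.600000Z"}],
        }
    ],
}
result = ttfb_metrics(example)
```

With these made-up values, `ttfb_ms` is 150 ms and `ttfb_incl_net_ms` is 600 ms; the 450 ms gap is the connection setup that the first figure excludes.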

RTF:

overall_rtf = total_audio_duration / (last_packet.received_at - first_packet.received_at)
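The overall RTF formula could be sketched like this, working from plain lists of arrival times and per-packet audio durations in milliseconds (the function name and input shape are illustrative):

```python
def overall_rtf(arrivals_ms, durations_ms):
    """Total audio duration divided by the first-to-last packet arrival span."""
    span = arrivals_ms[-1] - arrivals_ms[0]
    if span <= 0:
        raise ValueError("need at least two packets with distinct arrival times")
    return sum(durations_ms) / span


# 10 seconds of audio whose packets arrive over a 5-second span.
rtf = overall_rtf([0, 2500, 5000], [4000, 3000, 3000])
```

This reproduces the earlier example of 10 seconds of audio arriving in 5 seconds, giving an RTF of 2.0.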

Cumulative RTF (calculated for each packet after the first):

cumulative_rtf = cumulative_audio_received / wall_clock_since_first_packet

The minimum value across all packets determines streaming health.
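A sketch of the cumulative RTF check. It assumes that `cumulative_audio_received` counts every packet up to and including the current one (the first packet included); the description does not spell this out, so treat that as an assumption:

```python
def min_cumulative_rtf(arrivals_ms, durations_ms):
    """Smallest cumulative RTF over all packets after the first, or None."""
    first_arrival = arrivals_ms[0]
    received_ms = durations_ms[0]  # assumption: the first packet's audio counts
    ratios = []
    for arrival, duration in zip(arrivals_ms[1:], durations_ms[1:]):
        received_ms += duration
        wall_clock = arrival - first_arrival
        if wall_clock > 0:  # skip packets that share the first arrival time
            ratios.append(received_ms / wall_clock)
    return min(ratios) if ratios else None


# Four 40 ms packets; the last one arrives late, 200 ms after the first.
worst = min_cumulative_rtf([0, 20, 40, 200], [40, 40, 40, 40])
healthy = worst is not None and worst >= 1.0
```

In this example the per-packet ratios are 4.0, 3.0, and 0.8, so the minimum is 0.8 and the stream would be flagged as having fallen behind, even though it started well ahead of real time.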

Jitter:

inter_arrival_time[i] = packet[i].received_at - packet[i-1].received_at
jitter = standard_deviation(all_inter_arrival_times)
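The jitter calculation might look like the sketch below. It uses the population standard deviation (`statistics.pstdev`); whether the analysis script uses the population or sample form is not stated in the description, so that choice is an assumption:

```python
import statistics


def jitter_ms(arrivals_ms):
    """Mean inter-arrival time and its population standard deviation."""
    iats = [later - earlier for earlier, later in zip(arrivals_ms, arrivals_ms[1:])]
    if not iats:
        raise ValueError("need at least two packets")
    return statistics.mean(iats), statistics.pstdev(iats)


# Three packets with gaps of 10 ms and 20 ms between them.
mean_iat, jitter = jitter_ms([0, 10, 30])
```

Here the inter-arrival times are 10 ms and 20 ms, so the mean IAT is 15 ms and the jitter is 5 ms. Using `statistics.stdev` instead would give a slightly larger figure for small packet counts.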

Usage

# Collect timing data
export DEEPGRAM_API_KEY="your-api-key"
uv run stream_tts.py -i phrases.txt -o results.json -a output.wav

# Analyze results
uv run analyze_tts_latency.py -i results.json

# Export metrics as JSON
uv run analyze_tts_latency.py -i results.json -o metrics.json

jkroll-deepgram

Awesome! LGTM

added new tts latency scripts

ca9df21

jjmaldonis requested a review from a team as a code owner January 26, 2026 15:51

jkroll-deepgram approved these changes Jan 27, 2026

jeniya-DG merged commit cba6262 into main Jan 29, 2026
