Bug Description
When requesting audio in `opus` format, the output is truncated by approximately 1.5–2 seconds at the end. The same text rendered as `mp3` produces the correct full-length audio.
Reproduction
Send the same text to the TTS endpoint once with `opus` and once with `mp3` as the output format, then compare the durations:
```bash
# Opus output
curl -s -X POST http://localhost:8880/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{"model":"kokoro","input":"One two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty.","voice":"af_heart","response_format":"opus"}' \
  --output test.opus

# MP3 output
curl -s -X POST http://localhost:8880/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{"model":"kokoro","input":"One two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty.","voice":"af_heart","response_format":"mp3"}' \
  --output test.mp3

ffprobe -v quiet -show_entries format=duration -of csv=p=0 test.opus  # → 8.0s
ffprobe -v quiet -show_entries format=duration -of csv=p=0 test.mp3   # → 9.6s
```
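To cross-check the opus duration without ffprobe, it can be computed from the OGG pages themselves. In Ogg Opus (RFC 7845), granule positions count 48 kHz samples, so the playable length is the last page's granule position minus the `pre-skip` field of the `OpusHead` header. The sketch below uses only the Python standard library. `iter_ogg_pages` and `opus_duration_seconds` are hypothetical helpers, not part of Kokoro-FastAPI, and the parser skips CRC verification.

```python
import struct

OPUS_GRANULE_RATE = 48_000  # Ogg Opus granule positions always count 48 kHz samples


def iter_ogg_pages(data: bytes):
    """Yield (header_type, granule_position, body) for each Ogg page in data."""
    offset = 0
    while offset + 27 <= len(data):
        if data[offset:offset + 4] != b"OggS":
            raise ValueError(f"no Ogg capture pattern at byte {offset}")
        header_type = data[offset + 5]
        (granule,) = struct.unpack_from("<q", data, offset + 6)
        n_segments = data[offset + 26]
        body_start = offset + 27 + n_segments
        # The segment (lacing) table holds one length byte per segment.
        body_len = sum(data[offset + 27:body_start])
        yield header_type, granule, data[body_start:body_start + body_len]
        offset = body_start + body_len


def opus_duration_seconds(data: bytes) -> float:
    """Playable duration: last granule position minus the OpusHead pre-skip."""
    pages = list(iter_ogg_pages(data))
    if not pages or not pages[0][2].startswith(b"OpusHead"):
        raise ValueError("first page does not carry an OpusHead packet")
    # OpusHead layout: magic (8 bytes), version (1), channels (1), pre-skip (2, LE).
    (pre_skip,) = struct.unpack_from("<H", pages[0][2], 10)
    # Pages on which no packet ends carry granule position -1; skip them.
    last_granule = max(g for _, g, _ in pages if g >= 0)
    return (last_granule - pre_skip) / OPUS_GRANULE_RATE
```

For example, `opus_duration_seconds(open("test.opus", "rb").read())` should land close to the 8.0 s that ffprobe reports for the truncated file.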
Root Cause
In `api/src/services/streaming_audio_writer.py`, the `write_chunk(finalize=True)` path reads the output buffer before closing the OGG container:
```python
# Current (buggy) order:
packets = self.stream.encode(None)
for packet in packets:
    self.container.mux(packet)
data = self.output_buffer.getvalue()  # ← reads buffer HERE
self.close()                          # ← container.close() writes final OGG pages AFTER
return data                           # ← missing the OGG trailer
```
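For opus (OGG container), `container.close()` writes the final OGG pages (the container trailer/footer). Because `getvalue()` is called first, those pages are silently lost. OGG groups packets into pages and the muxer holds the last, partially filled pages until the container is closed, so what goes missing is the tail of the audio itself, not just metadata. MP3 is a sequence of self-contained frames with no container trailer, so it is unaffected.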
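The ordering problem can be reproduced without PyAV. In the toy model below, `BufferingMuxer` is a hypothetical stand-in for the OGG muxer, not the real PyAV container: it writes packets to an `io.BytesIO` only in whole pages and emits the last, partially filled page plus an end-of-stream marker only in `close()`. Taking the `getvalue()` snapshot before `close()` loses exactly that tail.

```python
import io


class BufferingMuxer:
    """Hypothetical stand-in for an OGG muxer: packets reach the output buffer
    only in whole pages, and the remainder is written in close()."""

    def __init__(self, out: io.BytesIO, packets_per_page: int = 4):
        self.out = out
        self.packets_per_page = packets_per_page
        self.pending = []  # packets waiting for a full page

    def mux(self, packet: bytes) -> None:
        self.pending.append(packet)
        if len(self.pending) == self.packets_per_page:
            self._write_page()

    def _write_page(self) -> None:
        self.out.write(b"".join(self.pending))
        self.pending.clear()

    def close(self) -> None:
        if self.pending:
            self._write_page()  # last, partially filled page
        self.out.write(b"<EOS>")  # stand-in for the end-of-stream page


def finalize(read_before_close: bool, n_packets: int = 10) -> bytes:
    """Mux n_packets one-byte packets, then read the buffer in either order."""
    buf = io.BytesIO()
    muxer = BufferingMuxer(buf)
    for i in range(n_packets):
        muxer.mux(bytes([i]))
    if read_before_close:
        data = buf.getvalue()  # buggy order: snapshot before the tail is written
        muxer.close()
    else:
        muxer.close()  # fixed order: let close() write the tail first
        data = buf.getvalue()
    return data
```

With the defaults, `finalize(read_before_close=True)` returns only the 8 bytes from the two full pages, while `finalize(read_before_close=False)` returns all 10 packets followed by `<EOS>`.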
Fix
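Keep the encoder flush, but close the container before reading the buffer:

```python
# Fixed order:
packets = self.stream.encode(None)  # unchanged: flush the encoder
for packet in packets:
    self.container.mux(packet)
self.container.close()  # writes the final OGG pages into output_buffer
data = self.output_buffer.getvalue()
self.output_buffer.close()
return data
```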
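A regression test could assert that the opus response ends with a complete OGG page whose header has the end-of-stream flag set (bit `0x04` of the header type, RFC 3533). With the buggy order the final pages are never written, so this check should fail. `last_ogg_page_is_eos` is a hypothetical helper: a sketch that walks page headers without verifying CRCs.

```python
OGG_EOS_FLAG = 0x04  # header_type bit marking the last page of a logical stream


def last_ogg_page_is_eos(data: bytes) -> bool:
    """Walk the Ogg pages in data and report whether the final one sets EOS.

    Returns False for empty input, stray bytes, or a cut-off final page.
    """
    offset, last_header_type = 0, None
    while offset + 27 <= len(data):
        if data[offset:offset + 4] != b"OggS":
            return False  # not positioned at a page boundary
        n_segments = data[offset + 26]
        # Body length is the sum of the lacing values in the segment table.
        body_len = sum(data[offset + 27:offset + 27 + n_segments])
        last_header_type = data[offset + 5]
        offset += 27 + n_segments + body_len
    # offset must land exactly on the end: anything else means truncation or junk.
    if last_header_type is None or offset != len(data):
        return False
    return bool(last_header_type & OGG_EOS_FLAG)
```

It is meant to run on the complete response body, for example the `test.opus` file from the reproduction steps, rather than on a single streamed chunk.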
Environment
- Image: `ghcr.io/remsky/kokoro-fastapi-cpu:latest`
- Version: v0.2.4-master (pulled 2026-02-15)