feat(parakeet-cpp): dynamic batching for concurrent transcription requests #10112

Open
localai-bot wants to merge 6 commits into master from feat/parakeet-dynamic-batching

@localai-bot
Collaborator

Summary

Adds dynamic batching to the parakeet-cpp backend so that concurrent
/v1/audio/transcriptions requests are coalesced into one batched call through
parakeet.cpp's batched encoder. This is a GPU throughput feature: under
concurrent load, the batched-GEMM path raises utilization. On CPU it does not
help (the GEMMs already saturate the threads, and padding adds work), so
batching can be disabled.

What changed (all under backend/go/parakeet-cpp/):

  • New in-process batcher (batcher.go): handler goroutines submit requests; one
    dispatcher goroutine accumulates them until batch_max_size or
    batch_max_wait_ms, then makes a single batched engine call. The dispatcher is
    the sole caller of the C engine, so engine access stays single-threaded.
  • The backend drops base.SingleThread (which serialized every call) for
    base.Base, so concurrent AudioTranscription handlers actually run and reach
    the batcher. An engineMu keeps the streaming path and batched-unary mutually
    exclusive on the one shared engine context.
  • AudioTranscription decodes the file, submits to the batcher, and shapes the
    per-item JSON exactly as before (text, word/segment timestamps, tokens).
  • Two model options: batch_max_size (default 8) and batch_max_wait_ms
    (default 15). batch_max_size: 1 disables batching (recommended on CPU).
  • Docs: docs/content/features/audio-to-text.md.
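For reviewers, the accumulate-then-dispatch loop described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code in batcher.go: the names `request`, `batcher`, `newBatcher`, `submit`, and the `run` callback (a stand-in for the batched engine call) are all hypothetical, and the sketch leaves out engineMu, shutdown, and error handling.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// request is one queued transcription job (hypothetical type).
type request struct {
	pcm   []float32
	reply chan string
}

// batcher coalesces requests into batches of at most maxSize, waiting at most
// maxWait after the first queued request before dispatching.
type batcher struct {
	in      chan request
	maxSize int
	maxWait time.Duration
	run     func(batch [][]float32) []string // stand-in for the batched engine call
}

func newBatcher(maxSize int, maxWait time.Duration, run func([][]float32) []string) *batcher {
	b := &batcher{in: make(chan request), maxSize: maxSize, maxWait: maxWait, run: run}
	go b.dispatch()
	return b
}

// submit blocks until the dispatcher delivers this request's result.
func (b *batcher) submit(pcm []float32) string {
	r := request{pcm: pcm, reply: make(chan string, 1)}
	b.in <- r
	return <-r.reply
}

// dispatch is the only goroutine that calls run, so engine access stays
// single-threaded even though many handlers call submit concurrently.
func (b *batcher) dispatch() {
	for first := range b.in {
		batch := []request{first}
		timer := time.NewTimer(b.maxWait)
	collect:
		for len(batch) < b.maxSize {
			select {
			case r, ok := <-b.in:
				if !ok {
					break collect
				}
				batch = append(batch, r)
			case <-timer.C:
				break collect // window expired: dispatch what we have
			}
		}
		timer.Stop()

		inputs := make([][]float32, len(batch))
		for i, r := range batch {
			inputs[i] = r.pcm
		}
		outs := b.run(inputs) // one engine call for the whole batch
		for i, r := range batch {
			r.reply <- outs[i] // route each result back to its own request
		}
	}
}

func main() {
	var sizes []int // written only by the dispatcher, read after all replies arrive
	b := newBatcher(4, 50*time.Millisecond, func(batch [][]float32) []string {
		sizes = append(sizes, len(batch))
		out := make([]string, len(batch))
		for i, pcm := range batch {
			out[i] = fmt.Sprintf("%d samples", len(pcm))
		}
		return out
	})

	var wg sync.WaitGroup
	for n := 1; n <= 4; n++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			fmt.Println(b.submit(make([]float32, n)))
		}(n)
	}
	wg.Wait()
	fmt.Println("batch sizes:", sizes)
}
```

With `maxSize` set to 1 the collect loop is skipped, so every request is dispatched on its own. That is the behavior the `batch_max_size: 1` setting is described as providing.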

Dependency

Requires the parakeet.cpp side that adds the
parakeet_capi_transcribe_pcm_batch_json C-API (batched transcription with
timestamps) and bumps the ABI to 2. The backend binds that symbol via purego at
runtime, so this Go code builds without it, but the backend image must ship a
libparakeet.so built from that parakeet.cpp branch for the feature to work.
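A later commit in this PR probes the batched symbol with Dlsym and falls back to per-request transcription when it is missing. The sketch below shows that pattern in plain Go, without purego, so it runs anywhere. `symbolLookup`, `engine`, `loadEngine`, `oldLibrary`, and `newLibrary` are hypothetical stand-ins, and the transcribe functions return dummy strings rather than calling a real library. Only the symbol name comes from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// batchSymbol is the C-API symbol named in this PR.
const batchSymbol = "parakeet_capi_transcribe_pcm_batch_json"

// symbolLookup stands in for a dlsym-style probe (hypothetical signature).
type symbolLookup func(name string) (addr uintptr, err error)

// engine holds the bound entry points; transcribeBatch stays nil when the
// loaded library does not export the batched symbol.
type engine struct {
	transcribeOne   func(pcm []float32) string
	transcribeBatch func(batch [][]float32) []string
}

// loadEngine always binds the per-request path and registers the batched
// path only if the probe succeeds, so an older library still loads.
func loadEngine(lookup symbolLookup) *engine {
	e := &engine{
		transcribeOne: func(pcm []float32) string {
			return fmt.Sprintf("single:%d", len(pcm))
		},
	}
	if _, err := lookup(batchSymbol); err == nil {
		e.transcribeBatch = func(batch [][]float32) []string {
			out := make([]string, len(batch))
			for i, pcm := range batch {
				out[i] = fmt.Sprintf("batch:%d", len(pcm))
			}
			return out
		}
	}
	return e
}

func (e *engine) hasBatch() bool { return e.transcribeBatch != nil }

// transcribe uses the batched call when available and otherwise loops over
// the per-request call, returning results in input order either way.
func (e *engine) transcribe(batch [][]float32) []string {
	if e.hasBatch() {
		return e.transcribeBatch(batch)
	}
	out := make([]string, len(batch))
	for i, pcm := range batch {
		out[i] = e.transcribeOne(pcm)
	}
	return out
}

// oldLibrary simulates a library that lacks the batched symbol.
func oldLibrary(name string) (uintptr, error) {
	return 0, errors.New("undefined symbol: " + name)
}

// newLibrary simulates a library that exports the batched symbol.
func newLibrary(name string) (uintptr, error) {
	if name == batchSymbol {
		return 1, nil
	}
	return 0, errors.New("undefined symbol: " + name)
}

func main() {
	input := [][]float32{make([]float32, 2), make([]float32, 5)}
	fmt.Println(loadEngine(oldLibrary).transcribe(input))
	fmt.Println(loadEngine(newLibrary).transcribe(input))
}
```

Keeping the fallback behind a single `transcribe` method means callers such as the batcher do not need to know which library version is loaded.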

Test plan

  • Pure-Go batcher unit tests pass under -race: go test ./backend/go/parakeet-cpp/ -run TestBatcher -race (coalescing, size trigger, window trigger, size-1 bypass).
  • go build / go vet clean (one pre-existing unrelated unsafe.Pointer warning).
  • End-to-end (reviewer / GPU host): build the backend image with the updated libparakeet.so, then make test-extra-backend-parakeet-cpp-transcription; fire N concurrent transcription requests and confirm correct per-request transcripts, and that throughput improves at batch_max_size > 1 on GPU.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

mudler added 4 commits May 31, 2026 20:10
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…ed JSON C-API

Drop SingleThread; route unary transcription through the in-process batcher
which coalesces concurrent requests into one batched engine call. Streaming
stays mutually exclusive via engineMu. Adds batch_max_size / batch_max_wait_ms
options (size=1 disables; recommended on CPU).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…eallocate; clarify stream lock

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
… with per-request fallback

The batched JSON C-API symbol exists only in newer libparakeet.so (ABI >= 2);
probe it with Dlsym and register optionally so the backend still loads against
an older library, falling back to per-request transcription. Rewrites the
batcher unit tests as Ginkgo/Gomega specs (forbidigo bans t.Fatal in tests).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
mudler force-pushed the feat/parakeet-dynamic-batching branch from a39a3b8 to 1c9bff2 on May 31, 2026 20:10
mudler added 2 commits May 31, 2026 20:52
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Dynamic batching now defaults off (batch_max_size:1, one request at a
time). Raise batch_max_size to opt in: it is a large throughput win on
GPU under concurrent load, but on CPU and low-concurrency setups it only
adds latency, so off is the safer default. The startup log now states
whether batching is on or off, and the audio-to-text docs are updated to
match.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
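
As an illustration of the off-by-default behavior described in this commit, option handling could look roughly like the sketch below. It assumes options arrive as `key:value` strings (the form `batch_max_size:1` used above). `batchConfig`, `parseBatchOptions`, and `enabled` are hypothetical names, not the backend's actual code. The 15 ms wait default is taken from the PR summary.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// batchConfig holds the two batching options (hypothetical type).
type batchConfig struct {
	MaxSize int
	MaxWait time.Duration
}

// parseBatchOptions starts from batching-off defaults and applies any
// well-formed "key:value" overrides; malformed entries are ignored.
func parseBatchOptions(opts []string) batchConfig {
	cfg := batchConfig{MaxSize: 1, MaxWait: 15 * time.Millisecond}
	for _, opt := range opts {
		key, val, ok := strings.Cut(opt, ":")
		if !ok {
			continue
		}
		n, err := strconv.Atoi(strings.TrimSpace(val))
		if err != nil || n < 0 {
			continue
		}
		switch strings.TrimSpace(key) {
		case "batch_max_size":
			if n >= 1 {
				cfg.MaxSize = n
			}
		case "batch_max_wait_ms":
			cfg.MaxWait = time.Duration(n) * time.Millisecond
		}
	}
	return cfg
}

// enabled reports whether requests will be coalesced at all.
func (c batchConfig) enabled() bool { return c.MaxSize > 1 }

func main() {
	cfg := parseBatchOptions([]string{"batch_max_size:8", "batch_max_wait_ms:15"})
	// A startup log line along these lines would state whether batching is on.
	fmt.Printf("batching enabled=%v size=%d wait=%s\n", cfg.enabled(), cfg.MaxSize, cfg.MaxWait)
}
```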
mudler force-pushed the feat/parakeet-dynamic-batching branch from 795d2ed to 27d7d0d on June 1, 2026 12:56
2 participants