feat(parakeet-cpp): dynamic batching for concurrent transcription requests by localai-bot · Pull Request #10112 · mudler/LocalAI

localai-bot · 2026-05-31T19:29:16Z

Summary

Adds dynamic batching to the parakeet-cpp backend so concurrent
/v1/audio/transcriptions requests are coalesced into one batched call through
parakeet.cpp's batched encoder. This is a GPU throughput feature: under
concurrent load the batched-GEMM path raises GPU utilization. On CPU it does
not help (the GEMMs already saturate the threads and padding adds work), so
batching can be disabled.

What changed (all under backend/go/parakeet-cpp/):

- New in-process batcher (batcher.go): handler goroutines submit requests; one
  dispatcher goroutine accumulates them until batch_max_size or
  batch_max_wait_ms, then makes a single batched engine call. The dispatcher is
  the sole caller of the C engine, so engine access stays single-threaded.
- The backend drops base.SingleThread (which serialized every call) for
  base.Base, so concurrent AudioTranscription handlers actually run and reach
  the batcher. An engineMu keeps the streaming path and batched-unary mutually
  exclusive on the one shared engine context.
- AudioTranscription decodes the file, submits to the batcher, and shapes the
  per-item JSON exactly as before (text, word/segment timestamps, tokens).
- Two model options: batch_max_size (default 8) and batch_max_wait_ms
  (default 15). batch_max_size: 1 disables batching (recommended on CPU).
- Docs: docs/content/features/audio-to-text.md.

Dependency

Requires the parakeet.cpp side that adds the
parakeet_capi_transcribe_pcm_batch_json C-API (batched transcription with
timestamps) and bumps the ABI to 2. The backend binds that symbol via purego at
runtime, so this Go code builds without it, but the backend image must ship a
libparakeet.so built from that parakeet.cpp branch for the feature to work.

Test plan

- Pure-Go batcher unit tests pass under -race: go test ./backend/go/parakeet-cpp/ -run TestBatcher -race (coalescing, size trigger, window trigger, size-1 bypass).
- go build / go vet clean (one pre-existing unrelated unsafe.Pointer warning).
- End-to-end (reviewer / GPU host): build the backend image with the updated libparakeet.so, then make test-extra-backend-parakeet-cpp-transcription; fire N concurrent transcription requests and confirm correct per-request transcripts, and that throughput improves at batch_max_size > 1 on GPU.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

…ed JSON C-API

Drop SingleThread; route unary transcription through the in-process batcher which coalesces concurrent requests into one batched engine call. Streaming stays mutually exclusive via engineMu. Adds batch_max_size / batch_max_wait_ms options (size=1 disables; recommended on CPU).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

…eallocate; clarify stream lock Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

… with per-request fallback

The batched JSON C-API symbol exists only in newer libparakeet.so (ABI >= 2); probe it with Dlsym and register optionally so the backend still loads against an older library, falling back to per-request transcription. Rewrites the batcher unit tests as Ginkgo/Gomega specs (forbidigo bans t.Fatal in tests).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

Dynamic batching now defaults off (batch_max_size:1, one request at a time). Raise batch_max_size to opt in: it is a large throughput win on GPU under concurrent load, but on CPU and low-concurrency setups it only adds latency, so off is the safer default. The startup log now states whether batching is on or off, and the audio-to-text docs are updated to match.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

mudler added 4 commits May 31, 2026 20:10

feat(parakeet-cpp): dynamic-batching scheduler (queue + dispatcher)

e0188ea

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

fix(parakeet-cpp): tear down dispatcher in Free; log batch config; pr…

c05265d

…eallocate; clarify stream lock Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

mudler force-pushed the feat/parakeet-dynamic-batching branch from a39a3b8 to 1c9bff2 May 31, 2026 20:10

mudler added 2 commits May 31, 2026 20:52

feat(parakeet-cpp): debug-log coalesced batch size in runBatch

c1bb48e

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

mudler force-pushed the feat/parakeet-dynamic-batching branch from 795d2ed to 27d7d0d June 1, 2026 12:56

localai-bot wants to merge 6 commits into master from feat/parakeet-dynamic-batching

2 participants
