feat(supertonic): add Supertonic ONNX TTS backend (CPU) by localai-bot · Pull Request #10342 · mudler/LocalAI

localai-bot · 2026-06-15T11:16:36Z

What

Adds a native Go gRPC TTS backend, supertonic, that runs Supertone's on-device multilingual TTS (the Supertone/supertonic-3 flow-matching model, 4 ONNX graphs) via ONNX Runtime, plus a ready-to-install gallery model. This PR is CPU-only; CUDA is a deliberate follow-up (see "Out of scope" below).

Why a native Go backend

Supertonic needs no G2P / espeak-ng: text preprocessing is NFKD normalization + a Unicode-codepoint -> token-id lookup (unicode_indexer.json). Fully self-contained.
Upstream ships a complete, MIT-licensed Go pipeline (go/helper.go) using github.com/yalue/onnxruntime_go. We vendor it (pinned commit, MIT header preserved) and drive it from a LocalAI gRPC server, mirroring the existing sherpa-onnx backend's ONNX-runtime-tarball bundling. No Python runtime, small image, fast cold start.
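For readers unfamiliar with this style of preprocessing, the codepoint -> token-id lookup can be sketched roughly as below. This is an illustration, not the vendored helper.go: the map layout standing in for unicode_indexer.json, the textToIDs name, and the unkID fallback are all assumptions, and the NFKD step is assumed to have run beforehand (for example via golang.org/x/text/unicode/norm).

```go
package main

import "fmt"

// textToIDs maps each Unicode code point of text to a token id.
// indexer is a stand-in for the table loaded from unicode_indexer.json
// (the real file layout is not shown in this PR, so a map is assumed).
// unkID is a hypothetical fallback for code points missing from the table.
// The caller is expected to pass text that is already NFKD-normalized.
func textToIDs(text string, indexer map[rune]int64, unkID int64) []int64 {
	ids := make([]int64, 0, len(text))
	for _, r := range text { // ranging over a string yields code points, not bytes
		id, ok := indexer[r]
		if !ok {
			id = unkID
		}
		ids = append(ids, id)
	}
	return ids
}

func main() {
	indexer := map[rune]int64{'h': 1, 'i': 2}
	fmt.Println(textToIDs("hi!", indexer, 0)) // prints [1 2 0]
}
```

Because the lookup is a plain table access per code point, no phonemizer or external G2P binary is involved, which is the self-containment the description refers to.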

What's included

Backend backend/go/supertonic/: main.go (gRPC bootstrap), backend.go (Load/TTS/TTSStream, option parsing, voice/lang resolution, PCM-chunk streaming), vendored helper.go, unit tests + a gated end-to-end synthesis test, Makefile/run.sh/package.sh.
Wiring: root Makefile, CPU build-matrix entries (amd64 + arm64), backend gallery backend/index.yaml meta + image entries, pref-only importer registration (/backends/known).
Gallery model: gallery/supertonic.yaml config template + a supertonic-3 entry in gallery/index.yaml (16 files: 4 ONNX + tts.json + unicode_indexer.json + 10 voice styles, all SHA256-pinned from Supertone/supertonic-3).

Request mapping

voice -> voice_styles/<name>.json (the model keeps voice_styles/ as a sibling of the onnx/ dir; the backend resolves both layouts). Voices: F1-F5, M1-M5.
language -> language tag (default na); validated against the model's supported set.
Model options: supertonic.steps (default 8), supertonic.speed (1.05), supertonic.silence (0.3), supertonic.default_voice (F1), supertonic.default_lang (na).
Streaming: Supertonic has no native incremental API, so TTSStream chunks the finished PCM.
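As a rough illustration of the streaming approach described above (synthesize everything, then slice the finished PCM), a chunking helper might look like the following. The chunkPCM name and the chunk size are hypothetical; the actual TTSStream implementation in backend.go may differ.

```go
package main

import "fmt"

// chunkPCM splits a finished PCM byte buffer into chunks of at most
// chunkBytes bytes. chunkBytes is rounded down to an even number so that
// a 16-bit sample is never split across two chunks.
func chunkPCM(pcm []byte, chunkBytes int) [][]byte {
	if chunkBytes < 2 {
		chunkBytes = 2
	}
	chunkBytes -= chunkBytes % 2
	var chunks [][]byte
	for start := 0; start < len(pcm); start += chunkBytes {
		end := start + chunkBytes
		if end > len(pcm) {
			end = len(pcm)
		}
		chunks = append(chunks, pcm[start:end])
	}
	return chunks
}

func main() {
	pcm := make([]byte, 10) // 5 samples of 16-bit mono audio
	for i, c := range chunkPCM(pcm, 4) {
		fmt.Printf("chunk %d: %d bytes\n", i, len(c))
	}
	// prints:
	// chunk 0: 4 bytes
	// chunk 1: 4 bytes
	// chunk 2: 2 bytes
}
```

Chunking after the fact does not reduce time-to-first-byte, since synthesis must finish first; it only lets clients consume the result through the streaming interface.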

Validation

go build + golangci-lint (new-from-merge-base): clean. Unit tests pass; gallery YAML validated; gated e2e skips without a model.
End-to-end verified locally against Supertone/supertonic-3: synthesized a valid RIFF/WAVE 16-bit mono 44.1 kHz clip (2.75 s, peak amplitude 8682 - real speech) through the full Load -> 4-stage ONNX pipeline -> WAV path. SHA256s cross-checked against HF's x-linked-etag.

Out of scope (follow-ups)

CUDA / OpenVINO / ROCm. Upstream's Go LoadTextToSpeech hard-errors on GPU and passes nil session options; CUDA needs explicit AppendExecutionProviderCUDA wiring + GPU validation. The onnxProvider var and BUILD_TYPE=cublas Makefile branch are scaffolding for that follow-up.

Attribution

backend/go/supertonic/helper.go is vendored from supertone-inc/supertonic (MIT) at a pinned commit, with the copyright header preserved. AI-assisted per .agents/ai-coding-assistants.md; commits carry Assisted-by: trailers.

🤖 Generated with Claude Code

16 files (4 onnx + tts.json + unicode_indexer.json + 10 voice styles) from HF Supertone/supertonic-3, served via the supertonic backend. Defaults to voice F1; onnx/ + sibling voice_styles/ layout matches the backend's resolveVoicesDir. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

Pre-existing on master: the field was added without a registry entry, failing TestAllFieldsHaveRegistryEntries (core/config/meta). Add the entry so it renders properly in the model-config UI. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

helper.go is vendored from supertone-inc/supertonic; its G304/G404/G104 findings are inherent to upstream and the math/rand use is correct for flow-matching noise (crypto/rand would be wrong). Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

mudler added 11 commits June 15, 2026 08:56

feat(supertonic): vendor upstream Go TTS pipeline (helper.go)

c6e91e0

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

feat(supertonic): add gRPC backend (Load/TTS/TTSStream, CPU)

1636ce1

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

fix(supertonic): satisfy unused linter (use onnxProvider; exclude ven…

02462a3

…dored helper.go) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

test(supertonic): unit tests for resolvers + gated end-to-end synthesis

5b7aab6

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

style(supertonic): gofmt backend.go comment block

8598edf

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

feat(supertonic): add Makefile, run.sh, package.sh (CPU build)

c8a15a8

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

build(supertonic): wire backend into root Makefile

8991009

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

fix(supertonic): check ort.DestroyEnvironment return (errcheck)

0dafc7f

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

fix(supertonic): resolve voice_styles as sibling of onnx dir; guard t…

cbe616a

…rim; test voice Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

feat(supertonic): add CPU build matrix + gallery index entries

758abcf

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

feat(supertonic): expose as pref-only importable backend

000edf8

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

github-advanced-security AI found potential problems Jun 15, 2026


mudler added 3 commits June 15, 2026 13:01

mudler merged commit 2df2876 into master Jun 15, 2026
61 checks passed

mudler deleted the feat/supertonic-tts-backend branch June 15, 2026 14:54

mudler merged 14 commits into master from feat/supertonic-tts-backend

3 participants
