
feat(supertonic): add Supertonic ONNX TTS backend (CPU) #10342

Merged: mudler merged 14 commits into master from feat/supertonic-tts-backend on Jun 15, 2026

@localai-bot (Collaborator) commented Jun 15, 2026

What

Adds supertonic, a native Go gRPC TTS backend that runs Supertone's on-device multilingual TTS (Supertone/supertonic-3, a flow-matching model split into 4 ONNX graphs) through ONNX Runtime, plus a ready-to-install gallery model. This PR is CPU-only; CUDA is a deliberate follow-up (see below).

Why a native Go backend

  • Supertonic needs no G2P / espeak-ng: text preprocessing is NFKD normalization + a Unicode-codepoint -> token-id lookup (unicode_indexer.json). Fully self-contained.
  • Upstream ships a complete, MIT-licensed Go pipeline (go/helper.go) built on github.com/yalue/onnxruntime_go. We vendor it (pinned commit, MIT header preserved) and drive it from a LocalAI gRPC server. The ONNX Runtime tarball is bundled the same way the existing sherpa-onnx backend does it. Result: no Python runtime, a small image, and fast cold starts.
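
The text-preprocessing step above can be sketched as follows. This is a minimal sketch under stated assumptions:

  • unicode_indexer.json is assumed to be a JSON array whose position i holds the token id for codepoint i, with -1 marking unsupported characters.
  • loadIndexer and textToIDs are hypothetical names, not functions from the vendored helper.go.
  • NFKD normalization is omitted to keep the example stdlib-only. A real pipeline would first pass the text through a normalizer such as norm.NFKD.String from golang.org/x/text/unicode/norm.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// loadIndexer parses a codepoint -> token-id table. ASSUMPTION: the JSON is an
// array where index i is the token id for Unicode codepoint i, -1 if unsupported.
func loadIndexer(data []byte) ([]int64, error) {
	var idx []int64
	if err := json.Unmarshal(data, &idx); err != nil {
		return nil, fmt.Errorf("parse indexer: %w", err)
	}
	return idx, nil
}

// textToIDs maps each codepoint of (already NFKD-normalized) text to its token id.
func textToIDs(text string, indexer []int64) ([]int64, error) {
	ids := make([]int64, 0, len(text))
	for _, r := range text {
		if int(r) >= len(indexer) || indexer[r] < 0 {
			return nil, fmt.Errorf("unsupported codepoint %U", r)
		}
		ids = append(ids, indexer[r])
	}
	return ids, nil
}

func main() {
	// Toy table covering only 'h' (104) and 'i' (105); real tables are far larger.
	table := make([]int64, 106)
	for i := range table {
		table[i] = -1
	}
	table['h'], table['i'] = 7, 9
	raw, _ := json.Marshal(table)

	idx, err := loadIndexer(raw)
	if err != nil {
		panic(err)
	}
	ids, err := textToIDs("hi", idx)
	fmt.Println(ids, err) // [7 9] <nil>
}
```

Rejecting unknown codepoints fails loudly. Silently dropping them instead would let unsupported text synthesize as truncated speech.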

What's included

  • Backend backend/go/supertonic/: main.go (gRPC bootstrap), backend.go (Load/TTS/TTSStream, option parsing, voice/lang resolution, PCM-chunk streaming), vendored helper.go, unit tests + a gated end-to-end synthesis test, Makefile/run.sh/package.sh.
  • Wiring: root Makefile, CPU build-matrix entries (amd64 + arm64), backend gallery backend/index.yaml meta + image entries, pref-only importer registration (/backends/known).
  • Gallery model: gallery/supertonic.yaml config template + a supertonic-3 entry in gallery/index.yaml (16 files: 4 ONNX + tts.json + unicode_indexer.json + 10 voice styles, all SHA256-pinned from Supertone/supertonic-3).

Request mapping

  • voice -> voice_styles/<name>.json (the model keeps voice_styles/ as a sibling of the onnx/ dir; the backend resolves both layouts). Voices: F1-F5, M1-M5.
  • language -> language tag (default na); validated against the model's supported set.
  • Model options: supertonic.steps (default 8), supertonic.speed (1.05), supertonic.silence (0.3), supertonic.default_voice (F1), supertonic.default_lang (na).
  • Streaming: Supertonic has no native incremental API, so TTSStream chunks the finished PCM.

Validation

  • go build + golangci-lint (new-from-merge-base): clean. Unit tests pass; gallery YAML validated; gated e2e skips without a model.
  • End-to-end verified locally against Supertone/supertonic-3: synthesized a valid RIFF/WAVE 16-bit mono 44.1 kHz clip (2.75 s, peak amplitude 8682, i.e. real speech rather than silence) through the full Load -> 4-stage ONNX pipeline -> WAV path. The pinned SHA256s were cross-checked against Hugging Face's x-linked-etag headers.

Out of scope (follow-ups)

  • CUDA / OpenVINO / ROCm. Upstream's Go LoadTextToSpeech hard-errors on GPU and passes nil session options; CUDA needs explicit AppendExecutionProviderCUDA wiring + GPU validation. The onnxProvider var and BUILD_TYPE=cublas Makefile branch are scaffolding for that follow-up.

Attribution

backend/go/supertonic/helper.go is vendored from supertone-inc/supertonic (MIT) at a pinned commit, with the copyright header preserved. AI-assisted per .agents/ai-coding-assistants.md; commits carry Assisted-by: trailers.

🤖 Generated with Claude Code

mudler added 11 commits June 15, 2026 08:56
…dored helper.go)
…rim; test voice

Each of the 11 commits carries the trailers:
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
10 comment threads on backend/go/supertonic/helper.go, all marked Fixed
mudler added 3 commits June 15, 2026 13:01
16 files (4 onnx + tts.json + unicode_indexer.json + 10 voice styles)
from HF Supertone/supertonic-3, served via the supertonic backend.
Defaults to voice F1; onnx/ + sibling voice_styles/ layout matches the
backend's resolveVoicesDir.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Pre-existing on master: the field was added without a registry entry,
failing TestAllFieldsHaveRegistryEntries (core/config/meta). Add the
entry so it renders properly in the model-config UI.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
helper.go is vendored from supertone-inc/supertonic; its G304/G404/G104
findings are inherent to upstream and the math/rand use is correct for
flow-matching noise (crypto/rand would be wrong).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
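
The point about math/rand can be made concrete. Flow matching starts from Gaussian noise and integrates a velocity field over a fixed number of steps; supertonic.steps (default 8) is presumably that step count. What the noise needs is speed and, when seeded, reproducibility. crypto/rand is built for unpredictability instead and offers no normal-distribution sampler.

In this sketch, the velocity field is a hypothetical stand-in for the model's learned one. It is an oracle that points straight at a known target, so plain Euler integration lands on the target after the last step. initialNoise, eulerIntegrate and straightLineTo are illustrative names, not helper.go's.

```go
package main

import (
	"fmt"
	"math/rand"
)

// initialNoise draws n standard-normal values from a seeded math/rand source,
// so the same seed always yields the same starting latent.
func initialNoise(seed int64, n int) []float64 {
	r := rand.New(rand.NewSource(seed))
	x := make([]float64, n)
	for i := range x {
		x[i] = r.NormFloat64()
	}
	return x
}

// velocityFn is a hypothetical stand-in for the model's velocity-field graph.
type velocityFn func(x []float64, t float64) []float64

// eulerIntegrate moves x from t=0 to t=1 in `steps` equal explicit-Euler steps.
func eulerIntegrate(x []float64, steps int, v velocityFn) []float64 {
	out := append([]float64(nil), x...) // do not mutate the caller's noise
	dt := 1.0 / float64(steps)
	for k := 0; k < steps; k++ {
		vel := v(out, float64(k)*dt)
		for i := range out {
			out[i] += dt * vel[i]
		}
	}
	return out
}

// straightLineTo is a toy oracle: the velocity that reaches target at t=1
// along a straight path, (target - x) / (1 - t). Only called with t < 1.
func straightLineTo(target []float64) velocityFn {
	return func(x []float64, t float64) []float64 {
		vel := make([]float64, len(x))
		for i := range x {
			vel[i] = (target[i] - x[i]) / (1 - t)
		}
		return vel
	}
}

func main() {
	noise := initialNoise(42, 4)
	target := []float64{0.1, -0.2, 0.3, -0.4}
	fmt.Println(eulerIntegrate(noise, 8, straightLineTo(target)))
}
```

With a learned rather than oracle velocity field, the integration is only approximate. The step count then typically trades latency against fidelity.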
@mudler mudler merged commit 2df2876 into master Jun 15, 2026
61 checks passed
@mudler mudler deleted the feat/supertonic-tts-backend branch June 15, 2026 14:54

3 participants