feat(supertonic): add Supertonic ONNX TTS backend (CPU) #10342
Merged
Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…dored helper.go) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…rim; test voice Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
16 files (4 onnx + tts.json + unicode_indexer.json + 10 voice styles) from HF Supertone/supertonic-3, served via the supertonic backend. Defaults to voice F1; onnx/ + sibling voice_styles/ layout matches the backend's resolveVoicesDir. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Pre-existing on master: the field was added without a registry entry, failing TestAllFieldsHaveRegistryEntries (core/config/meta). Add the entry so it renders properly in the model-config UI. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
helper.go is vendored from supertone-inc/supertonic; its G304/G404/G104 findings are inherent to upstream and the math/rand use is correct for flow-matching noise (crypto/rand would be wrong). Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
What
Adds a native Go gRPC TTS backend, supertonic, that runs Supertone's on-device multilingual TTS (the Supertone/supertonic-3 flow-matching model, 4 ONNX graphs) via ONNX Runtime, plus a ready-to-install gallery model. CPU-only in this PR; CUDA is a deliberate follow-up (see below).
Why a native Go backend
unicode_indexer.json). Fully self-contained.
go/helper.go) using github.com/yalue/onnxruntime_go. We vendor it (pinned commit, MIT header preserved) and drive it from a LocalAI gRPC server, mirroring the existing sherpa-onnx backend's ONNX-runtime-tarball bundling. No Python runtime, small image, fast cold start.
What's included
backend/go/supertonic/: main.go (gRPC bootstrap), backend.go (Load/TTS/TTSStream, option parsing, voice/lang resolution, PCM-chunk streaming), vendored helper.go, unit tests + a gated end-to-end synthesis test, Makefile/run.sh/package.sh.
Makefile, CPU build-matrix entries (amd64 + arm64), backend gallery backend/index.yaml meta + image entries, pref-only importer registration (/backends/known).
gallery/supertonic.yaml config template + a supertonic-3 entry in gallery/index.yaml (16 files: 4 ONNX + tts.json + unicode_indexer.json + 10 voice styles, all SHA256-pinned from Supertone/supertonic-3).
Request mapping
voice -> voice_styles/<name>.json (the model keeps voice_styles/ as a sibling of the onnx/ dir; the backend resolves both layouts). Voices: F1-F5, M1-M5.
language -> language tag (default na); validated against the model's supported set.
supertonic.steps (default 8), supertonic.speed (1.05), supertonic.silence (0.3), supertonic.default_voice (F1), supertonic.default_lang (na).
TTSStream chunks the finished PCM.
Validation
go build + golangci-lint (new-from-merge-base): clean. Unit tests pass; gallery YAML validated; gated e2e skips without a model.
Supertone/supertonic-3: synthesized a valid RIFF/WAVE 16-bit mono 44.1 kHz clip (2.75 s, peak amplitude 8682 - real speech) through the full Load -> 4-stage ONNX pipeline -> WAV path. SHA256s cross-checked against HF's x-linked-etag.
Out of scope (follow-ups)
LoadTextToSpeech hard-errors on GPU and passes nil session options; CUDA needs explicit AppendExecutionProviderCUDA wiring + GPU validation. The onnxProvider var and BUILD_TYPE=cublas Makefile branch are scaffolding for that follow-up.
Attribution
backend/go/supertonic/helper.go is vendored from supertone-inc/supertonic (MIT) at a pinned commit, with the copyright header preserved. AI-assisted per .agents/ai-coding-assistants.md; commits carry Assisted-by: trailers.
🤖 Generated with Claude Code
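Illustrative sketch (not the code in backend.go)
For readers skimming the request mapping above, here is a minimal, self-contained sketch of how that mapping could look. It is an illustration under assumptions, not the backend's implementation: the "key:value" option encoding, the names config, parseOptions, validVoice, voiceCandidates and chunkPCM, and the example paths are all hypothetical. Only the option keys, defaults, voice names, and the two voice_styles/ layouts come from the description.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
	"strings"
)

// config mirrors the tunables listed in the PR text (hypothetical type).
type config struct {
	Steps        int
	Speed        float64
	Silence      float64
	DefaultVoice string
	DefaultLang  string
}

// parseOptions applies "key:value" strings on top of the documented defaults.
// The "key:value" encoding is an assumption made for this sketch.
func parseOptions(opts []string) (config, error) {
	cfg := config{Steps: 8, Speed: 1.05, Silence: 0.3, DefaultVoice: "F1", DefaultLang: "na"}
	for _, o := range opts {
		key, val, ok := strings.Cut(o, ":")
		if !ok {
			return cfg, fmt.Errorf("malformed option %q", o)
		}
		val = strings.TrimSpace(val)
		var err error
		switch strings.TrimSpace(key) {
		case "supertonic.steps":
			cfg.Steps, err = strconv.Atoi(val)
		case "supertonic.speed":
			cfg.Speed, err = strconv.ParseFloat(val, 64)
		case "supertonic.silence":
			cfg.Silence, err = strconv.ParseFloat(val, 64)
		case "supertonic.default_voice":
			cfg.DefaultVoice = val
		case "supertonic.default_lang":
			cfg.DefaultLang = val
		}
		// Keys outside the switch are ignored in this sketch.
		if err != nil {
			return cfg, fmt.Errorf("option %q: %w", o, err)
		}
	}
	return cfg, nil
}

// validVoice reports whether v is one of F1-F5 or M1-M5.
func validVoice(v string) bool {
	if len(v) != 2 || (v[0] != 'F' && v[0] != 'M') {
		return false
	}
	return v[1] >= '1' && v[1] <= '5'
}

// voiceCandidates lists where <voice>.json may live: voice_styles/ inside the
// ONNX dir, or voice_styles/ as a sibling of it. A caller would use the first
// path that exists on disk.
func voiceCandidates(onnxDir, voice string) []string {
	name := voice + ".json"
	onnxDir = filepath.Clean(onnxDir)
	return []string{
		filepath.Join(onnxDir, "voice_styles", name),
		filepath.Join(filepath.Dir(onnxDir), "voice_styles", name),
	}
}

// chunkPCM splits finished 16-bit PCM into chunks of at most size bytes,
// rounding size down to an even number so no sample is split across chunks.
func chunkPCM(pcm []byte, size int) [][]byte {
	size -= size % 2
	if size <= 0 {
		size = 2
	}
	var out [][]byte
	for len(pcm) > 0 {
		n := size
		if n > len(pcm) {
			n = len(pcm)
		}
		out = append(out, pcm[:n])
		pcm = pcm[n:]
	}
	return out
}

func main() {
	cfg, err := parseOptions([]string{"supertonic.steps:12", "supertonic.speed:0.9"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg)
	fmt.Println(validVoice(cfg.DefaultVoice), voiceCandidates("/models/supertonic-3/onnx", cfg.DefaultVoice))
	fmt.Println(len(chunkPCM(make([]byte, 10), 4)), "chunks")
}
```

Returning candidate paths instead of touching the filesystem keeps the layout rule testable without a model on disk; the real backend's resolveVoicesDir may well be structured differently.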