feat(backend): add SpiritLM backend for text, TTS, and ASR #8589
Open
MkDev11 wants to merge 6 commits into mudler:master
Conversation
Implements LocalAI backend for Meta Spirit LM (interleaved text and speech).

- backend/python/spiritlm: gRPC servicer with LoadModel, Predict, PredictStream, TTS, TTSStream, AudioTranscription, Health
- Supports spirit-lm-base-7b and spirit-lm-expressive-7b
- Options: sample_rate (default 16000)
- backend/index.yaml: add spiritlm meta and capabilities

Ref: mudler#3966
Signed-off-by: mkdev11 <MkDev11@users.noreply.github.com>
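For reviewers who want a concrete picture of the sample_rate option, here is a minimal sketch of how it could be parsed with a fallback to the stated default of 16000. It assumes options reach the backend as "key:value" strings; that format, and the helper names parse_options and get_sample_rate, are illustrative assumptions and not taken from the PR's code.

```python
from typing import Dict, Iterable

DEFAULT_SAMPLE_RATE = 16000  # default stated in the commit message


def parse_options(raw_options: Iterable[str]) -> Dict[str, str]:
    """Split "key:value" strings into a dict (assumed option format)."""
    parsed: Dict[str, str] = {}
    for item in raw_options:
        if ":" not in item:
            continue  # skip entries that are not key:value pairs
        key, value = item.split(":", 1)  # split on the first colon only
        parsed[key.strip()] = value.strip()
    return parsed


def get_sample_rate(raw_options: Iterable[str]) -> int:
    """Return sample_rate as a positive int, else the default."""
    value = parse_options(raw_options).get("sample_rate")
    if value is None:
        return DEFAULT_SAMPLE_RATE
    try:
        rate = int(value)
    except ValueError:
        return DEFAULT_SAMPLE_RATE  # non-numeric value: fall back
    return rate if rate > 0 else DEFAULT_SAMPLE_RATE
```

Falling back to the default on malformed input, instead of raising, keeps model loading tolerant of a bad option string; whether the actual backend behaves this way is not stated in the PR.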
Author
@mudler could you please review the PR and let me know your feedback?
Collaborator
Can you add an entry in gallery/index.yaml similar to Qwen-ASR?
Add spirit-lm-base-7b and spirit-lm-expressive-7b to model gallery, following the same pattern as Qwen-ASR (per PR mudler#8589 review).

Signed-off-by: mkdev11 <MkDev11@users.noreply.github.com>
Author
@richiejp Thanks for your feedback. I added SpiritLM to |
Collaborator
Thanks, the bottleneck on our end is testing. If you provide e2e tests then we can verify these and get it merged faster.
Author
@richiejp I added e2e tests, please review the update again and let me know your feedback. |
Author
Any update for me?
Description
Fixes #3966
Adds a new LocalAI backend for Meta Spirit LM: an interleaved text and speech model that supports text generation, text-to-speech (TTS), and automatic speech recognition (ASR) in a single 7B model.
Changes:
- LoadModel: loads spirit-lm-base-7b or spirit-lm-expressive-7b
- Predict/PredictStream: text generation via OutputModality.TEXT
- TTS/TTSStream: text → speech via OutputModality.SPEECH (float32 → 16 kHz WAV)
- AudioTranscription: speech → text via OutputModality.TEXT from audio path (request.dst)
- Health, options parsing (sample_rate, etc.)
- &spiritlm meta with description, tags (text-to-text, TTS, ASR, LLM, multimodal), capabilities (cpu-spiritlm, cuda12-spiritlm).

Notes for Reviewers
- requirements-install.txt installs from git+https://github.com/facebookresearch/spiritlm.git. Checkpoints must be set up separately per the SpiritLM repo.
- backend.proto into the backend dir per existing Dockerfile.python.
- fair-noncommercial (Meta FAIR Noncommercial Research License).

Signed commits
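To illustrate the "float32 → 16 kHz WAV" step listed for TTS/TTSStream under Changes, the sketch below encodes mono float samples as a WAV byte string using only the Python standard library. The 16-bit PCM encoding, the mono channel layout, and the function name float32_to_wav_bytes are assumptions made for this example; the PR may use a different sample format or an audio library instead.

```python
import io
import struct
import wave
from typing import Sequence


def float32_to_wav_bytes(samples: Sequence[float], sample_rate: int = 16000) -> bytes:
    """Encode mono float samples in [-1.0, 1.0] as 16-bit PCM WAV bytes."""
    # Clip out-of-range values, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono (assumed)
        wav_file.setsampwidth(2)            # 2 bytes = 16-bit PCM (assumed)
        wav_file.setframerate(sample_rate)  # 16 kHz by default
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return buffer.getvalue()
```

A caller could write the returned bytes to a destination path or split them into chunks for a streaming response; clipping before scaling avoids integer overflow if the model emits samples slightly outside [-1.0, 1.0].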