Align FastAPI with official Orpheus prompt setup #3

Open
SebastianBodza wants to merge 1 commit into main from official-setup

SebastianBodza (Owner) commented Mar 30, 2026

This pull request refactors how special tokens are handled in streaming_api_server.py for prompt formatting and audio stream generation, aligning the code more closely with the official Orpheus-TTS inference protocol. The code no longer searches the output for a code start token; it relies instead on explicitly defined token ID lists for prompt wrapping and stop conditions. This simplifies the logic and improves maintainability.

Special token handling improvements:

  • Replaced the single CODE_START_TOKEN_ID with PROMPT_START_TOKEN_ID and a list of PROMPT_END_TOKEN_IDS to wrap prompts, and introduced STOP_TOKEN_IDS for stopping generation, matching official Orpheus-TTS inference conventions.
  • Updated the format_prompt_for_vllm_sync function to use the new prompt start and end token IDs, improving clarity and flexibility.
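The prompt wrapping described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `wrap_prompt_ids` is a hypothetical helper, and the numeric token IDs are placeholders, since the actual constants live in streaming_api_server.py and are not shown in this description.

```python
from typing import List

# Placeholder values for illustration only (assumption); the real
# constants are defined in streaming_api_server.py.
PROMPT_START_TOKEN_ID = 128259
PROMPT_END_TOKEN_IDS = [128009, 128260, 128261, 128257]


def wrap_prompt_ids(text_token_ids: List[int]) -> List[int]:
    """Wrap already-tokenized prompt text with the start and end token IDs.

    Hypothetical helper: it only shows the wrapping step, not tokenization
    or any voice prefix handling.
    """
    return [PROMPT_START_TOKEN_ID, *text_token_ids, *PROMPT_END_TOKEN_IDS]


if __name__ == "__main__":
    # Example with made-up text token IDs.
    print(wrap_prompt_ids([1, 2, 3]))
```

Keeping the end markers in a list rather than a single constant means a prompt can be closed by several special tokens in a fixed order without changing the wrapping code.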

Audio stream generation logic simplification:

  • Removed the search for a code start token in the generated token IDs; all relevant tokens are now processed directly from the output using the new token ID lists.
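A rough sketch of processing generated token IDs directly, without scanning for a start token first, might look like this. The constant names CODE_REMOVE_TOKEN_ID and CODE_TOKEN_OFFSET come from the bot summary of this PR, but their values here are placeholders, `extract_audio_codes` is a hypothetical helper, and skipping IDs below the offset is an assumption about how non-audio tokens are handled.

```python
from typing import Iterable, List

# Placeholder values for illustration only (assumption).
CODE_REMOVE_TOKEN_ID = 128258
CODE_TOKEN_OFFSET = 128266


def extract_audio_codes(token_ids: Iterable[int]) -> List[int]:
    """Map generated token IDs to audio codes without a start-token search."""
    codes: List[int] = []
    for token_id in token_ids:
        if token_id == CODE_REMOVE_TOKEN_ID:
            continue  # drop the token that must be removed from the stream
        if token_id < CODE_TOKEN_OFFSET:
            continue  # assumption: IDs below the offset are not audio codes
        codes.append(token_id - CODE_TOKEN_OFFSET)
    return codes


if __name__ == "__main__":
    print(extract_audio_codes([128258, 128266, 128270, 5]))
```

Because every token is filtered on its own value, the function has no state that depends on having seen a particular marker earlier in the output.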

Other cleanups:

  • Removed the unused STOP_SEQUENCE constant.

Summary by cubic

Aligns the FastAPI server with the official Orpheus‑TTS inference protocol by standardizing prompt wrapping and stop conditions, and simplifying audio streaming token handling. This removes brittle start‑token scanning and makes the code easier to maintain.

  • Refactors
    • Replaced CODE_START_TOKEN_ID with PROMPT_START_TOKEN_ID and PROMPT_END_TOKEN_IDS; introduced STOP_TOKEN_IDS and passed them to vLLM via extra_body. Removed unused STOP_SEQUENCE.
    • Updated format_prompt_for_vllm_sync to wrap prompts with the new start/end token IDs.
    • Simplified audio stream generation: removed start-token search, filtered tokens using CODE_REMOVE_TOKEN_ID and CODE_TOKEN_OFFSET, kept chunking behavior for initial/stream sizes, and ensured remaining codes are processed at stream end.
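Two of the points above, passing stop token IDs through `extra_body` and chunking codes with a separate initial and streaming size, can be sketched as below. This is an illustration under stated assumptions: `build_extra_body` and `chunk_codes` are hypothetical helpers, the STOP_TOKEN_IDS value is a placeholder, and the chunk sizes in the usage line are arbitrary. The `stop_token_ids` key is the extra sampling parameter accepted by vLLM's OpenAI-compatible server.

```python
from typing import Dict, Iterable, Iterator, List

# Placeholder value for illustration only (assumption).
STOP_TOKEN_IDS = [128258]


def build_extra_body() -> Dict[str, List[int]]:
    """Build the extra_body payload that carries the stop token IDs."""
    return {"stop_token_ids": list(STOP_TOKEN_IDS)}


def chunk_codes(
    codes: Iterable[int], initial_size: int, stream_size: int
) -> Iterator[List[int]]:
    """Yield one chunk of initial_size, then chunks of stream_size.

    Any codes left in the buffer when the input ends are yielded as a
    final, possibly shorter, chunk.
    """
    buffer: List[int] = []
    target = initial_size
    for code in codes:
        buffer.append(code)
        if len(buffer) >= target:
            yield buffer
            buffer = []
            target = stream_size  # later chunks use the streaming size
    if buffer:
        yield buffer  # flush the remainder at stream end


if __name__ == "__main__":
    print(build_extra_body())
    print(list(chunk_codes(range(10), initial_size=3, stream_size=4)))
```

A smaller first chunk lets the first audio leave the server sooner, while larger later chunks reduce per-chunk overhead; the final flush ensures trailing codes are not dropped.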

Written for commit ab8a5be. Summary will update on new commits.

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 1 file
