fix: use infer_single for GSVI TTS API #6117

Open
stablegenius49 wants to merge 1 commit into AstrBotDevs:master from stablegenius49:pr-factory/issue-5638-gsvi-infer-single

Conversation

@stablegenius49 stablegenius49 commented Mar 12, 2026

Summary

  • replace the GSVI provider's hard-coded /tts request path with remote catalog discovery + infer_single
  • resolve the configured model across supported versions and normalize default English emotion labels like default -> 默认
  • download the returned synthesized audio URL and cover the new discovery/infer flow with focused tests

Testing

  • PYTHONPATH=. pytest -q tests/test_gsvi_tts_source.py

Closes #5638

Summary by Sourcery

Update the GSVI TTS provider to use remote catalog discovery and the infer_single endpoint instead of a hard-coded /tts path, and add tests for the new flow.

New Features:

  • Support remote model and version discovery for the GSVI TTS API, including configurable version, media type, text language, timeout, and API key authentication.

Bug Fixes:

  • Handle empty TTS input text and missing character configuration more robustly, falling back to the legacy /tts endpoint when necessary.

Enhancements:

  • Normalize English emotion labels to the provider's expected values when resolving models and emotions from the remote catalog.
  • Refine emotion and language selection logic for GSVI TTS requests, including mapping common language codes and emotion aliases.
  • Improve error reporting and logging for GSVI TTS HTTP requests and audio downloads.

Tests:

  • Add unit tests covering infer config discovery, emotion normalization, and the new infer_single-based audio generation and download flow.

@auto-assign auto-assign bot requested review from Fridemn and anka-afk March 12, 2026 10:11
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request modernizes the GSVI Text-to-Speech (TTS) provider by transitioning from a basic, direct /tts endpoint call to a more advanced infer_single workflow. This enhancement includes dynamic model and version discovery, intelligent emotion mapping, and improved error handling, making the TTS service more adaptable and resilient. The changes are supported by new tests that validate the updated integration logic.

Highlights

  • GSVI TTS API Integration Update: The GSVI TTS API integration has been significantly refactored to utilize the /infer_single endpoint, replacing the previous hard-coded /tts request path. This change enables dynamic remote catalog discovery for models and versions.
  • Model and Version Discovery: The provider now dynamically discovers supported API versions and models, allowing for more flexible configuration. It attempts to resolve the configured model across available versions.
  • Emotion Label Normalization: English emotion labels (e.g., 'default', 'happy') are now normalized to their Chinese equivalents (e.g., '默认', '开心') to ensure compatibility with the GSVI TTS API.
  • Robust Audio Retrieval: The process for obtaining synthesized audio has been improved. After calling /infer_single, the returned audio URL is now explicitly downloaded, and robust error handling has been added for all API interactions.
  • Comprehensive Testing: New focused tests have been added to cover the entire discovery and inference flow, ensuring the reliability of the updated GSVI TTS provider.
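
The emotion label normalization described above could look roughly like the sketch below. The function name `normalize_emotion` and the alias table are illustrative assumptions: only the `default` -> `默认` and `happy` -> `开心` pairs are named in this thread, and the `_normalize_emotion` / `_select_emotion` helpers in the actual diff may behave differently.

```python
# Hypothetical alias table; only these two pairs are named in the PR thread.
EMOTION_ALIASES = {
    "default": "默认",
    "happy": "开心",
}


def normalize_emotion(requested, available):
    """Pick a label from `available` that matches `requested`, else fall back."""
    if not available:
        # Nothing to match against: pass the request through unchanged
        return requested or "默认"
    candidates = []
    if requested:
        label = requested.strip()
        candidates.append(label)  # an exact match wins first
        alias = EMOTION_ALIASES.get(label.lower())
        if alias:
            candidates.append(alias)  # then the aliased Chinese label
    for candidate in candidates:
        if candidate in available:
            return candidate
    # Fall back to the catalog's default label, or to its first entry
    return "默认" if "默认" in available else available[0]
```

Trying the exact label before the alias means a catalog that already exposes English labels keeps working without any mapping.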
Changelog
  • astrbot/core/provider/sources/gsvi_tts_source.py
    • Refactored get_audio method to use a new infer_single workflow for TTS generation.
    • Added new configuration parameters: version, api_key, timeout, media_type, and text_lang.
    • Implemented dynamic model and version discovery via /version and /models/{version} API endpoints.
    • Introduced _auth_headers for API key management and acgnai.top specific authorization.
    • Added _get_json and _post_json helper methods for standardized API requests and error handling.
    • Developed _resolve_infer_config to determine the optimal model configuration based on available versions and character settings.
    • Included _normalize_emotion and _select_emotion for mapping and selecting appropriate emotion labels, including normalization of English terms to Chinese.
    • Added _resolve_text_lang to standardize text language codes for the API payload.
    • Created _build_infer_payload to construct the detailed request body for the /infer_single endpoint.
    • Implemented _request_infer_audio to send the inference request and extract the audio URL.
    • Added _download_binary to download the synthesized audio from the provided URL.
    • Retained and encapsulated the original /tts logic in _download_legacy_tts as a fallback for cases where character is not configured.
  • tests/test_gsvi_tts_source.py
    • Added a new test file for the gsvi_tts_source module.
    • Included test_resolve_infer_config_discovers_version_and_normalizes_default_emotion to verify version discovery and emotion normalization.
    • Added test_get_audio_uses_infer_single_and_downloads_audio to test the end-to-end infer_single workflow, including payload construction and audio file download.
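
As a rough illustration of the version and model discovery listed in the changelog, the sketch below walks `/version` and `/models/{version}` until it finds the configured model. The function name, the injected `fetch_json` callable, and both response shapes are assumptions made for this example; the thread does not show the real payloads, so `_resolve_infer_config` in the diff may differ.

```python
def resolve_infer_config(fetch_json, model_name, preferred_version=None):
    """Find the first API version whose catalog contains `model_name`.

    `fetch_json(path)` is a caller-supplied function returning parsed JSON.
    ASSUMED response shapes (not confirmed anywhere in this thread):
      /version          -> {"support_versions": ["v4", "v3", ...]}
      /models/{version} -> {"models": {"<name>": {"<lang>": ["<emotion>", ...]}}}
    """
    versions = list(fetch_json("/version").get("support_versions", []))
    if preferred_version:
        # Try the configured version first, then the remaining ones
        versions = [preferred_version] + [v for v in versions if v != preferred_version]
    for version in versions:
        models = fetch_json(f"/models/{version}").get("models", {})
        if model_name in models:
            return {
                "version": version,
                "model_name": model_name,
                "languages": models[model_name],
            }
    raise LookupError(f"model {model_name!r} not found in versions {versions}")


# Usage with an in-memory stand-in for the remote catalog
catalog = {
    "/version": {"support_versions": ["v4", "v3"]},
    "/models/v4": {"models": {}},
    "/models/v3": {"models": {"demo": {"中文": ["默认"]}}},
}
config = resolve_infer_config(catalog.__getitem__, "demo")
```

Injecting `fetch_json` keeps the lookup logic testable without a running GSVI server, which is similar in spirit to the focused tests the PR adds.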

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Mar 12, 2026

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • In _auth_headers, consider parsing self.api_base to check the hostname (e.g., via urllib.parse.urlparse) instead of using a substring match for "acgnai.top", to avoid accidental matches in query paths or usernames.
  • The legacy /tts path reimplements response handling in _download_legacy_tts; you could reuse _download_binary (or a shared helper) for the download to keep error handling and behavior consistent between legacy and infer_single flows.
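
A minimal sketch of the hostname check suggested in the first point, using only `urllib.parse`. The helper name `is_acgnai_host` is hypothetical, and treating subdomains of `acgnai.top` as matches is an assumption about what the provider intends.

```python
from urllib.parse import urlsplit


def is_acgnai_host(api_base: str) -> bool:
    """Return True only when the URL's hostname is acgnai.top or a subdomain."""
    # .hostname drops userinfo and port, and is None when no netloc is present
    hostname = (urlsplit(api_base).hostname or "").lower()
    return hostname == "acgnai.top" or hostname.endswith(".acgnai.top")
```

Unlike a substring test, this rejects `https://example.com/acgnai.top` and `https://user:acgnai.top@example.com`. A base URL written without a scheme has no parsed hostname, so it is also rejected.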
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `_auth_headers`, consider parsing `self.api_base` to check the hostname (e.g., via `urllib.parse.urlparse`) instead of using a substring match for `"acgnai.top"`, to avoid accidental matches in query paths or usernames.
- The legacy `/tts` path reimplements response handling in `_download_legacy_tts`; you could reuse `_download_binary` (or a shared helper) for the download to keep error handling and behavior consistent between legacy and `infer_single` flows.

## Individual Comments

### Comment 1
<location path="astrbot/core/provider/sources/gsvi_tts_source.py" line_range="33" />
<code_context>
-
-    async def get_audio(self, text: str) -> str:
-        temp_dir = get_astrbot_temp_path()
-        path = os.path.join(temp_dir, f"gsvi_tts_{uuid.uuid4()}.wav")
-        params = {"text": text}
-
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Output file extension is hardcoded to .wav and may not match the configured media_type.

Now that `media_type` is configurable, the filename should either derive its extension from `self.media_type` or otherwise ensure the extension matches the actual format. A mismatch can cause downstream consumers that use the extension for routing or decoding to mis-handle the file when the format changes.

Suggested implementation:

```python
import os
import mimetypes

```

```python
    async def get_audio(self, text: str) -> str:
        temp_dir = get_astrbot_temp_path()

        # Derive file extension from media_type, fall back to .wav if unknown
        extension = mimetypes.guess_extension(getattr(self, "media_type", "") or "") or ".wav"
        if not extension.startswith("."):
            extension = f".{extension}"

        path = os.path.join(temp_dir, f"gsvi_tts_{uuid.uuid4()}{extension}")
        params = {"text": text}

```

1. Ensure that `self.media_type` is set appropriately elsewhere in this class (e.g., `"audio/wav"`, `"audio/mpeg"`, etc.) so that `mimetypes.guess_extension` returns a correct extension.
2. If your project already has a centralized mapping from media types to extensions, you may want to replace the `mimetypes.guess_extension` call with that mapping for stricter control.
</issue_to_address>


self.emotion = provider_config.get("emotion")
self.version = provider_config.get("version")
self.api_key = provider_config.get("api_key", "")
self.timeout = int(provider_config.get("timeout", 20))

suggestion (bug_risk): Output file extension is hardcoded to .wav and may not match the configured media_type.

Now that media_type is configurable, the filename should either derive its extension from self.media_type or otherwise ensure the extension matches the actual format. A mismatch can cause downstream consumers that use the extension for routing or decoding to mis-handle the file when the format changes.

Suggested implementation:

import os
import mimetypes
    async def get_audio(self, text: str) -> str:
        temp_dir = get_astrbot_temp_path()

        # Derive file extension from media_type, fall back to .wav if unknown
        extension = mimetypes.guess_extension(getattr(self, "media_type", "") or "") or ".wav"
        if not extension.startswith("."):
            extension = f".{extension}"

        path = os.path.join(temp_dir, f"gsvi_tts_{uuid.uuid4()}{extension}")
        params = {"text": text}
  1. Ensure that self.media_type is set appropriately elsewhere in this class (e.g., "audio/wav", "audio/mpeg", etc.) so that mimetypes.guess_extension returns a correct extension.
  2. If your project already has a centralized mapping from media types to extensions, you may want to replace the mimetypes.guess_extension call with that mapping for stricter control.
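
One caveat with the suggestion above: `mimetypes.guess_extension` expects a full MIME type, so a bare format name such as `"mp3"` typically yields `None` and the code would still fall back to `.wav`. The thread does not say which form `media_type` takes, so the sketch below accepts both. The helper name `extension_for` and the mapping table are assumptions for illustration, not part of the PR.

```python
import mimetypes

# Hypothetical mapping; the media_type values the GSVI server accepts are not
# listed in this thread.
_KNOWN_EXTENSIONS = {
    "wav": ".wav",
    "mp3": ".mp3",
    "ogg": ".ogg",
    "aac": ".aac",
    "audio/wav": ".wav",
    "audio/x-wav": ".wav",
    "audio/mpeg": ".mp3",
    "audio/ogg": ".ogg",
    "audio/aac": ".aac",
}


def extension_for(media_type, default=".wav"):
    """Map a media_type such as "mp3" or "audio/mpeg" to a file extension."""
    key = (media_type or "").strip().lower().lstrip(".")
    if not key:
        return default
    if key in _KNOWN_EXTENSIONS:
        return _KNOWN_EXTENSIONS[key]
    if "/" in key:
        # Looks like a MIME type: defer to the standard library
        return mimetypes.guess_extension(key) or default
    # Bare format name not in the table: use it as the extension if it is safe
    return f".{key}" if key.isalnum() else default
```

The resulting extension could then replace the hard-coded `.wav` in the temp-file name, for example `f"gsvi_tts_{uuid.uuid4()}{extension_for(self.media_type)}"`.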

@dosubot dosubot bot added the area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. label Mar 12, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively refactors the GSVI TTS provider to use the modern infer_single endpoint and adds corresponding tests. The implementation is solid. I've identified a minor typo in an API parameter and a small regression in the legacy fallback logic. Addressing these points will make the changes even better.

"batch_size": 1,
"batch_threshold": 0.75,
"split_bucket": True,
"speed_facter": 1,

high

There appears to be a typo in the parameter name speed_facter. Based on common API designs and open-source implementations of GSVI, it should likely be speed_factor. This typo could cause the parameter to be ignored or result in an API error.

Suggested change
- "speed_facter": 1,
+ "speed_factor": 1,

Comment on lines +243 to +244
encoded_text = urllib.parse.quote(str(text))
url = f"{self.api_base}/tts?text={encoded_text}"

medium

The fallback to the legacy /tts endpoint does not include the emotion parameter, even if it's configured. The previous implementation did include it, which makes this a regression in functionality for the fallback path. Using urllib.parse.urlencode can simplify parameter encoding and ensure all relevant parameters are included.

        params = {"text": text}
        if self.emotion:
            params["emotion"] = self.emotion
        url = f"{self.api_base}/tts?{urllib.parse.urlencode(params)}"
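
For reference, a self-contained version of the snippet above might look like the following. The function name `build_legacy_tts_url` is hypothetical, and passing `quote_via=quote` is an optional choice made here so that spaces stay encoded as `%20`, as with the `urllib.parse.quote` call being replaced; `urlencode` would otherwise encode them as `+`.

```python
from urllib.parse import quote, urlencode


def build_legacy_tts_url(api_base, text, emotion=None):
    """Build the legacy /tts URL, adding emotion only when it is configured."""
    params = {"text": str(text)}
    if emotion:
        params["emotion"] = emotion
    # quote_via=quote encodes spaces as %20 instead of urlencode's default "+"
    query = urlencode(params, quote_via=quote)
    return f"{api_base.rstrip('/')}/tts?{query}"
```

Whether the server treats `+` and `%20` the same way in query strings is not stated in this thread, which is why the sketch keeps the original encoding.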

Development

Successfully merging this pull request may close these issues.

[Bug] GSVI TTS (API) calls a nonexistent /tts endpoint
