fix: convert Feishu opus files for Whisper API STT #6078

Open
stablegenius49 wants to merge 1 commit into AstrBotDevs:master from stablegenius49:pr-factory/issue-5971-lark-opus-stt

Conversation

stablegenius49 (Contributor) commented Mar 11, 2026

Fixes #5971

Modifications

  • convert Feishu/Lark .opus recordings to .wav before sending them to the Whisper API provider

  • keep the existing .amr / .silk / Tencent voice conversion path unchanged

  • add a focused regression test covering the .opus -> .wav conversion + temp-file cleanup path

  • This is NOT a breaking change.

Screenshots or Test Results

Verification:

PYTHONPATH=. pytest -q tests/test_whisper_api_source.py
.                                                                        [100%]
1 passed in 0.92s

Checklist

  • 😊 If there are new features added in the PR, I have discussed them with the authors through issues, emails, etc.
  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
  • 😮 My changes do not introduce malicious code.

Summary by Sourcery

Convert Feishu/Lark .opus recordings to a temporary .wav file before sending them to the Whisper API for transcription, and add regression coverage for this conversion flow.

Bug Fixes:

  • Handle .opus input files by converting them to .wav before Whisper API transcription while preserving the existing .amr/.silk/Tencent conversion behavior.

Tests:

  • Add an async regression test verifying .opus-to-.wav conversion, temporary file cleanup, and correct Whisper API client invocation.
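
For readers who want to see the shape of such a regression test, the following is a minimal sketch, not the test added in this PR. `transcribe`, `fake_convert`, and `run_check` are hypothetical stand-ins; the real provider method and the signature of `convert_audio_to_wav` may differ. It uses `unittest.mock.AsyncMock` to check that an `.opus` input is converted, that the client receives a file named `audio.wav`, and that the temporary file is removed afterwards.

```python
import asyncio
import os
import tempfile
from unittest.mock import AsyncMock


async def transcribe(audio_path, convert, client_create, temp_dir):
    """Hypothetical stand-in for the provider's transcription flow."""
    converted = None
    if audio_path.lower().endswith(".opus"):
        converted = os.path.join(temp_dir, "converted.wav")
        await convert(audio_path, converted)
        audio_path = converted
    try:
        with open(audio_path, "rb") as audio_file:
            return await client_create(file=("audio.wav", audio_file))
    finally:
        # Remove the temporary wav whether or not transcription succeeded.
        if converted and os.path.exists(converted):
            os.remove(converted)


def fake_convert(src, dst):
    # Stand-in converter (assumption): writes placeholder bytes, no real decoding.
    with open(dst, "wb") as out:
        out.write(b"fake wav bytes")


async def run_check():
    with tempfile.TemporaryDirectory() as temp_dir:
        source = os.path.join(temp_dir, "voice.opus")
        with open(source, "wb") as f:
            f.write(b"fake opus bytes")

        convert = AsyncMock(side_effect=fake_convert)
        create = AsyncMock(return_value="hello")

        text = await transcribe(source, convert, create, temp_dir)

        converted_path = convert.await_args.args[1]
        file_name = create.await_args.kwargs["file"][0]
        return text, converted_path, file_name, os.path.exists(converted_path)
```

Calling `asyncio.run(run_check())` returns the transcribed text, the path passed to the converter, the file name sent to the client, and whether the temporary file still exists after the call.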

auto-assign bot requested review from LIghtJUNction and anka-afk, Mar 11, 2026 21:48
dosubot bot added the size:M label (This PR changes 30-99 lines, ignoring generated files.), Mar 11, 2026
gemini-code-assist (Contributor) commented

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where .opus audio files from platforms like Feishu/Lark were not being correctly processed by the Whisper API. It introduces a necessary preprocessing step to convert these .opus files into the .wav format, making them compatible with the API. This change is integrated seamlessly, maintaining existing audio conversion logic for other formats, and is thoroughly validated by a new, focused regression test.

Highlights

  • Opus File Conversion: Implemented conversion of Feishu/Lark .opus audio files to .wav format before sending them to the Whisper API for Speech-to-Text.
  • Existing Conversion Preservation: Ensured that the existing conversion paths for .amr, .silk, and Tencent voice formats remain unchanged.
  • Regression Test: Added a dedicated regression test to cover the .opus to .wav conversion process and verify proper temporary file cleanup.

Changelog

  • astrbot/core/provider/sources/whisper_api_source.py
    • Imported convert_audio_to_wav utility.
    • Added a conditional block to detect .opus files, convert them to a temporary .wav file, and update the audio_url to point to the converted file.
    • Modified the audio file type detection logic to prioritize .opus handling.
  • tests/test_whisper_api_source.py
    • Created a new test file.
    • Implemented test_get_text_converts_opus_files_to_wav_before_transcription to verify the correct conversion of .opus files and the subsequent cleanup of temporary .wav files.

sourcery-ai bot (Contributor) left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • The new .opus branch runs before the is_tencent logic, so Tencent-origin .opus files will no longer follow the Tencent-specific conversion path; if Tencent handling is still required for .opus, consider restructuring the condition (e.g., checking is_tencent first or folding .opus into the existing branch) to preserve previous behavior.
  • The regression test tightly couples to the current cleanup timing by asserting the converted .wav no longer exists immediately after get_text; if future implementations defer cleanup, this will become brittle, so you might relax the assertion to only verify that a temp .wav is created and used for transcription.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The new `.opus` branch runs before the `is_tencent` logic, so Tencent-origin `.opus` files will no longer follow the Tencent-specific conversion path; if Tencent handling is still required for `.opus`, consider restructuring the condition (e.g., checking `is_tencent` first or folding `.opus` into the existing branch) to preserve previous behavior.
- The regression test tightly couples to the current cleanup timing by asserting the converted `.wav` no longer exists immediately after `get_text`; if future implementations defer cleanup, this will become brittle, so you might relax the assertion to only verify that a temp `.wav` is created and used for transcription.

## Individual Comments

### Comment 1
<location path="astrbot/core/provider/sources/whisper_api_source.py" line_range="82-91" />
<code_context>
-        if audio_url.endswith(".amr") or audio_url.endswith(".silk") or is_tencent:
+        lower_audio_url = audio_url.lower()
+
+        if lower_audio_url.endswith(".opus"):
+            temp_dir = get_astrbot_temp_path()
+            output_path = os.path.join(
+                temp_dir,
+                f"whisper_api_{uuid.uuid4().hex[:8]}.wav",
+            )
+            logger.info("Converting opus file to wav using convert_audio_to_wav...")
+            await convert_audio_to_wav(audio_url, output_path)
+            audio_url = output_path
+        elif lower_audio_url.endswith(".amr") or lower_audio_url.endswith(".silk") or is_tencent:
             file_format = await self._get_audio_format(audio_url)

</code_context>
<issue_to_address>
**question:** Clarify behavior for `.opus` files when `is_tencent` is true to avoid potential edge cases.

Currently, `.opus` files always go through conversion to WAV, so the `elif` (including the `is_tencent` check) never runs for them. If Tencent ever sends `.opus` inputs that should be treated like `.amr`/`.silk`, this behavior could be surprising. Please confirm whether `is_tencent` should affect `.opus` handling (e.g., `if lower_audio_url.endswith(".opus") and not is_tencent:` or similar) so the provider-specific intent is explicit and future edge cases are avoided.
</issue_to_address>

Comment on lines +82 to +91
if lower_audio_url.endswith(".opus"):
    temp_dir = get_astrbot_temp_path()
    output_path = os.path.join(
        temp_dir,
        f"whisper_api_{uuid.uuid4().hex[:8]}.wav",
    )
    logger.info("Converting opus file to wav using convert_audio_to_wav...")
    await convert_audio_to_wav(audio_url, output_path)
    audio_url = output_path
elif lower_audio_url.endswith(".amr") or lower_audio_url.endswith(".silk") or is_tencent:

question: Clarify behavior for .opus files when is_tencent is true to avoid potential edge cases.

Currently, .opus files always go through conversion to WAV, so the elif (including the is_tencent check) never runs for them. If Tencent ever sends .opus inputs that should be treated like .amr/.silk, this behavior could be surprising. Please confirm whether is_tencent should affect .opus handling (e.g., if lower_audio_url.endswith(".opus") and not is_tencent: or similar) so the provider-specific intent is explicit and future edge cases are avoided.
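
To make the question concrete, here is a small sketch of the two possible branch orders. The function names and the returned labels are hypothetical and exist only for illustration; the actual provider performs the conversions inline rather than returning a label.

```python
def pick_conversion(audio_url: str, is_tencent: bool) -> str:
    """Branch order as in this PR: the .opus suffix is checked first."""
    lower_audio_url = audio_url.lower()
    if lower_audio_url.endswith(".opus"):
        return "opus_to_wav"
    if lower_audio_url.endswith((".amr", ".silk")) or is_tencent:
        return "legacy_conversion"
    return "no_conversion"


def pick_conversion_tencent_first(audio_url: str, is_tencent: bool) -> str:
    """Alternative order: the Tencent flag wins over the .opus suffix."""
    lower_audio_url = audio_url.lower()
    if lower_audio_url.endswith((".amr", ".silk")) or is_tencent:
        return "legacy_conversion"
    if lower_audio_url.endswith(".opus"):
        return "opus_to_wav"
    return "no_conversion"
```

The two functions differ only for an `.opus` URL with `is_tencent` set: the first selects the new WAV conversion, the second keeps the pre-existing path. Which result is intended is the point that needs confirming.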

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request correctly adds support for converting .opus files from Feishu/Lark to .wav for the Whisper API. The changes are logical and include a new, well-written regression test. I've identified a couple of areas for improvement. First, there's some code duplication for creating temporary file paths that could be refactored. More importantly, the new test highlights a pre-existing resource leak in the production code where file handles are not being closed properly, which could lead to issues in a long-running application. I've left detailed comments on these points.

file_arg = create_mock.await_args.kwargs["file"]
assert file_arg[0] == "audio.wav"
assert file_arg[1].name.endswith(".wav")
file_arg[1].close()

Severity: high

This explicit close() call reveals a potential resource leak in the production code (whisper_api_source.py, line 117). The file is opened with open() but the handle is not closed, which can lead to file descriptor exhaustion in a long-running application. This should be fixed by using a with statement to ensure the file is automatically closed.

Example fix:

with open(audio_url, "rb") as audio_file:
    result = await self.client.audio.transcriptions.create(
        model=self.model_name,
        file=("audio.wav", audio_file),
    )
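
A self-contained way to check the effect of the `with` statement, under stated assumptions: `transcribe_file` is a hypothetical stand-in for the provider call, and the API client is replaced with `unittest.mock.AsyncMock`. The sketch shows that the handle is closed both after a successful call and after one that raises.

```python
import asyncio
import os
import tempfile
from unittest.mock import AsyncMock


async def transcribe_file(path, create):
    # The with-block closes the handle when the block exits,
    # including when the awaited call raises.
    with open(path, "rb") as audio_file:
        return await create(file=("audio.wav", audio_file))


async def demo():
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        ok = AsyncMock(return_value="text")
        await transcribe_file(path, ok)
        closed_after_success = ok.call_args.kwargs["file"][1].closed

        failing = AsyncMock(side_effect=RuntimeError("API error"))
        try:
            await transcribe_file(path, failing)
        except RuntimeError:
            pass
        closed_after_error = failing.call_args.kwargs["file"][1].closed
        return closed_after_success, closed_after_error
    finally:
        os.remove(path)
```

With a bare `open()` and no `close()`, the handle would instead stay open until it is garbage collected.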

Comment on lines +83 to +87
temp_dir = get_astrbot_temp_path()
output_path = os.path.join(
    temp_dir,
    f"whisper_api_{uuid.uuid4().hex[:8]}.wav",
)

Severity: medium

This block for generating a temporary output path is identical to the one on lines 96-100. To improve maintainability, consider refactoring this duplicated logic. You could, for example, move the path generation logic to before the if/elif block, to be executed once if any conversion is needed.
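
One way to remove the duplication is a small helper, sketched here as an alternative to hoisting the path generation above the if/elif. The name `make_temp_wav_path` and its `prefix` parameter are assumptions and not part of the PR; the `whisper_api_` prefix and the 8-character hex suffix mirror the snippet above.

```python
import os
import uuid


def make_temp_wav_path(temp_dir: str, prefix: str = "whisper_api") -> str:
    """Build a unique .wav path inside temp_dir (hypothetical helper)."""
    return os.path.join(temp_dir, f"{prefix}_{uuid.uuid4().hex[:8]}.wav")
```

Both conversion branches could then call `make_temp_wav_path(get_astrbot_temp_path())`, so a path is only generated when a conversion actually runs.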

dosubot bot added the area:provider label (The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner.), Mar 11, 2026