feat(ai-groq): transcription #649
📝 Walkthrough
Adds Groq transcription support: model metadata and provider options, a GroqTranscriptionAdapter that accepts URLs or local audio, factories (explicit/API-key env), package re-exports, a changeset, E2E test wiring, and updated docs.
Changes
Groq Transcription Support
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/ai-groq/src/adapters/transcription.ts`:
- Around line 161-170: The adapter currently forwards
TranscriptionOptions.responseFormat (effectiveFormat) including 'srt'/'vtt' to
form.append('response_format', effectiveFormat) and later calls response.json()
for non-verbose_json/text branches; change this to explicitly reject unsupported
formats for Groq: add a check after computing effectiveFormat (the variable
named effectiveFormat in transcription.ts) and if effectiveFormat === 'srt' ||
effectiveFormat === 'vtt' throw a clear Error like "responseFormat 'srt'/'vtt'
not supported by Groq" (or alternatively map them to a supported value such as
'text' or 'json' if you prefer), and ensure form.append('response_format', ...)
and the subsequent response handling (the branches that call response.text() or
response.json(), e.g., the block that currently uses response.json() and
generateId(this.name)) only run for supported formats
('json','text','verbose_json').
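A minimal sketch of the guard this comment asks for, assuming the supported set is exactly the three formats listed above. The helper name `assertGroqResponseFormat` and the error wording are hypothetical and not taken from the PR; the adapter would call something like this right after computing `effectiveFormat`, before appending `response_format` to the form.

```typescript
// Formats the review comment lists as supported by Groq (assumption: this list is complete).
const GROQ_SUPPORTED_FORMATS = ['json', 'text', 'verbose_json'] as const
type GroqResponseFormat = (typeof GROQ_SUPPORTED_FORMATS)[number]

// Hypothetical helper: returns the format unchanged if Groq supports it,
// otherwise throws before any request is built or sent.
function assertGroqResponseFormat(format: string): GroqResponseFormat {
  if ((GROQ_SUPPORTED_FORMATS as readonly string[]).includes(format)) {
    return format as GroqResponseFormat
  }
  throw new Error(
    `responseFormat '${format}' is not supported by Groq; use one of: ${GROQ_SUPPORTED_FORMATS.join(', ')}`,
  )
}
```

Throwing early keeps the later `response.text()` / `response.json()` branches limited to formats whose response shape is known. Mapping `'srt'`/`'vtt'` to `'text'` would also work, but it silently changes what the caller receives.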
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f2e8fc62-7779-4e78-9f21-0e407ea9246f
📒 Files selected for processing (7)
- .changeset/feat-groq-ai-transcription.md
- packages/ai-groq/src/adapters/transcription.ts
- packages/ai-groq/src/audio/transcription-provider-options.ts
- packages/ai-groq/src/index.ts
- packages/ai-groq/src/model-meta.ts
- testing/e2e/src/lib/feature-support.ts
- testing/e2e/src/lib/media-providers.ts
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/media/transcription.md`:
- Around line 99-102: The comment incorrectly states that segments include
word-level timing; update the documentation comment to clarify that verbose_json
is the default, segment timestamps live on result.segments (segment-level
start/end) and word-level timing is exposed separately on result.words
(top-level), and adjust any example text around result.segments and result.words
to reflect that contract (refer to symbols result.segments, result.words, and
verbose_json).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2f7c9fa6-04a5-47a4-a758-3fe3aee26176
📒 Files selected for processing (2)
- docs/adapters/groq.md
- docs/media/transcription.md
// verbose_json is the default — segments include word-level timing when requested
for (const segment of result.segments ?? []) {
  console.log(`[${segment.start}s → ${segment.end}s] ${segment.text}`)
}
Correct the word-timestamp shape in the Groq example comment.
The note says segments include word-level timing, but the documented contract exposes word timing via result.words (top-level), while segments are segment-level timestamps.
Suggested doc tweak
-// verbose_json is the default — segments include word-level timing when requested
+// verbose_json is the default — segment timestamps are in `result.segments`
+// and word-level timing (if requested/supported) is in `result.words`
 for (const segment of result.segments ?? []) {
   console.log(`[${segment.start}s → ${segment.end}s] ${segment.text}`)
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/media/transcription.md` around lines 99 - 102, The comment incorrectly
states that segments include word-level timing; update the documentation comment
to clarify that verbose_json is the default, segment timestamps live on
result.segments (segment-level start/end) and word-level timing is exposed
separately on result.words (top-level), and adjust any example text around
result.segments and result.words to reflect that contract (refer to symbols
result.segments, result.words, and verbose_json).
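To illustrate the contract described in this comment, here is a rough sketch that reads segment-level timing from `result.segments` and word-level timing from the top-level `result.words`. The interfaces and the `formatTimings` helper are hypothetical; in particular, the word field names (`word`, `start`, `end`) are assumptions and may differ from the library's actual types.

```typescript
// Assumed result shape, based only on the contract described in the review comment.
interface TranscriptionSegment {
  start: number
  end: number
  text: string
}
interface TranscriptionWord {
  word: string // assumed field name
  start: number
  end: number
}
interface TranscriptionResult {
  text: string
  segments?: TranscriptionSegment[]
  words?: TranscriptionWord[]
}

// Hypothetical helper: segment timing and word timing are read from
// two separate top-level arrays, never from words nested inside segments.
function formatTimings(result: TranscriptionResult): string[] {
  const lines: string[] = []
  for (const segment of result.segments ?? []) {
    lines.push(`[${segment.start}s → ${segment.end}s] ${segment.text}`)
  }
  for (const word of result.words ?? []) {
    lines.push(`  ${word.word} (${word.start}s-${word.end}s)`)
  }
  return lines
}
```

Both arrays are optional here, so a response without word timestamps produces only the segment lines.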
🎯 Changes
Adds Groq as a transcription provider. Groq's API is mostly OpenAI SDK-compatible, but its transcription endpoint additionally accepts HTTP URLs as input, so this is implemented as a custom integration rather than going through the SDK.
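As a rough illustration of why URL input needs its own code path, the sketch below picks the multipart fields for a request depending on whether the caller supplied a URL or raw audio bytes. It is not the adapter's implementation: the types, the helper `buildTranscriptionFields`, and the assumption that the endpoint takes a `url` field as an alternative to `file` are all illustrative.

```typescript
// Hypothetical input type: either a remote URL or raw audio bytes.
type AudioInput =
  | { kind: 'url'; url: string }
  | { kind: 'bytes'; data: Uint8Array; filename: string }

// One multipart field; binary fields also carry a filename.
type FormField =
  | { name: string; value: string }
  | { name: string; value: Uint8Array; filename: string }

// Decide which fields to send. Assumes the endpoint accepts either a
// `url` field or a `file` field, and that only one of them is sent.
function buildTranscriptionFields(input: AudioInput, model: string): FormField[] {
  const fields: FormField[] = [{ name: 'model', value: model }]
  if (input.kind === 'url') {
    if (!/^https?:\/\//i.test(input.url)) {
      throw new Error(`Expected an http(s) URL, got: ${input.url}`)
    }
    fields.push({ name: 'url', value: input.url })
  } else {
    fields.push({ name: 'file', value: input.data, filename: input.filename })
  }
  return fields
}
```

A caller would then append these fields to a multipart form and POST it. Keeping the URL branch separate avoids downloading the audio client-side only to upload it again.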
✅ Checklist
pnpm run test:pr.
🚀 Release Impact
Summary by CodeRabbit
New Features
Documentation
Tests