feat(ai-groq): transcription by joksas · Pull Request #649 · TanStack/ai

joksas · 2026-05-26T22:39:29Z

🎯 Changes

Adds Groq as a transcription provider. Groq's API is mostly OpenAI SDK-compatible, but its transcription endpoint additionally accepts HTTP URLs as input, so this is implemented as a custom integration rather than going through the SDK.
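To illustrate why URL input pushes this outside the SDK path, here is a minimal sketch of a direct multipart request. It is not the PR's adapter code. `AudioSource`, `buildTranscriptionForm`, and `transcribe` are hypothetical names, and the `url` form field and the endpoint URL are assumptions based on Groq's public API documentation.

```typescript
// Hypothetical helper, not the PR's adapter: builds the multipart body for a
// Groq transcription request from either a remote URL or in-memory audio.
type AudioSource =
  | { kind: 'url'; url: string }
  | { kind: 'file'; data: Blob; filename: string }

function buildTranscriptionForm(source: AudioSource, model: string): FormData {
  const form = new FormData()
  form.append('model', model)
  if (source.kind === 'url') {
    // Assumed field name: a `url` parameter sent instead of `file`.
    form.append('url', source.url)
  } else {
    form.append('file', source.data, source.filename)
  }
  return form
}

// Assumed endpoint; sends the form and returns the parsed JSON body.
async function transcribe(
  source: AudioSource,
  apiKey: string,
  model = 'whisper-large-v3-turbo',
): Promise<unknown> {
  const response = await fetch(
    'https://api.groq.com/openai/v1/audio/transcriptions',
    {
      method: 'POST',
      headers: { Authorization: `Bearer ${apiKey}` },
      body: buildTranscriptionForm(source, model),
    },
  )
  if (!response.ok) {
    throw new Error(`Groq transcription failed with status ${response.status}`)
  }
  return response.json()
}
```

With a URL source, no audio bytes are uploaded by the client; the server is expected to fetch the audio itself.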

✅ Checklist

I have followed the steps in the Contributing guide.
I have tested this code locally with pnpm run test:pr.

🚀 Release Impact

This change affects published code, and I have generated a changeset.
This change is docs/CI/dev-only (no release).

Summary by CodeRabbit

New Features
- Added Groq as a transcription provider with Whisper models.
- Accepts multiple audio input methods (HTTP URLs and local files).
- Configurable transcription options: temperature and timestamp granularity.
- Supports verbose JSON and plain text transcription outputs with timestamps and confidence.
Documentation
- Groq transcription usage, examples, API signatures, and supported models/options added.
Tests
- End-to-end test matrix and test helpers updated to include Groq transcription support.

coderabbitai · 2026-05-26T22:39:42Z

📝 Walkthrough

Adds Groq transcription support: model metadata and provider options, a GroqTranscriptionAdapter that accepts URLs or local audio, factories (explicit/API-key env), package re-exports, a changeset, E2E test wiring, and updated docs.
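The two factory styles mentioned above (explicit key versus key read from the environment) might be sketched as follows. This thread does not show the real signatures of `createGroqTranscription` and `groqTranscription`, so `TranscriptionClientConfig`, `configFromKey`, and `configFromEnv` are hypothetical stand-ins that only illustrate the pattern.

```typescript
// Hypothetical config shape; the PR's actual types are not shown in this thread.
interface TranscriptionClientConfig {
  apiKey: string
  baseURL?: string
}

type Env = Record<string, string | undefined>
type ConfigOverrides = Omit<Partial<TranscriptionClientConfig>, 'apiKey'>

// Explicit-key variant: the caller supplies the key directly.
function configFromKey(
  apiKey: string,
  overrides: ConfigOverrides = {},
): TranscriptionClientConfig {
  return { ...overrides, apiKey }
}

// Environment variant: looks up GROQ_API_KEY and fails fast when it is missing.
function configFromEnv(
  env: Env,
  overrides: ConfigOverrides = {},
): TranscriptionClientConfig {
  const apiKey = env['GROQ_API_KEY']
  if (!apiKey) {
    throw new Error('GROQ_API_KEY is not set; pass the API key explicitly instead')
  }
  return configFromKey(apiKey, overrides)
}
```

Passing the environment as an argument keeps the sketch testable; a real factory would more likely read `process.env` directly.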

Changes

Groq Transcription Support

| Layer / File(s) | Summary |
| --- | --- |
| Transcription Data Contracts: `packages/ai-groq/src/model-meta.ts`, `packages/ai-groq/src/audio/transcription-provider-options.ts`, `packages/ai-groq/src/adapters/transcription.ts` (types/config) | Groq transcription models (`whisper-large-v3-turbo`, `whisper-large-v3`) and `GroqTranscriptionProviderOptions` (temperature, timestamp_granularities) are defined, along with response shape interfaces for verbose and non-verbose formats. |
| Transcription Adapter Implementation: `packages/ai-groq/src/adapters/transcription.ts` (methods, helpers) | `GroqTranscriptionAdapter.transcribe()` builds and sends multipart requests to Groq's `/audio/transcriptions` endpoint, handles response formats (verbose_json/text/JSON), maps segments and words with confidence scores, and includes error handling and request logging. Audio input normalizers convert HTTP URLs, Files, Blobs, ArrayBuffers, base64, and data URLs into the expected format. |
| Factory Functions and Public API: `packages/ai-groq/src/adapters/transcription.ts` (factories), `packages/ai-groq/src/index.ts`, `.changeset/feat-groq-ai-transcription.md` | `createGroqTranscription` accepts an explicit API key; `groqTranscription` resolves `GROQ_API_KEY` from the environment. The package index re-exports the adapter class, factories, configuration type, provider options, and transcription models alongside existing chat models. The changeset documents that a custom integration is used because of HTTP URL input support. |
| E2E Testing Integration: `testing/e2e/src/lib/feature-support.ts`, `testing/e2e/src/lib/media-providers.ts` | Groq is added to the transcription provider support matrix. The test helper `createTranscriptionAdapter` registers a `groq` factory using `createGroqTranscription` with model `whisper-large-v3-turbo`, baseURL from `openaiUrl(aimockPort)`, and standard test headers. |
| Documentation and Media Docs: `docs/adapters/groq.md`, `docs/media/transcription.md` | Groq adapter docs and the global transcription docs are updated to document Whisper transcription support, usage examples showing `groqTranscription()` and `generateTranscription()`, supported `responseFormat` values, and provider-specific `modelOptions` fields. |
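The input normalization described in the adapter row could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in `transcription.ts`: `NormalizedAudio`, `base64ToBytes`, and `normalizeAudio` are hypothetical names, and any string that is neither an HTTP(S) URL nor a data URL is assumed to be raw base64.

```typescript
// Hypothetical normalized shape: either a remote URL or an in-memory Blob.
type NormalizedAudio =
  | { kind: 'url'; url: string }
  | { kind: 'blob'; blob: Blob }

// Decodes a base64 string into bytes using the global atob().
function base64ToBytes(base64: string) {
  const binary = atob(base64)
  const bytes = new Uint8Array(binary.length)
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i)
  }
  return bytes
}

function normalizeAudio(input: string | Blob | ArrayBuffer): NormalizedAudio {
  if (typeof input !== 'string') {
    // File extends Blob, so Files are covered by the Blob branch.
    return {
      kind: 'blob',
      blob: input instanceof Blob ? input : new Blob([input]),
    }
  }
  if (/^https?:\/\//i.test(input)) {
    return { kind: 'url', url: input }
  }
  const dataUrl = /^data:([^;,]*);base64,(.+)$/.exec(input)
  if (dataUrl) {
    const bytes = base64ToBytes(dataUrl[2] ?? '')
    return { kind: 'blob', blob: new Blob([bytes], { type: dataUrl[1] ?? '' }) }
  }
  // Assumption: any other string is raw base64-encoded audio.
  return { kind: 'blob', blob: new Blob([base64ToBytes(input)]) }
}
```

Keeping URLs as a separate variant lets the caller forward them untouched instead of downloading and re-uploading the audio.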

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Whispers now flow through Groq's swift gate,
Audio and URLs make words translate,
Files or bytes, the adapter will mend,
Segments and timestamps neatly append,
A small hop — new voices join the stack!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding transcription support to the Groq adapter package. |
| Description check | ✅ Passed | The description follows the template, covers the changes made, completes all required checklist items, and specifies release impact with a changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai-groq/src/adapters/transcription.ts`:
- Around line 161-170: The adapter currently forwards
TranscriptionOptions.responseFormat (effectiveFormat) including 'srt'/'vtt' to
form.append('response_format', effectiveFormat) and later calls response.json()
for non-verbose_json/text branches; change this to explicitly reject unsupported
formats for Groq: add a check after computing effectiveFormat (the variable
named effectiveFormat in transcription.ts) and if effectiveFormat === 'srt' ||
effectiveFormat === 'vtt' throw a clear Error like "responseFormat 'srt'/'vtt'
not supported by Groq" (or alternatively map them to a supported value such as
'text' or 'json' if you prefer), and ensure form.append('response_format', ...)
and the subsequent response handling (the branches that call response.text() or
response.json(), e.g., the block that currently uses response.json() and
generateId(this.name)) only run for supported formats
('json','text','verbose_json').
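The guard requested in this comment can be sketched as below. This is a hedged illustration rather than the fix that landed in the PR: `GROQ_RESPONSE_FORMATS`, `assertGroqResponseFormat`, and `readTranscriptionBody` are hypothetical names, and the supported list is taken from the comment above.

```typescript
// Formats the review comment lists as supported by Groq.
const GROQ_RESPONSE_FORMATS = ['json', 'text', 'verbose_json'] as const
type GroqResponseFormat = (typeof GROQ_RESPONSE_FORMATS)[number]

// Rejects unsupported formats (such as 'srt' or 'vtt') before any request is built.
function assertGroqResponseFormat(format: string): GroqResponseFormat {
  if ((GROQ_RESPONSE_FORMATS as readonly string[]).includes(format)) {
    return format as GroqResponseFormat
  }
  throw new Error(
    `responseFormat '${format}' is not supported by Groq; ` +
      `use one of: ${GROQ_RESPONSE_FORMATS.join(', ')}`,
  )
}

// Only 'text' is read as plain text; both JSON variants are parsed as JSON.
async function readTranscriptionBody(
  response: Response,
  format: GroqResponseFormat,
): Promise<unknown> {
  return format === 'text' ? response.text() : response.json()
}
```

Validating up front means `response.json()` is never called on a subtitle payload, which is the failure mode the comment describes.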

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f2e8fc62-7779-4e78-9f21-0e407ea9246f

📥 Commits

Reviewing files that changed from the base of the PR and between ef029a0 and a8ced68.

📒 Files selected for processing (7)

.changeset/feat-groq-ai-transcription.md
packages/ai-groq/src/adapters/transcription.ts
packages/ai-groq/src/audio/transcription-provider-options.ts
packages/ai-groq/src/index.ts
packages/ai-groq/src/model-meta.ts
testing/e2e/src/lib/feature-support.ts
testing/e2e/src/lib/media-providers.ts

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/media/transcription.md`:
- Around line 99-102: The comment incorrectly states that segments include
word-level timing; update the documentation comment to clarify that verbose_json
is the default, segment timestamps live on result.segments (segment-level
start/end) and word-level timing is exposed separately on result.words
(top-level), and adjust any example text around result.segments and result.words
to reflect that contract (refer to symbols result.segments, result.words, and
verbose_json).

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2f7c9fa6-04a5-47a4-a758-3fe3aee26176

📥 Commits

Reviewing files that changed from the base of the PR and between 362ea79 and b5d8da6.

📒 Files selected for processing (2)

docs/adapters/groq.md
docs/media/transcription.md

coderabbitai · 2026-05-27T21:51:14Z

+// verbose_json is the default — segments include word-level timing when requested
+for (const segment of result.segments ?? []) {
+  console.log(`[${segment.start}s → ${segment.end}s] ${segment.text}`)
+}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Correct the word-timestamp shape in the Groq example comment.

The note says segments include word-level timing, but the documented contract exposes word timing via result.words (top-level), while segments are segment-level timestamps.

Suggested doc tweak

-// verbose_json is the default — segments include word-level timing when requested
+// verbose_json is the default — segment timestamps are in `result.segments`
+// and word-level timing (if requested/supported) is in `result.words`
 for (const segment of result.segments ?? []) {
   console.log(`[${segment.start}s → ${segment.end}s] ${segment.text}`)
 }
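To make the segment/word distinction concrete, here is a small sketch that reads both arrays. The interfaces are assumptions inferred from this comment: only `result.segments`, `result.words`, and the segment fields `start`, `end`, and `text` appear in the thread, while the `Word` field names and `describeTimings` are hypothetical.

```typescript
// Assumed shapes: segment-level timing on `segments`, word-level timing on a
// separate top-level `words` array. The Word field names are illustrative.
interface Segment {
  start: number
  end: number
  text: string
}

interface Word {
  word: string
  start: number
  end: number
}

interface TranscriptionResult {
  text: string
  segments?: Segment[]
  words?: Word[]
}

// Formats segment lines first, then word lines, reading each from its own array.
function describeTimings(result: TranscriptionResult): string[] {
  const lines: string[] = []
  for (const segment of result.segments ?? []) {
    lines.push(`[${segment.start}s -> ${segment.end}s] ${segment.text}`)
  }
  for (const word of result.words ?? []) {
    lines.push(`  ${word.word} @ ${word.start}s`)
  }
  return lines
}
```

Both arrays are optional in this sketch, so a response without word timestamps yields only segment lines.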

joksas added 2 commits May 26, 2026 23:23

feat(ai-groq): transcription

b0736ad

Use minor instead of patch bump

a8ced68

coderabbitai Bot reviewed May 26, 2026

Comment thread packages/ai-groq/src/adapters/transcription.ts

joksas added 3 commits May 27, 2026 21:57

Reject SRT and VTT

fe8c03e

Simplify useVerbose Logic

362ea79

Update Docs

b5d8da6

coderabbitai Bot reviewed May 27, 2026

joksas wants to merge 5 commits into TanStack:main from joksas:feat/groq-ai-transcription
