
feat(ai-groq): transcription#649

Open
joksas wants to merge 5 commits into
TanStack:mainfrom
joksas:feat/groq-ai-transcription

Conversation


@joksas joksas commented May 26, 2026

🎯 Changes

Adds Groq as a transcription provider. Groq's API is mostly OpenAI SDK-compatible, but its transcription endpoint additionally accepts HTTP URLs as input, so this is implemented as a custom integration rather than going through the SDK.
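As a rough illustration of why URL input changes the request shape, the sketch below picks the multipart fields for one request. It assumes the endpoint takes the audio as either a `url` text field or a `file` binary field; the `AudioSource` and `FormField` types and the `buildTranscriptionFields` helper are hypothetical names for this example, not the adapter's actual code.

```typescript
// Hypothetical input shape for this sketch; not the adapter's real types.
type AudioSource =
  | { kind: 'url'; url: string }
  | { kind: 'bytes'; data: Uint8Array; filename: string }

// One multipart field: text fields carry a string, the file field carries bytes.
interface FormField {
  name: string
  value: string | Uint8Array
  filename?: string
}

// Decide which fields to send. Assumption: the audio goes in exactly one of
// `url` (remote audio, fetched by the server) or `file` (uploaded bytes).
function buildTranscriptionFields(
  model: string,
  audio: AudioSource,
): FormField[] {
  const fields: FormField[] = [{ name: 'model', value: model }]
  if (audio.kind === 'url') {
    fields.push({ name: 'url', value: audio.url })
  } else {
    fields.push({ name: 'file', value: audio.data, filename: audio.filename })
  }
  return fields
}
```

A client that only ever sends `file` has no branch like this, which is presumably where the OpenAI SDK path falls short for URL input.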

✅ Checklist

  • I have followed the steps in the Contributing guide.
  • I have tested this code locally with pnpm run test:pr.

🚀 Release Impact

  • This change affects published code, and I have generated a changeset.
  • This change is docs/CI/dev-only (no release).

Summary by CodeRabbit

  • New Features

    • Added Groq as a transcription provider with Whisper models.
    • Accepts multiple audio input methods (HTTP URLs and local files).
    • Configurable transcription options: temperature and timestamp granularity.
    • Supports verbose JSON and plain text transcription outputs with timestamps and confidence.
  • Documentation

    • Groq transcription usage, examples, API signatures, and supported models/options added.
  • Tests

    • End-to-end test matrix and test helpers updated to include Groq transcription support.



coderabbitai Bot commented May 26, 2026

📝 Walkthrough


Adds Groq transcription support: model metadata and provider options, a GroqTranscriptionAdapter that accepts URLs or local audio, factories (explicit/API-key env), package re-exports, a changeset, E2E test wiring, and updated docs.

Changes

Groq Transcription Support

| Layer / File(s) | Summary |
| --- | --- |
| Transcription Data Contracts: packages/ai-groq/src/model-meta.ts, packages/ai-groq/src/audio/transcription-provider-options.ts, packages/ai-groq/src/adapters/transcription.ts (types/config) | Groq transcription models (whisper-large-v3-turbo, whisper-large-v3) and GroqTranscriptionProviderOptions (temperature, timestamp_granularities) are defined, along with response shape interfaces for verbose and non-verbose formats. |
| Transcription Adapter Implementation: packages/ai-groq/src/adapters/transcription.ts (methods, helpers) | GroqTranscriptionAdapter.transcribe() builds and sends multipart requests to Groq's /audio/transcriptions endpoint, handles response formats (verbose_json/text/JSON), maps segments and words with confidence scores, and includes error handling and request logging. Audio input normalizers convert HTTP URLs, Files, Blobs, ArrayBuffers, base64, and data URLs into the expected format. |
| Factory Functions and Public API: packages/ai-groq/src/adapters/transcription.ts (factories), packages/ai-groq/src/index.ts, .changeset/feat-groq-ai-transcription.md | createGroqTranscription accepts an explicit API key; groqTranscription resolves GROQ_API_KEY from the environment. The package index re-exports the adapter class, factories, configuration type, provider options, and transcription models alongside existing chat models. The changeset documents the custom integration due to HTTP URL input support. |
| E2E Testing Integration: testing/e2e/src/lib/feature-support.ts, testing/e2e/src/lib/media-providers.ts | Groq is added to the transcription provider support matrix. The test helper createTranscriptionAdapter registers a groq factory using createGroqTranscription with model whisper-large-v3-turbo, baseURL from openaiUrl(aimockPort), and standard test headers. |
| Documentation and Media Docs: docs/adapters/groq.md, docs/media/transcription.md | Groq adapter docs and the global transcription docs are updated to document Whisper transcription support, usage examples showing groqTranscription() and generateTranscription(), supported responseFormat values, and provider-specific modelOptions fields. |
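The audio input normalizers mentioned in the adapter row have to tell several input kinds apart before building the request. A minimal sketch of that first classification step is shown here, under the assumption that string inputs are either an HTTP(S) URL, a base64 data URL, or raw base64; `classifyAudioInput` and `AudioInputKind` are hypothetical names, and File/Blob inputs (omitted to keep the example free of DOM types) would follow the binary path.

```typescript
// Hypothetical classification result for this sketch.
type AudioInputKind = 'http-url' | 'data-url' | 'base64' | 'binary'

// Classify an audio input so the caller can pick a conversion strategy.
// Assumption: any string that is neither an HTTP(S) URL nor a data URL
// is treated as raw base64.
function classifyAudioInput(
  input: string | ArrayBuffer | Uint8Array,
): AudioInputKind {
  if (typeof input !== 'string') return 'binary'
  if (/^https?:\/\//i.test(input)) return 'http-url'
  if (/^data:[^,]*;base64,/i.test(input)) return 'data-url'
  return 'base64'
}
```

Only the `http-url` case can be forwarded without decoding; the other three end up as bytes for an upload.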

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Whispers now flow through Groq's swift gate,
Audio and URLs make words translate,
Files or bytes, the adapter will mend,
Segments and timestamps neatly append,
A small hop — new voices join the stack!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding transcription support to the Groq adapter package. |
| Description check | ✅ Passed | The description follows the template, covers the changes made, completes all required checklist items, and specifies release impact with a changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai-groq/src/adapters/transcription.ts`:
- Around line 161-170: The adapter currently forwards
TranscriptionOptions.responseFormat (effectiveFormat) including 'srt'/'vtt' to
form.append('response_format', effectiveFormat) and later calls response.json()
for non-verbose_json/text branches; change this to explicitly reject unsupported
formats for Groq: add a check after computing effectiveFormat (the variable
named effectiveFormat in transcription.ts) and if effectiveFormat === 'srt' ||
effectiveFormat === 'vtt' throw a clear Error like "responseFormat 'srt'/'vtt'
not supported by Groq" (or alternatively map them to a supported value such as
'text' or 'json' if you prefer), and ensure form.append('response_format', ...)
and the subsequent response handling (the branches that call response.text() or
response.json(), e.g., the block that currently uses response.json() and
generateId(this.name)) only run for supported formats
('json','text','verbose_json').
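One way the suggested guard could look is sketched below. The function name `resolveGroqResponseFormat` is hypothetical, and the `verbose_json` default is an assumption taken from the docs comment quoted later in this thread; the set of supported formats follows the list in the finding above.

```typescript
// Formats the finding above lists as supported for Groq.
type GroqResponseFormat = 'json' | 'text' | 'verbose_json'

// Validate a requested response format before it reaches the request body.
// Anything outside the supported set (for example 'srt' or 'vtt') throws
// instead of being forwarded and later parsed as JSON.
function resolveGroqResponseFormat(
  format: string = 'verbose_json',
): GroqResponseFormat {
  switch (format) {
    case 'json':
    case 'text':
    case 'verbose_json':
      return format
    default:
      throw new Error(`responseFormat '${format}' is not supported by Groq`)
  }
}
```

Throwing early keeps the failure local to the caller's options. Mapping `srt`/`vtt` to `text` is the alternative the finding mentions, but it would silently change what the caller gets back.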

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f2e8fc62-7779-4e78-9f21-0e407ea9246f

📥 Commits

Reviewing files that changed from the base of the PR and between ef029a0 and a8ced68.

📒 Files selected for processing (7)
  • .changeset/feat-groq-ai-transcription.md
  • packages/ai-groq/src/adapters/transcription.ts
  • packages/ai-groq/src/audio/transcription-provider-options.ts
  • packages/ai-groq/src/index.ts
  • packages/ai-groq/src/model-meta.ts
  • testing/e2e/src/lib/feature-support.ts
  • testing/e2e/src/lib/media-providers.ts

Comment thread packages/ai-groq/src/adapters/transcription.ts

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/media/transcription.md`:
- Around line 99-102: The comment incorrectly states that segments include
word-level timing; update the documentation comment to clarify that verbose_json
is the default, segment timestamps live on result.segments (segment-level
start/end) and word-level timing is exposed separately on result.words
(top-level), and adjust any example text around result.segments and result.words
to reflect that contract (refer to symbols result.segments, result.words, and
verbose_json).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2f7c9fa6-04a5-47a4-a758-3fe3aee26176

📥 Commits

Reviewing files that changed from the base of the PR and between 362ea79 and b5d8da6.

📒 Files selected for processing (2)
  • docs/adapters/groq.md
  • docs/media/transcription.md

Comment on lines +99 to +102
// verbose_json is the default — segments include word-level timing when requested
for (const segment of result.segments ?? []) {
  console.log(`[${segment.start}s → ${segment.end}s] ${segment.text}`)
}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Correct the word-timestamp shape in the Groq example comment.

The note says segments include word-level timing, but the documented contract exposes word timing via result.words (top-level), while segments are segment-level timestamps.

Suggested doc tweak
-// verbose_json is the default — segments include word-level timing when requested
+// verbose_json is the default — segment timestamps are in `result.segments`
+// and word-level timing (if requested/supported) is in `result.words`
 for (const segment of result.segments ?? []) {
   console.log(`[${segment.start}s → ${segment.end}s] ${segment.text}`)
 }
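To make the distinction concrete, here is a small sketch that reads segment-level and word-level timing from separate top-level fields. The `TranscriptionResultSketch` shape is a simplified assumption for illustration (only the `segments` and `words` fields named in this comment, plus `text`), not the library's actual result type.

```typescript
// Simplified, assumed result shape: segments and words are sibling fields.
interface SegmentSketch {
  start: number
  end: number
  text: string
}
interface WordSketch {
  word: string
  start: number
  end: number
}
interface TranscriptionResultSketch {
  text: string
  segments?: SegmentSketch[]
  words?: WordSketch[]
}

// Collect one line per segment, then one line per word, reading each
// from its own top-level array rather than nesting words in segments.
function describeTimings(result: TranscriptionResultSketch): string[] {
  const lines: string[] = []
  for (const segment of result.segments ?? []) {
    lines.push(`[${segment.start}s → ${segment.end}s] ${segment.text}`)
  }
  for (const word of result.words ?? []) {
    lines.push(`${word.word} @ ${word.start}s`)
  }
  return lines
}
```

Either array may be absent, so both loops fall back to an empty list.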
