7 changes: 7 additions & 0 deletions .changeset/feat-groq-ai-transcription.md
@@ -0,0 +1,7 @@
---
'@tanstack/ai-groq': minor
---

Adds Groq as a transcription provider. Groq's API is mostly OpenAI SDK-compatible,
but its transcription endpoint additionally accepts HTTP URLs as input, so this
is implemented as a custom integration rather than going through the SDK.
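To make the URL-versus-upload distinction concrete, here is a minimal sketch of how a custom integration might choose the request fields. The helper `buildTranscriptionFields` and its types are hypothetical and are not taken from `@tanstack/ai-groq`. The field names `url`, `file`, `model`, and `language` are assumptions about Groq's transcription endpoint.

```typescript
// Hypothetical sketch: not the adapter's actual implementation.
// Audio is either a remote URL or raw bytes that must be uploaded.
type AudioSource =
  | { kind: "url"; url: string }
  | { kind: "binary"; bytes: Uint8Array };

interface TranscriptionRequest {
  model: string;
  audio: AudioSource;
  language?: string;
}

// Returns the multipart fields as a plain record; a real call would copy
// them into a FormData body before sending the request.
function buildTranscriptionFields(
  req: TranscriptionRequest,
): Record<string, string | Uint8Array> {
  const fields: Record<string, string | Uint8Array> = { model: req.model };
  if (req.language !== undefined) {
    fields["language"] = req.language;
  }
  if (req.audio.kind === "url") {
    // Assumed field name: the URL is passed through, nothing is uploaded.
    fields["url"] = req.audio.url;
  } else {
    // Binary input takes the usual upload path.
    fields["file"] = req.audio.bytes;
  }
  return fields;
}
```

Modelling the input as a tagged union makes it hard to send both `url` and `file` by accident.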
37 changes: 34 additions & 3 deletions docs/adapters/groq.md
@@ -2,7 +2,7 @@
title: Groq
id: groq-adapter
order: 6
-description: "Use Groq's fast inference API with TanStack AI for low-latency LLM responses - Llama and other open-weight models via @tanstack/ai-groq."
+description: "Use Groq's fast inference API with TanStack AI for low-latency LLM responses and Whisper transcription - Llama and other open-weight models via @tanstack/ai-groq."
keywords:
- tanstack ai
- groq
@@ -11,9 +11,11 @@ keywords:
- low latency
- adapter
- llm
- whisper
- transcription
---

-The Groq adapter provides access to Groq's fast inference API, featuring the world's fastest LLM inference.
+The Groq adapter provides access to Groq's fast inference API, featuring the world's fastest LLM inference and Whisper-based audio transcription.

## Installation

@@ -108,6 +110,32 @@ const stream = chat({
});
```

## Transcription

Groq exposes Whisper-based speech-to-text via `groqTranscription()` and the `generateTranscription()` activity. The `audio` input accepts a `File`, `Blob`, `ArrayBuffer`, base64 string, data URL, or an `https://` URL (forwarded directly to Groq without re-uploading).

```typescript
import { generateTranscription } from "@tanstack/ai";
import { groqTranscription } from "@tanstack/ai-groq";

const result = await generateTranscription({
  adapter: groqTranscription("whisper-large-v3-turbo"),
  audio: "https://example.com/recording.mp3",
  language: "en",
});

console.log(result.text);

// verbose_json (the default) populates language, duration, and timestamped segments
for (const segment of result.segments ?? []) {
  console.log(`[${segment.start}s → ${segment.end}s] ${segment.text}`);
}
```

Supported models: `whisper-large-v3-turbo`, `whisper-large-v3`. Supported `responseFormat` values: `json`, `text`, `verbose_json` (default). `srt` and `vtt` are not supported by Groq.

See [Transcription](../media/transcription) for the full API.
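Because `srt` and `vtt` are not supported by Groq, a guard along the following lines could reject them before any request is made. This is a hedged sketch: `resolveResponseFormat` is a hypothetical helper name, and the adapter's real validation and error message may differ.

```typescript
// Hypothetical sketch of response-format validation; the values come from
// the supported list above, the helper itself is illustrative only.
const GROQ_RESPONSE_FORMATS = ["json", "text", "verbose_json"] as const;
type GroqResponseFormat = (typeof GROQ_RESPONSE_FORMATS)[number];

function resolveResponseFormat(requested?: string): GroqResponseFormat {
  // verbose_json is the documented default when nothing is requested.
  if (requested === undefined) {
    return "verbose_json";
  }
  const match = GROQ_RESPONSE_FORMATS.find((format) => format === requested);
  if (match === undefined) {
    throw new Error(
      `Unsupported responseFormat "${requested}" for Groq; ` +
        `expected one of: ${GROQ_RESPONSE_FORMATS.join(", ")}`,
    );
  }
  return match;
}
```

Failing early keeps an unsupported format from turning into a less obvious error returned by the API.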

## Model Options

Groq supports various provider-specific options:
@@ -197,11 +225,14 @@ Creates a Groq chat adapter with an explicit API key.

**Returns:** A Groq chat adapter instance.

### `groqTranscription(model, config?)` / `createGroqTranscription(model, apiKey, config?)`

Creates a Groq transcription (speech-to-text) adapter. The short form reads `GROQ_API_KEY` from the environment; the `create*` form takes an explicit API key. Supported models: `whisper-large-v3-turbo`, `whisper-large-v3`.

## Limitations

- **Text-to-Speech**: Groq does not currently expose a TTS adapter. Use OpenAI, Gemini, ElevenLabs, or fal for speech generation.
- **Image Generation**: Groq does not support image generation. Use OpenAI, Gemini, or fal for image generation.
-- **Transcription**: Groq does not currently expose a transcription adapter through TanStack AI.

## Next Steps

32 changes: 30 additions & 2 deletions docs/media/transcription.md
@@ -2,7 +2,7 @@
title: Transcription
id: transcription
order: 4
-description: "Transcribe audio to text with OpenAI Whisper and GPT-4o-transcribe via TanStack AI's generateTranscription() API."
+description: "Transcribe audio to text with OpenAI Whisper, GPT-4o-transcribe, Groq Whisper, and fal.ai STT models via TanStack AI's generateTranscription() API."
keywords:
- tanstack ai
- transcription
@@ -11,18 +11,21 @@ keywords:
- whisper
- generateTranscription
- openai
- groq
- fal
---

# Audio Transcription

-TanStack AI provides support for audio transcription (speech-to-text) through dedicated transcription adapters. This guide covers how to convert spoken audio into text using OpenAI's Whisper and GPT-4o transcription models.
+TanStack AI provides support for audio transcription (speech-to-text) through dedicated transcription adapters. This guide covers how to convert spoken audio into text using OpenAI's Whisper and GPT-4o transcription models, Groq's hosted Whisper models, and fal.ai STT models.

## Overview

Audio transcription is handled by transcription adapters that follow the same tree-shakeable architecture as other adapters in TanStack AI.

Currently supported:
- **OpenAI**: Whisper-1, GPT-4o-transcribe, GPT-4o-mini-transcribe
- **Groq**: whisper-large-v3-turbo, whisper-large-v3
- **fal.ai**: Whisper, Wizper, speech-to-text turbo, ElevenLabs speech-to-text

## Basic Usage
@@ -76,6 +79,31 @@ const result = await generateTranscription({
})
```

### Groq Transcription

Groq hosts Whisper large-v3 and large-v3-turbo on its fast inference stack. The `audio` input accepts a `File`, `Blob`, `ArrayBuffer`, base64 string, data URL, or an `https://` URL (which is forwarded to Groq without re-uploading).

```typescript
import { generateTranscription } from '@tanstack/ai'
import { groqTranscription } from '@tanstack/ai-groq'

const result = await generateTranscription({
  adapter: groqTranscription('whisper-large-v3-turbo'),
  audio: 'https://example.com/recording.mp3',
  language: 'en',
})

console.log(result.text)
console.log(result.language)

// verbose_json is the default - segments include word-level timing when requested
for (const segment of result.segments ?? []) {
  console.log(`[${segment.start}s → ${segment.end}s] ${segment.text}`)
}
Comment on lines +99 to +102

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Correct the word-timestamp shape in the Groq example comment.

The note says segments include word-level timing, but the documented contract exposes word timing via result.words (top-level), while segments are segment-level timestamps.

Suggested doc tweak
-// verbose_json is the default - segments include word-level timing when requested
+// verbose_json is the default - segment timestamps are in `result.segments`
+// and word-level timing (if requested/supported) is in `result.words`
 for (const segment of result.segments ?? []) {
   console.log(`[${segment.start}s → ${segment.end}s] ${segment.text}`)
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/media/transcription.md` around lines 99 - 102, The comment incorrectly
states that segments include word-level timing; update the documentation comment
to clarify that verbose_json is the default, segment timestamps live on
result.segments (segment-level start/end) and word-level timing is exposed
separately on result.words (top-level), and adjust any example text around
result.segments and result.words to reflect that contract (refer to symbols
result.segments, result.words, and verbose_json).

```

> **Note:** Groq supports `responseFormat` values `json`, `text`, and `verbose_json` (default). `srt` and `vtt` are not supported - passing them throws. Provider-specific `modelOptions` are `temperature` and `timestamp_granularities` (`['word']`, `['segment']`, or both).
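The review comment on this hunk points out that segment timestamps and word timestamps live in different places. Below is a small sketch of reading both, using locally defined types so it runs without the library. The `TranscriptLike` shape and the `word` / `start` / `end` field names are assumptions modelled on the reviewer's description of `result.segments` and `result.words`, not a confirmed type from `@tanstack/ai`.

```typescript
// Hypothetical local types; the library's actual result type may differ.
interface Segment {
  start: number;
  end: number;
  text: string;
}

interface Word {
  word: string;
  start: number;
  end: number;
}

interface TranscriptLike {
  text: string;
  segments?: Segment[]; // segment-level timing
  words?: Word[]; // top-level word timing, present only if requested
}

// Formats segment-level and word-level timings as separate lines.
function formatTimings(result: TranscriptLike): string[] {
  const lines: string[] = [];
  for (const segment of result.segments ?? []) {
    lines.push(`segment [${segment.start}s → ${segment.end}s] ${segment.text}`);
  }
  for (const word of result.words ?? []) {
    lines.push(`word [${word.start}s → ${word.end}s] ${word.word}`);
  }
  return lines;
}
```

Under these assumptions, requesting `timestamp_granularities: ['word', 'segment']` through `modelOptions` would be what populates both arrays. With only `['segment']`, the `words` array would be absent and the second loop would do nothing.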

### fal.ai Transcription

fal.ai offers Whisper, Wizper, and other STT models. The `audio` input accepts a URL, `File`, `Blob`, or `ArrayBuffer` (auto-wrapped in a `Blob`).