fix: bypass broken workflow package for self-hosted transcription and AI generation #1630
oaris-dev wants to merge 2 commits into CapSoftware:main
Conversation
… AI generation

The `workflow` package (4.0.1-beta.42) `[local world]` mode crashes with `TypeError: Cannot perform ArrayBuffer.prototype.slice on a detached ArrayBuffer` on all Node versions (20, 22, 24), breaking transcription and AI generation for every self-hosted Docker deployment. See CapSoftware#1550.

This replaces `start(transcribeVideoWorkflow, ...)` and `start(generateAiWorkflow, ...)` with direct async function calls that perform the same operations without workflow/step directives.

Changes:

- lib/transcribe.ts: Replace workflow dispatch with transcribeVideoDirect(), which validates, extracts audio, calls Deepgram, saves VTT, and cleans up
- lib/generate-ai.ts: Replace workflow dispatch with generateAiDirect(), which fetches the transcript, calls AI APIs, and saves metadata
- actions/videos/get-status.ts: Set PROCESSING before firing transcription to prevent re-trigger loops; add a 3-minute stale PROCESSING timeout

The workflow files are preserved so Cap Cloud's distributed execution (via web-cluster/WORKFLOWS_RPC_URL) continues to work unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix 'possibly undefined' errors in IntersectionObserver callbacks and array indexing across four HomePage components:

- InstantModeDetail: guard IntersectionObserver entry and TABS indexing
- RecordingModePicker: guard IntersectionObserver entry and modes indexing
- ScreenshotModeDetail: guard IntersectionObserver entry and AUTO_CONFIGS
- StudioModeDetail: guard AUTO_CONFIGS indexing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```typescript
const extracted = await extractAudioFromUrl(videoUrl);
try {
  audioBuffer = await fs.readFile(extracted.filePath);
} finally {
  await extracted.cleanup();
}
```
missing try-finally for cleanup if error occurs after extractAudioFromUrl but before VTT upload
Suggested change:

```typescript
const extracted = await extractAudioFromUrl(videoUrl);
try {
  audioBuffer = await fs.readFile(extracted.filePath);
} catch (error) {
  await extracted.cleanup();
  throw error;
}
await extracted.cleanup();
```
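A runnable illustration of the cleanup ordering being discussed, using stub functions in place of the real audio extraction (the `events` log and `readWithFinally` helper are illustrative, not project code):

```typescript
// Stub illustration: with try/finally, cleanup always runs whether the read
// succeeds or throws; the catch-and-rethrow variant behaves the same on the
// error path but moves the success-path cleanup after the read completes.
const events: string[] = [];

async function readWithFinally(shouldFail: boolean): Promise<void> {
  events.push("extract");
  try {
    if (shouldFail) throw new Error("read failed");
    events.push("read");
  } finally {
    events.push("cleanup");
  }
}

const run = readWithFinally(true)
  .catch(() => events.push("caught"))
  .then(() => readWithFinally(false));
```

Either shape guarantees the temp file is removed on the error path; the difference is only where success-path cleanup sits relative to later steps.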
Path: apps/web/lib/transcribe.ts
Line: 198-203
```typescript
await db()
  .update(videos)
  .set({ transcriptionStatus: "PROCESSING" })
  .where(eq(videos.id, videoId));
```
race condition: transcribeVideo on line 86 also sets PROCESSING at line 143, creating duplicate DB writes. Since transcribeVideo is called with _isRetry=true, it will skip the PROCESSING check anyway
Suggested change (drop the duplicate PROCESSING update and call directly):

```typescript
transcribeVideo(videoId, video.ownerId, false, true).catch((error) => {
```
Path: apps/web/actions/videos/get-status.ts
Line: 81-84
```typescript
const threeMinutesAgo = new Date(Date.now() - 3 * 60 * 1000);
if (video.updatedAt < threeMinutesAgo) {
```
timeout based on updatedAt will incorrectly trigger if video record is updated for unrelated reasons (e.g. metadata changes, view count). Consider using a dedicated processingStartedAt timestamp or checking transcriptionStatus update time specifically
Path: apps/web/actions/videos/get-status.ts
Line: 104-105
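A minimal sketch of the dedicated-timestamp idea from the comment above. The `processingStartedAt` field is hypothetical (not in the current schema); it would be set once when transcription is claimed, so unrelated row updates cannot reset the staleness window the way `updatedAt` does:

```typescript
// Sketch only: staleness judged against a hypothetical processingStartedAt
// that is written exactly once, instead of the general-purpose updatedAt.
const STALE_MS = 3 * 60 * 1000;

function isStaleProcessing(
  transcriptionStatus: string | null,
  processingStartedAt: Date | null,
  now: Date = new Date(),
): boolean {
  if (transcriptionStatus !== "PROCESSING" || processingStartedAt === null) {
    return false;
  }
  return now.getTime() - processingStartedAt.getTime() > STALE_MS;
}
```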
```typescript
await db()
  .update(videos)
  .set({ transcriptionStatus: "PROCESSING" })
  .where(eq(videos.id, videoId as Video.VideoId));
```
redundant DB write - caller in get-status.ts:81-84 already sets PROCESSING before calling this function with _isRetry=true
Path: apps/web/lib/transcribe.ts
Line: 140-143
```typescript
const hasDatePattern = /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/.test(
  video.name || "",
);
```
regex could match partial timestamp patterns in user-provided video names. Consider anchoring the regex or adding word boundaries to avoid false positives
Path: apps/web/lib/generate-ai.ts
Line: 213-215
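One way to read this anchoring suggestion, as a sketch (the names and the exact anchored pattern are illustrative, not the project's code):

```typescript
// Illustrative only: the loose pattern matches a timestamp anywhere, so a
// user-chosen title containing one is a false positive. The anchored variant
// requires the timestamp to be a whole, delimited token at the end of the
// name, as in auto-generated "Cap Recording - <timestamp>".
const LOOSE = /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/;
const ANCHORED = /(?:^|\s)\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$/;

const autoName = "Cap Recording - 2024-01-15 10:30:00";
const userName = "sprint-2024-01-15 10:30:00 retro notes";
```

The anchored variant still matches the auto-generated form while rejecting titles that merely contain a timestamp.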
```diff
-    aiGenerationEnabled,
-  },
-]);
+await transcribeVideoDirect(videoId, userId, aiGenerationEnabled);
```
synchronous call blocks server action completion. If transcription takes 30+ seconds, this will hold the connection open. Consider moving the await to background or using a job queue pattern
Path: apps/web/lib/transcribe.ts
Line: 114
```diff
   .where(eq(videos.id, videoId));

-await start(generateAiWorkflow, [{ videoId, userId }]);
+await generateAiDirect(videoId, userId);
```
synchronous call blocks server action completion for entire AI generation (potentially 60+ seconds for multi-chunk processing). Consider using background processing
Path: apps/web/lib/generate-ai.ts
Line: 87
```typescript
  .set({ transcriptionStatus: "PROCESSING" })
  .where(eq(videos.id, videoId));

transcribeVideo(videoId, video.ownerId, false, true).catch((error) => {
```
transcribeVideo(videoId, video.ownerId, false, true) is pretty hard to read/maintain (easy to swap booleans accidentally). Would you be open to at least naming the flags at the callsite?
Suggested change:

```typescript
const aiGenerationEnabled = false;
const isRetry = true;
transcribeVideo(videoId, video.ownerId, aiGenerationEnabled, isRetry).catch((error) => {
```
```typescript
if (video.transcriptionStatus === "PROCESSING") {
  const threeMinutesAgo = new Date(Date.now() - 3 * 60 * 1000);
  if (video.updatedAt < threeMinutesAgo) {
```
Quick sanity check: does updatedAt definitely bump when you update transcriptionStatus? If it doesn’t, this timeout might never trip for a stuck PROCESSING row.
```typescript
  method: "GET",
  headers: { range: "bytes=0-0" },
});
if (!headResponse.ok) {
```
Since this is a GET with a body (even if it’s 1 byte), it might be worth cancelling the response body to avoid keeping sockets open longer than needed.
Suggested change:

```typescript
headResponse.body?.cancel();
if (!headResponse.ok) {
```
```typescript
const aiRes = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${serverEnv().OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
  }),
});
```
Consider adding a timeout here. If the OpenAI request hangs, aiGenerationStatus will stay PROCESSING indefinitely (and there’s no stale timeout like transcription has).
Suggested change:

```typescript
const aiRes = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  signal: AbortSignal.timeout(60_000),
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${serverEnv().OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
  }),
});
```
Pull request overview
This pull request addresses critical failures in self-hosted Docker deployments by replacing the broken workflow package (local world mode) with direct function calls for transcription and AI generation. The PR also adds a 3-minute stale processing timeout and fixes TypeScript strict mode errors in HomePage components.
Changes:
- Replace workflow-based async queue with synchronous direct transcription and AI generation functions
- Add timeout detection to prevent videos from being stuck in PROCESSING state indefinitely
- Fix TypeScript strict mode errors in HomePage animation components
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| apps/web/lib/transcribe.ts | Replaces workflow call with direct transcribeVideoDirect function that handles audio extraction, Deepgram transcription, and VTT generation |
| apps/web/lib/generate-ai.ts | Replaces workflow call with direct generateAiDirect function that fetches transcripts and calls AI APIs (Groq with OpenAI fallback) |
| apps/web/actions/videos/get-status.ts | Sets PROCESSING status before firing transcription to prevent re-trigger loops; adds 3-minute timeout for stale PROCESSING states |
| apps/web/components/pages/HomePage/StudioModeDetail.tsx | Adds null check for AUTO_CONFIGS array access and IntersectionObserver entry |
| apps/web/components/pages/HomePage/ScreenshotModeDetail.tsx | Adds null checks for array access and IntersectionObserver entry |
| apps/web/components/pages/HomePage/RecordingModePicker.tsx | Adds null checks for modes array access and IntersectionObserver entry |
| apps/web/components/pages/HomePage/InstantModeDetail.tsx | Adds null checks for IntersectionObserver entry and TABS array access |
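The Groq-with-OpenAI-fallback flow mentioned in the table can be sketched generically. The helper and the stub providers below are illustrative, not the PR's actual code:

```typescript
// Generic primary/fallback sketch: try the primary provider, and only if it
// throws, try the fallback. The provider functions here are stubs.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
): Promise<T> {
  try {
    return await primary();
  } catch (primaryErr) {
    console.error("primary provider failed, falling back:", primaryErr);
    return fallback();
  }
}

const result = withFallback(
  async () => { throw new Error("groq unavailable"); },
  async () => "summary from fallback",
);
```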
```diff
 await db()
   .update(videos)
   .set({ transcriptionStatus: "PROCESSING" })
   .where(eq(videos.id, videoId));

 transcribeVideo(videoId, video.ownerId, false, true).catch((error) => {
   console.error(
-    `[Get Status] Error triggering transcription for video ${videoId}:`,
+    `[Get Status] Error starting transcription for video ${videoId}:`,
     error,
   );
 });
```
Race condition: Setting PROCESSING status in the database before firing the async transcription creates a window where multiple concurrent calls to getVideoStatus could all pass the check at line 59 (where transcriptionStatus is null), then all set PROCESSING and trigger multiple transcription attempts.
Consider using a database-level constraint (e.g., optimistic locking with a version field, or a compare-and-set operation) to ensure only one transcription is triggered. Alternatively, check the status again after the update to verify this instance "won" the race.
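An in-memory stand-in for the suggested compare-and-set. In SQL this would be a single `UPDATE ... SET transcriptionStatus = 'PROCESSING' WHERE id = ? AND transcriptionStatus IS NULL`, claiming the row only when the affected-row count is 1; the `Map` below just models the videos table:

```typescript
// Sketch only: synchronous model of "only the caller that observes a null
// status may flip it to PROCESSING". With real concurrency the check-then-set
// must happen atomically in the database, not in application code.
type Status = "PROCESSING" | "COMPLETE" | "ERROR" | null;
const table = new Map<string, Status>([["video_1", null]]);

function tryClaimTranscription(videoId: string): boolean {
  if (table.get(videoId) !== null) return false; // another caller already won
  table.set(videoId, "PROCESSING");
  return true;
}

const attempts = [
  tryClaimTranscription("video_1"),
  tryClaimTranscription("video_1"),
  tryClaimTranscription("video_1"),
];
```

Only one of the three attempts wins the claim; the losers skip triggering transcription.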
```typescript
await db()
  .update(videos)
  .set({ transcriptionStatus: "PROCESSING" })
  .where(eq(videos.id, videoId as Video.VideoId));
```
Timeout logic depends on updatedAt, but transcribeVideoDirect sets PROCESSING status again at line 140-143, which will update the updatedAt timestamp and reset the timeout window. This means the 3-minute timeout won't work correctly - it will keep getting extended each time the status update happens.
The timeout should either: (1) use a separate startedAt timestamp that's only set once, or (2) only set PROCESSING in get-status and skip setting it again in transcribeVideoDirect.
Suggested change: remove this redundant PROCESSING update.
```typescript
await db()
  .update(videos)
  .set({ transcriptionStatus: "PROCESSING" })
  .where(eq(videos.id, videoId as Video.VideoId));
```
The _isRetry parameter is passed as true from get-status.ts line 86, which allows retrying even if status is PROCESSING. However, transcribeVideoDirect immediately sets the status back to PROCESSING at line 140-143. This means the retry logic doesn't provide any actual benefit - if the original transcription is genuinely stuck at PROCESSING, the retry will just set it to PROCESSING again and get stuck in the same state.
Consider removing this redundant status update in transcribeVideoDirect, or revising the retry logic to handle PROCESSING states differently.
```diff
 if (video.transcriptionStatus === "PROCESSING") {
   const threeMinutesAgo = new Date(Date.now() - 3 * 60 * 1000);
   if (video.updatedAt < threeMinutesAgo) {
     await db()
       .update(videos)
       .set({ transcriptionStatus: "ERROR" })
       .where(eq(videos.id, videoId));

     return {
       transcriptionStatus: "ERROR",
       aiGenerationStatus:
         (metadata.aiGenerationStatus as AiGenerationStatus) || null,
       aiTitle: metadata.aiTitle || null,
       summary: metadata.summary || null,
       chapters: metadata.chapters || null,
-      error: "Failed to start transcription",
+      error: "Transcription timed out",
     };
   }
```
The timeout check uses video.updatedAt from the database query at the beginning of getVideoStatus, but updatedAt is automatically updated by the database via onUpdateNow() whenever the video row is modified. Between the SELECT and this timeout check, other operations (like status updates in transcribeVideoDirect) can modify the row and update the timestamp. This creates a time-of-check-to-time-of-use (TOCTOU) race condition where the timeout logic may use stale data.
Consider re-querying the video record immediately before this timeout check to get the current updatedAt value, or use a dedicated processingStartedAt timestamp field.
```diff
 try {
   console.log(
-    `[transcribeVideo] Triggering transcription workflow for video ${videoId}`,
+    `[transcribeVideo] Starting direct transcription for video ${videoId}`,
   );

-  await start(transcribeVideoWorkflow, [
-    {
-      videoId,
-      userId,
-      aiGenerationEnabled,
-    },
-  ]);
+  await transcribeVideoDirect(videoId, userId, aiGenerationEnabled);

   return {
     success: true,
-    message: "Transcription workflow started",
+    message: "Transcription completed",
   };
 } catch (error) {
-  console.error("[transcribeVideo] Failed to trigger workflow:", error);
+  console.error("[transcribeVideo] Transcription failed:", error);

   await db()
     .update(videos)
-    .set({ transcriptionStatus: null })
+    .set({ transcriptionStatus: "ERROR" })
     .where(eq(videos.id, videoId));

   return {
     success: false,
-    message: "Failed to start transcription workflow",
+    message: "Transcription failed",
   };
 }
```
If transcribeVideo throws an exception before or during transcribeVideoDirect (e.g., from validation checks at lines 31-107), the catch block won't execute and the transcriptionStatus in get-status.ts will remain stuck at PROCESSING. The fire-and-forget pattern with .catch() at line 86 in get-status.ts only logs errors but doesn't update the database status.
Consider either: (1) wrapping the entire transcribeVideo call (including validation) in a try-catch that sets ERROR status, or (2) ensuring transcribeVideo always returns a result instead of throwing exceptions for validation failures.
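A sketch of option (1): funnel every failure path, including validation throws, into a persisted ERROR status. `doWork` and `setStatus` are hypothetical stand-ins for the real transcription call and DB update:

```typescript
// Sketch only: wrap the whole operation so a throw anywhere ends as a
// persisted ERROR instead of a row stuck at PROCESSING.
async function runTranscription(
  doWork: () => Promise<void>,
  setStatus: (s: "COMPLETE" | "ERROR") => Promise<void>,
): Promise<{ success: boolean }> {
  try {
    await doWork();
    await setStatus("COMPLETE");
    return { success: true };
  } catch (err) {
    console.error("[transcribeVideo] Transcription failed:", err);
    await setStatus("ERROR");
    return { success: false };
  }
}
```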
```typescript
async function transcribeVideoDirect(
  videoId: string,
  userId: string,
  aiGenerationEnabled: boolean,
): Promise<void> {
  await db()
    .update(videos)
    .set({ transcriptionStatus: "PROCESSING" })
    .where(eq(videos.id, videoId as Video.VideoId));

  const query = await db()
    .select({
      bucket: s3Buckets,
    })
    .from(videos)
    .leftJoin(s3Buckets, eq(videos.bucket, s3Buckets.id))
    .where(eq(videos.id, videoId as Video.VideoId));

  const row = query[0];
  if (!row) {
    throw new Error("Video does not exist");
  }

  const bucketId = (row.bucket?.id ?? null) as S3Bucket.S3BucketId | null;

  const [s3Bucket] = await S3Buckets.getBucketAccess(
    Option.fromNullable(bucketId),
  ).pipe(runPromise);

  const videoKey = `${userId}/${videoId}/result.mp4`;
  const videoUrl = await s3Bucket.getSignedObjectUrl(videoKey).pipe(runPromise);

  const headResponse = await fetch(videoUrl, {
    method: "GET",
    headers: { range: "bytes=0-0" },
  });
  if (!headResponse.ok) {
    throw new Error("Video file not accessible");
  }

  const useMediaServer = isMediaServerConfigured();
  let hasAudio: boolean;
  let audioBuffer: Buffer;

  if (useMediaServer) {
    hasAudio = await checkHasAudioTrackViaMediaServer(videoUrl);
    if (!hasAudio) {
      await db()
        .update(videos)
        .set({ transcriptionStatus: "NO_AUDIO" })
        .where(eq(videos.id, videoId as Video.VideoId));
      return;
    }
    audioBuffer = await extractAudioViaMediaServer(videoUrl);
  } else {
    hasAudio = await checkHasAudioTrack(videoUrl);
    if (!hasAudio) {
      await db()
        .update(videos)
        .set({ transcriptionStatus: "NO_AUDIO" })
        .where(eq(videos.id, videoId as Video.VideoId));
      return;
    }
    const extracted = await extractAudioFromUrl(videoUrl);
    try {
      audioBuffer = await fs.readFile(extracted.filePath);
    } finally {
      await extracted.cleanup();
    }
  }

  const deepgram = createClient(serverEnv().DEEPGRAM_API_KEY as string);

  const { result, error } = await deepgram.listen.prerecorded.transcribeFile(
    audioBuffer,
    {
      model: "nova-3",
      smart_format: true,
      detect_language: true,
      utterances: true,
      mime_type: "audio/mpeg",
    },
  );

  if (error) {
    throw new Error(`Deepgram transcription failed: ${error.message}`);
  }

  const transcription = formatToWebVTT(result as unknown as DeepgramResult);

  await s3Bucket
    .putObject(`${userId}/${videoId}/transcription.vtt`, transcription, {
      contentType: "text/vtt",
    })
    .pipe(runPromise);

  await db()
    .update(videos)
    .set({ transcriptionStatus: "COMPLETE" })
    .where(eq(videos.id, videoId as Video.VideoId));

  if (aiGenerationEnabled) {
    await startAiGeneration(videoId as Video.VideoId, userId);
  }
}
```
Synchronous call makes the function block until transcription completes, which can take minutes for long videos. The calling code at line 86 in get-status.ts uses fire-and-forget with .catch(), but if the server or process restarts during transcription, the work is lost and the status will remain stuck at PROCESSING until the 3-minute timeout.
For production resilience, consider using a persistent job queue (like BullMQ, pg-boss, or similar) instead of in-memory async execution, especially for long-running operations like video transcription.
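A toy model of the persistent-queue idea from this comment: jobs live in a durable store (here a `Map` standing in for a DB table), so a restarted worker can re-scan for unfinished jobs instead of losing in-memory work. All names are illustrative:

```typescript
// Sketch only: a crashed worker leaves jobs in RUNNING; on startup the next
// worker sweeps those back to PENDING so they are retried rather than lost.
type JobState = "PENDING" | "RUNNING" | "DONE";
const jobTable = new Map<string, JobState>();

function enqueue(jobId: string): void {
  jobTable.set(jobId, "PENDING");
}

function recoverAfterRestart(): string[] {
  const recovered: string[] = [];
  for (const [id, state] of jobTable) {
    if (state === "RUNNING") {
      jobTable.set(id, "PENDING");
      recovered.push(id);
    }
  }
  return recovered;
}

enqueue("transcribe:video_1");
jobTable.set("transcribe:video_1", "RUNNING"); // a worker picks it up...
// ...process crashes here; after restart:
const retried = recoverAfterRestart();
```

Real queues (BullMQ, pg-boss) add the same durability plus visibility timeouts and retry policies; the sweep above is the minimal version of that guarantee.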
```typescript
async function generateAiDirect(
  videoId: string,
  userId: string,
): Promise<void> {
  const query = await db()
    .select({ video: videos, bucket: s3Buckets })
    .from(videos)
    .leftJoin(s3Buckets, eq(videos.bucket, s3Buckets.id))
    .where(eq(videos.id, videoId as Video.VideoId));

  if (query.length === 0 || !query[0]?.video) {
    throw new Error("Video does not exist");
  }

  const { video, bucket } = query[0];
  const metadata = (video.metadata as VideoMetadata) || {};
  const bucketId = (bucket?.id ?? null) as S3Bucket.S3BucketId | null;

  if (video.transcriptionStatus !== "COMPLETE") {
    throw new Error("Transcription not complete");
  }

  const vtt = await Effect.gen(function* () {
    const [s3Bucket] = yield* S3Buckets.getBucketAccess(
      Option.fromNullable(bucketId),
    );
    return yield* s3Bucket.getObject(`${userId}/${videoId}/transcription.vtt`);
  }).pipe(runPromise);

  if (Option.isNone(vtt)) {
    await db()
      .update(videos)
      .set({ metadata: { ...metadata, aiGenerationStatus: "SKIPPED" } })
      .where(eq(videos.id, videoId as Video.VideoId));
    return;
  }

  const segments = parseVttWithTimestamps(vtt.value);
  const text = segments
    .map((s) => s.text)
    .join(" ")
    .trim();

  if (text.length < 10) {
    await db()
      .update(videos)
      .set({ metadata: { ...metadata, aiGenerationStatus: "SKIPPED" } })
      .where(eq(videos.id, videoId as Video.VideoId));
    return;
  }

  const transcript: TranscriptData = { segments, text };
  const groqClient = getGroqClient();
  const chunks = chunkTranscriptWithTimestamps(transcript.segments);

  let aiResult: AiResult;
  if (chunks.length === 1) {
    aiResult = await generateSingleChunk(transcript.text, groqClient);
  } else {
    aiResult = await generateMultipleChunks(chunks, groqClient);
  }

  const updatedMetadata: VideoMetadata = {
    ...metadata,
    aiTitle: aiResult.title || metadata.aiTitle,
    summary: aiResult.summary || metadata.summary,
    chapters: aiResult.chapters || metadata.chapters,
    aiGenerationStatus: "COMPLETE",
  };

  await db()
    .update(videos)
    .set({ metadata: updatedMetadata })
    .where(eq(videos.id, videoId as Video.VideoId));

  const hasDatePattern = /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/.test(
    video.name || "",
  );

  if (
    (video.name?.startsWith("Cap Recording -") || hasDatePattern) &&
    aiResult.title
  ) {
    await db()
      .update(videos)
      .set({ name: aiResult.title })
      .where(eq(videos.id, videoId as Video.VideoId));
  }
}
```
Similar to transcription, this synchronous call blocks until AI generation completes. If the server restarts during execution, the work is lost and the status remains PROCESSING. The fire-and-forget pattern at line 158 in get-status.ts provides no recovery mechanism.
Consider using a persistent job queue for resilience against server restarts and for better observability of long-running AI operations.
Summary
Fixes self-hosted transcription and AI generation by bypassing the broken `workflow` package `[local world]` mode. Resolves #1550.

- `lib/transcribe.ts`: Replace `start(transcribeVideoWorkflow, ...)` with a direct `transcribeVideoDirect()` that validates, extracts audio, calls Deepgram, saves VTT, and cleans up
- `lib/generate-ai.ts`: Replace `start(generateAiWorkflow, ...)` with a direct `generateAiDirect()` that fetches the transcript, calls AI APIs (Groq → OpenAI fallback), and saves metadata
- `actions/videos/get-status.ts`: Set PROCESSING in DB before firing transcription (prevents re-trigger loops on every 2s poll), add a 3-minute stale PROCESSING → ERROR timeout

Context
The `workflow` package (4.0.1-beta.42) `[local world]` mode crashes with `TypeError: Cannot perform ArrayBuffer.prototype.slice on a detached ArrayBuffer` on all Node versions (20, 22, 24). This breaks transcription and AI generation for every self-hosted Docker deployment (7+ users confirmed in #1550).

The root cause: `start()` resolves successfully, but the workflow crashes asynchronously in the background queue. The catch block in the caller never fires, so `transcriptionStatus` stays null → the 2-second polling loop re-triggers transcription indefinitely.

Cap Cloud uses a distributed workflow runner (`web-cluster` via `WORKFLOWS_RPC_URL`) which doesn't have this issue. The workflow files are preserved so Cap Cloud continues to work unchanged.

We've tested this on a self-hosted staging instance (`ghcr.io/oaris-dev/cap-web:staging`) with Deepgram + OpenAI configured: transcription, AI generation, NO_AUDIO detection, and stale timeout all work correctly.

Test plan
- `DEEPGRAM_API_KEY` configured
- `GROQ_API_KEY` or `OPENAI_API_KEY` is set

🤖 Generated with Claude Code
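As an aside, the root-cause behavior described in the Context section can be sketched as follows. This is an assumed shape of the broken `[local world]` mode, not the package's actual code; `startLikeLocalWorld` is a stand-in:

```typescript
// Assumed shape of the bug: start() resolves once the job is queued, so a
// crash inside the queued work never reaches the caller's catch block.
const backgroundErrors: string[] = [];

function startLikeLocalWorld(job: () => Promise<void>): Promise<void> {
  // the job's outcome is detached from the returned promise
  void job().catch((e) => backgroundErrors.push(String(e)));
  return Promise.resolve();
}

let callerCaught = false;
const done = startLikeLocalWorld(async () => {
  throw new Error("Cannot perform ArrayBuffer.prototype.slice on a detached ArrayBuffer");
}).catch(() => {
  callerCaught = true; // never runs: the returned promise resolves fine
});
```

This is why `transcriptionStatus` stays null in the caller and the polling loop keeps re-triggering: the error only ever surfaces out-of-band.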
Greptile Summary
Bypasses the broken `workflow` package by replacing async workflow calls with direct synchronous implementations for transcription and AI generation in self-hosted deployments. The workflow package's `[local world]` mode crashes with ArrayBuffer detachment errors on all Node versions, breaking all self-hosted Docker instances.

Major changes:

- `transcribe.ts`: Direct implementation validates the video, extracts audio (via media server or FFmpeg), calls the Deepgram API, saves VTT to S3, and updates DB status
- `generate-ai.ts`: Direct implementation fetches VTT from S3, chunks the transcript, calls Groq (with OpenAI fallback), generates title/summary/chapters, and updates metadata
- `get-status.ts`: Sets PROCESSING status in DB before firing transcription (prevents infinite polling loops), adds a 3-minute stale timeout for stuck PROCESSING records

Issues found:

- `updatedAt`-based timeout could trigger incorrectly if video metadata changes

Confidence Score: 3/5

- `apps/web/lib/transcribe.ts` (cleanup leak) and `apps/web/actions/videos/get-status.ts` (race condition, timeout logic)

Important Files Changed
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client as Client (2s poll)
    participant GetStatus as get-status.ts
    participant Transcribe as transcribe.ts
    participant AI as generate-ai.ts
    participant Deepgram
    participant Groq
    participant S3
    participant DB
    Client->>GetStatus: getVideoStatus(videoId)
    GetStatus->>DB: Check transcriptionStatus
    alt Status is null
        GetStatus->>DB: Set PROCESSING
        GetStatus-->>Client: Return PROCESSING
        GetStatus->>Transcribe: transcribeVideo(..., _isRetry=true)
        Note over GetStatus,Transcribe: Fire-and-forget (catch)
    end
    Transcribe->>DB: Set PROCESSING (redundant)
    Transcribe->>S3: getSignedObjectUrl(video)
    Transcribe->>Transcribe: Extract audio (FFmpeg/MediaServer)
    Transcribe->>Deepgram: transcribeFile(audioBuffer)
    Deepgram-->>Transcribe: DeepgramResult
    Transcribe->>Transcribe: formatToWebVTT
    Transcribe->>S3: putObject(transcription.vtt)
    Transcribe->>DB: Set COMPLETE
    alt aiGenerationEnabled
        Transcribe->>AI: startAiGeneration(videoId)
        AI->>DB: Set aiGenerationStatus=PROCESSING
        AI->>S3: getObject(transcription.vtt)
        AI->>AI: parseVTT, chunkTranscript
        loop for each chunk
            AI->>Groq: chat.completions.create
            alt Groq fails
                AI->>Groq: Fallback to OpenAI
            end
            Groq-->>AI: AI summary chunk
        end
        AI->>AI: Merge chunks, dedupe chapters
        AI->>DB: Update metadata (title, summary, chapters)
        AI->>DB: Set aiGenerationStatus=COMPLETE
    end
    Client->>GetStatus: Poll again (2s later)
    alt PROCESSING > 3 minutes
        GetStatus->>DB: Set ERROR (timeout)
        GetStatus-->>Client: Return ERROR
    else
        GetStatus-->>Client: Return current status
    end
```

Last reviewed commit: ce2819d