Claude/youtube csv downloader #340
Conversation
- Add CLAUDE.md documentation for development guide
- Add CSV batch import feature to README.md
- Create CSVBatch and CSVBatchRow models for batch job tracking
- Implement csvbatch.Processor for sequential URL processing:
- Parse CSV files with YouTube URLs
- Download video and extract audio via yt-dlp
- Delete video file after audio extraction
- Transcribe audio using configured WhisperX settings
- Save JSON output as {rowId}-{videoFilename}.json
- Add REST API handlers for batch management:
- POST /csv-batch/upload - Upload CSV file
- POST /csv-batch/:id/start - Start processing
- POST /csv-batch/:id/stop - Stop processing
- GET /csv-batch/:id/status - Check batch status
- GET /csv-batch/:id/rows - Get all row details
- GET /csv-batch/:id/output/:row_id - Download output JSON
- DELETE /csv-batch/:id - Delete batch
- Register routes in API router with authentication
- Create `cmd/csvbatch/main.go` interactive CLI tool with:
  - Interactive prompts for missing required arguments
  - Resume/restore session feature for interrupted batches
  - Live progress bar with ETA tracking
  - Verbose logging to file and console with color output
  - First-time setup wizard (`--init`)
  - Session persistence for resume capability
  - Graceful shutdown with interrupt signal handling
  - List-sessions command to view resumable batches
- Update processor to allow resuming from cancelled state
  - Previously only allowed pending/failed; now includes cancelled

CLI usage:

```
scriberr-csvbatch --csv file.csv       # Interactive mode
scriberr-csvbatch --resume <batch-id>  # Resume session
scriberr-csvbatch --list-sessions      # List resumable sessions
scriberr-csvbatch --init               # First-time setup
```
Models (`internal/models/csvbatch.go`):
- Simplify type names: `BatchStatus`, `RowStatus`
- Clean field names: `RowNum`, `Title`, `Filename`, `AudioPath`, `OutputPath`
- Remove unused status type and circular relationships
- Add `Progress()` helper method

Processor (`internal/csvbatch/processor.go`):
- Rename `NewProcessor` to `New` for a cleaner API
- Simplify method names: `Start`, `Stop`, `GetStatus`, `GetRows`, `List`, `Delete`
- Remove unused `transcribeFunc` parameter
- Clean sequential processing logic
- Better error handling with concise error messages

Handlers (`internal/api/csvbatch_handlers.go`):
- Update to use new processor API method names
- Fix model references to use correct type names
- Clean up Swagger documentation

CLI:
- Remove bloated Go CLI (750+ lines)
- Add clean shell script (180 lines) at `scripts/csvbatch.sh`
- Script uses the REST API with curl/jq
- Features: progress polling, resume, list, graceful shutdown

Net reduction: 713 lines of code removed.
- Fix Swagger annotation: models.Batch → models.CSVBatch
- Remove dead code: unused batch variable in DownloadOutput
- Fix parseCSV header detection: properly skip header rows
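The header-detection fix can be sketched as: treat the first row as a header only when its URL column does not parse as an http(s) URL. The function names below are illustrative, not the PR's actual code:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"net/url"
	"strings"
)

// looksLikeURL reports whether s parses as an absolute http(s) URL.
func looksLikeURL(s string) bool {
	u, err := url.Parse(strings.TrimSpace(s))
	return err == nil && (u.Scheme == "http" || u.Scheme == "https") && u.Host != ""
}

// parseURLColumn reads the first CSV column, skipping the first row
// only if its first cell does not look like a URL (i.e. it is a header).
func parseURLColumn(data string) ([]string, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	var urls []string
	for i, rec := range records {
		if len(rec) == 0 {
			continue
		}
		if i == 0 && !looksLikeURL(rec[0]) {
			continue // header row such as "url,title"
		}
		urls = append(urls, rec[0])
	}
	return urls, nil
}

func main() {
	urls, _ := parseURLColumn("url,title\nhttps://youtu.be/abc,Demo\n")
	fmt.Println(urls)
}
```

This avoids the failure mode of always skipping row one, which would silently drop the first video in headerless files.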
Processor fixes:
- Add error checks on all DB operations (batch fetch, status updates)
- Handle JSON marshal errors properly
- Add transaction to `CreateBatch` for atomicity
- Fix glob error handling in `downloadAudio`

Handler fixes:
- Fix `ShouldBindJSON` error handling with content-length check
- Fix `GetStatus` blank assignments in Start/Stop handlers
- Fix header injection vulnerability in Content-Disposition
- Rename `DownloadAllOutputs` to `ListOutputs` (matches actual behavior)
- Update Swagger annotations to match implementation

Model fixes:
- Add index on `Status` and `CreatedAt` columns
- Add unique constraint on `(BatchID, RowNum)` composite key
- Add NOT NULL constraint on `Name` field
- Add index on row `Status` field

Shell script fixes:
- Quote `CSV_FILE` variable to handle spaces
- Add HTTP status code validation on all requests
- Fix infinite loop on unknown status
- Add numeric validation for progress values
- Fix exit codes in cleanup handler
- Add retry logic with max retries for polling
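For the Content-Disposition header injection fix, the essential step is stripping CR/LF and double quotes from the filename before it is embedded in the header. A hedged sketch, not the handler's exact code:

```go
package main

import (
	"fmt"
	"strings"
)

// safeDispositionFilename removes characters that would allow header
// injection (CR/LF) or break the quoted-string syntax (double quotes).
func safeDispositionFilename(name string) string {
	r := strings.NewReplacer("\r", "", "\n", "", `"`, "")
	return r.Replace(name)
}

func main() {
	name := safeDispositionFilename("report\r\nSet-Cookie: x\".json")
	fmt.Printf("Content-Disposition: attachment; filename=%q\n", name)
}
```

Without this, a row title containing `\r\n` could inject arbitrary response headers when the output file is downloaded.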
Security fixes:
- Add CSV file size validation (10 MB max) to prevent DoS attacks
- Add path traversal protection in the `DownloadOutput` handler
- Add `row_id` integer validation to prevent invalid database queries

Error handling improvements:
- Add error logging to all database operations in `processor.go`
- Log errors for row status updates, progress tracking, and batch status changes
- Ensure errors are visible for debugging without failing the entire batch

Shell script improvements:
- Add HTTP status code validation for the start-batch endpoint
- Add curl timeouts (`--connect-timeout 10 --max-time 30`) to all API calls
- Prevent infinite hangs on unresponsive servers
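The size and `row_id` checks reduce to two small guards. A minimal sketch with illustrative names, using the 10 MB cap from the commit message:

```go
package main

import (
	"fmt"
	"strconv"
)

const maxCSVBytes = 10 << 20 // 10 MB upload cap

// validateUpload enforces the size cap before any parsing happens.
func validateUpload(size int64) error {
	if size <= 0 || size > maxCSVBytes {
		return fmt.Errorf("csv size %d out of range (max %d bytes)", size, maxCSVBytes)
	}
	return nil
}

// parseRowID accepts only positive integers for the :row_id path param,
// rejecting strings like "../7" before they reach the database layer.
func parseRowID(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("invalid row id %q", s)
	}
	return n, nil
}

func main() {
	fmt.Println(validateUpload(11 << 20)) // over the cap: returns an error
	fmt.Println(parseRowID("42"))
}
```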
Standalone utility to extract all video metadata from a YouTube channel and export it to CSV for batch transcription with Scriberr.
- `channel-to-csv.py`: Python script using yt-dlp for metadata extraction
- `channel-to-csv.sh`: Shell wrapper for project environment integration

Exports: `url`, `title`, `duration`, `views`, `upload_date`, `video_id`. The output CSV is compatible with the CSV batch processor.
Pull request overview
This PR adds CSV batch import functionality for processing multiple YouTube videos sequentially. The implementation includes helper scripts for channel CSV export, a batch processing shell client, Go backend with REST API, database models, and documentation.
Key changes:
- Adds CSV batch processing system for YouTube video transcription with sequential processing, progress tracking, and resume capability
- Implements Python tool to export YouTube channel videos to CSV format
- Provides shell script client for interacting with the batch processing API
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| tools/channel-to-csv.sh | Wrapper script to run Python channel exporter with UV environment |
| tools/channel-to-csv.py | Python tool to extract YouTube channel videos and export metadata to CSV |
| scripts/csvbatch.sh | Bash client for CSV batch API with upload, start, stop, and status polling |
| internal/models/csvbatch.go | Database models for batch jobs and individual row tracking |
| internal/csvbatch/processor.go | Core business logic for batch processing with download, transcription, and output generation |
| internal/api/csvbatch_handlers.go | REST API handlers for batch operations (upload, start, stop, status, download) |
| internal/api/handlers.go | Integration of CSV batch processor into main handler |
| internal/api/router.go | Route registration for CSV batch endpoints with authentication |
| internal/database/database.go | Auto-migration setup for new batch tables |
| README.md | Documentation for CSV batch import feature with usage examples |
| CLAUDE.md | Developer guide addition for CSV batch processing |
Comments suppressed due to low confidence (2)
README.md:197 (`# Security`)
- Spelling error: "sohuld" should be "should".

README.md:226 (`- PUID=${PUID:-1000}`)
- The documentation states "Delete the video file (saves disk space)" in step 3 of the processing flow, but the actual implementation in downloadAudio uses yt-dlp's --extract-audio flag, which downloads the best audio stream directly and never stores a separate video file. The documentation is misleading and should be corrected to reflect what the code does.
```go
baseName := strings.TrimSuffix(filepath.Base(audioPath), filepath.Ext(audioPath))
jsonPath := filepath.Join(filepath.Dir(audioPath), baseName+".json")

data, err := os.ReadFile(jsonPath)
if err != nil {
	return "", fmt.Errorf("failed to read transcript: %w", err)
}
return string(data), nil
}
```
After successful transcription, the intermediate JSON file created by WhisperX at jsonPath is not cleaned up. This will accumulate files in the audio directory as batches are processed. Consider removing the intermediate transcript JSON file after reading its contents, similar to how temporary files are cleaned up elsewhere in the codebase.
```go
database.DB.Model(&models.CSVBatch{}).Where("id = ?", batchID).Updates(map[string]interface{}{
	"profile_id": req.ProfileID,
	"parameters": profile.Parameters,
})
```
The database update at line 156 doesn't check for errors. If the update fails, the batch will start with potentially incorrect or missing parameters, but the user won't be notified. This could lead to transcription with wrong settings. The error should be checked and returned to the client.
Suggested change:

```go
result := database.DB.Model(&models.CSVBatch{}).Where("id = ?", batchID).Updates(map[string]interface{}{
	"profile_id": req.ProfileID,
	"parameters": profile.Parameters,
})
if result.Error != nil {
	logger.Errorf("failed to update CSVBatch %s with profile %v: %v", batchID, *req.ProfileID, result.Error)
	c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to update batch with profile settings"})
	return
}
if result.RowsAffected == 0 {
	c.JSON(http.StatusNotFound, gin.H{"error": "Batch not found"})
	return
}
```
```go
if len(name) > 80 {
	name = name[:80]
```
The sanitizeFilename function truncates filenames at 80 characters using byte indexing on a UTF-8 string. If the 80th byte falls in the middle of a multi-byte UTF-8 character, this will produce an invalid UTF-8 string and could cause issues when creating files. Consider using rune-based slicing instead: if len([]rune(name)) > 80 { name = string([]rune(name)[:80]) }.
Suggested change:

```go
if len([]rune(name)) > 80 {
	runes := []rune(name)
	name = string(runes[:80])
```
```go
p.Stop(batchID)

if batch.OutputDir != "" {
	os.RemoveAll(batch.OutputDir)
}
database.DB.Where("batch_id = ?", batchID).Delete(&models.CSVBatchRow{})
```
The Delete function ignores the error from Stop at line 163. If the batch is not running, Stop returns an error "batch is not running", but this is ignored and deletion proceeds. While this might be intentional (to allow deletion of non-running batches), it would be clearer to explicitly check if the error is the expected "not running" error, or to use a method that doesn't return an error for non-running batches. Additionally, errors from the database deletes at lines 168-169 are not checked before attempting to delete the batch record, which could leave orphaned rows.
Suggested change:

```go
if err := p.Stop(batchID); err != nil {
	// Allow deletion to proceed if the batch is simply not running.
	if !strings.Contains(err.Error(), "batch is not running") {
		return err
	}
}
if batch.OutputDir != "" {
	os.RemoveAll(batch.OutputDir)
}
if err := database.DB.Where("batch_id = ?", batchID).Delete(&models.CSVBatchRow{}).Error; err != nil {
	return err
}
```
```go
if !strings.HasPrefix(outputPath, batchDir) {
	logger.Error("Path traversal attempt", "output_path", outputPath, "batch_dir", batchDir)
```
The path traversal protection check at line 328 has a subtle issue on Windows. On Windows, paths are case-insensitive, so an attacker could potentially use different casing to bypass the prefix check. Additionally, the check doesn't account for the path separator. For example, if batchDir is "/path/to/batch" and outputPath is "/path/to/batch2/file", the prefix check would pass. Consider using filepath.Rel to verify the path is within the directory: rel, err := filepath.Rel(batchDir, outputPath); if err != nil || strings.HasPrefix(rel, "..") { /* deny */ }.
Suggested change:

```go
outputPathAbs, err := filepath.Abs(outputPath)
if err != nil {
	logger.Error("Failed to resolve absolute output path", "output_path", outputPath, "error", err)
	c.JSON(http.StatusInternalServerError, gin.H{"error": "Internal server error"})
	return
}
batchDirAbs, err := filepath.Abs(batchDir)
if err != nil {
	logger.Error("Failed to resolve absolute batch directory", "batch_dir", batchDir, "error", err)
	c.JSON(http.StatusInternalServerError, gin.H{"error": "Internal server error"})
	return
}
rel, err := filepath.Rel(batchDirAbs, outputPathAbs)
if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(os.PathSeparator)) {
	logger.Error("Path traversal attempt", "output_path", outputPathAbs, "batch_dir", batchDirAbs, "rel", rel, "error", err)
```
```go
}
defer outFile.Close()

if _, err := io.Copy(outFile, file); err != nil {
```
If io.Copy fails after the file is created, the partial file at filePath is not cleaned up. This could leave orphaned files on the filesystem. Consider adding os.Remove(filePath) in the error path after line 94.
Suggested change:

```go
if _, err := io.Copy(outFile, file); err != nil {
	_ = os.Remove(filePath)
```
```bash
log "Resuming batch $BATCH_ID..."
else
	# Normal mode - upload CSV
	[ -z "$CSV_FILE" ] && read -p "Enter CSV file path: " CSV_FILE
```
The read command at line 118 prompts for user input when CSV_FILE is empty, but this happens after the 'set -euo pipefail' directive. If the user hits Ctrl+D (EOF) instead of providing input, the read command will fail and the script will exit due to 'set -e'. Consider using 'read -r CSV_FILE || { error "No CSV file provided"; exit 1; }' to handle this case explicitly.
Suggested change (written as an `if` block, because `[ -z "$CSV_FILE" ] && read ... || { ...; }` would also run the error branch whenever `CSV_FILE` is already set):

```bash
if [ -z "$CSV_FILE" ]; then
	read -r -p "Enter CSV file path: " CSV_FILE || { error "No CSV file provided"; exit 1; }
fi
```
```go
var batch models.CSVBatch
if err := database.DB.First(&batch, "id = ?", batchID).Error; err != nil {
	p.failBatch(batchID, "failed to fetch batch: "+err.Error())
	return
}
```
The batch parameters are fetched once at the start of processing (line 199-203) and stored in the batch variable. If the batch parameters are updated via the StartBatch handler after processing has started (lines 156-159 in csvbatch_handlers.go), these changes won't be reflected in the ongoing processing because the batch struct is not re-fetched. This could lead to rows being processed with outdated parameters. Consider either preventing parameter updates while a batch is processing, or re-fetching the batch parameters for each row.
```go
func (p *Processor) downloadAudio(ctx context.Context, url, batchID string, rowNum int) (string, error) {
	tempDir := filepath.Join(p.config.UploadDir, "csv-batch", batchID, "audio")
	os.MkdirAll(tempDir, 0755)
```
Error from os.MkdirAll is silently ignored. If directory creation fails (e.g., due to permissions), the subsequent filepath.Join and command execution will fail with confusing errors. This should check and return the error to provide clearer diagnostics.
Suggested change:

```go
if err := os.MkdirAll(tempDir, 0755); err != nil {
	return "", fmt.Errorf("failed to create temp dir %q: %w", tempDir, err)
}
```
```go
func isYouTubeURL(url string) bool {
	patterns := []string{
		`^https?://(www\.)?youtube\.com/watch\?v=[\w-]+`,
		`^https?://(www\.)?youtu\.be/[\w-]+`,
		`^https?://(www\.)?youtube\.com/shorts/[\w-]+`,
	}
	for _, p := range patterns {
		if matched, _ := regexp.MatchString(p, url); matched {
			return true
		}
	}
	return false
}
```
The isYouTubeURL regex patterns don't fully validate YouTube URLs and could match malformed URLs. For example, ^https?://(www\.)?youtube\.com/watch\?v=[\w-]+ would match "http://youtube.com/watch?v=abc" followed by additional query parameters like "&malicious=param", which could potentially be exploited. Consider making the regex patterns more strict by using end-of-string anchors ($) or using a proper URL parser to validate the host and extract the video ID more safely.
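A parser-based variant of the check, along the lines the comment suggests, might look like this (a sketch: the host list and ID length bounds are assumptions, not the project's code):

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

var videoIDRe = regexp.MustCompile(`^[\w-]{6,20}$`)

// isYouTubeVideoURL validates the scheme and host with a real URL parser,
// then matches the video ID in full, so trailing junk can't slip through.
func isYouTubeVideoURL(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil || (u.Scheme != "http" && u.Scheme != "https") {
		return false
	}
	host := strings.TrimPrefix(strings.ToLower(u.Hostname()), "www.")
	switch host {
	case "youtube.com":
		if u.Path == "/watch" {
			return videoIDRe.MatchString(u.Query().Get("v"))
		}
		return strings.HasPrefix(u.Path, "/shorts/") &&
			videoIDRe.MatchString(strings.TrimPrefix(u.Path, "/shorts/"))
	case "youtu.be":
		return videoIDRe.MatchString(strings.TrimPrefix(u.Path, "/"))
	}
	return false
}

func main() {
	fmt.Println(isYouTubeVideoURL("https://www.youtube.com/watch?v=dQw4w9WgXcQ")) // true
	fmt.Println(isYouTubeVideoURL("https://evil.example/watch?v=abcdefg"))        // false
}
```

Pinning the host via `url.Parse` also rejects lookalike hosts such as `youtube.com.evil.example`, which a bare prefix regex would accept.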
@copilot open a new pull request to apply changes based on the comments in this thread |
```go
multiTrackProcessor: processing.NewMultiTrackProcessor(),
csvBatchProcessor:   csvProcessor,
multiTrackProcessor: multiTrackProcessor,
```
Duplicate assignment to multiTrackProcessor field. Line 107 creates a new MultiTrackProcessor instance, but line 109 immediately overwrites it with the parameter value. This suggests the line 107 assignment is incorrect and should be removed.
```go
if _, err := io.Copy(outFile, file); err != nil {
	c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to save file"})
	return
}
```
Uploaded file not cleaned up on error. If the os.Create operation succeeds but io.Copy fails at line 93, the partially written file at filePath will remain on disk and not be cleaned up. Consider adding a cleanup mechanism or deferring os.Remove with error checking.
Suggested change:

```go
copyOK := false
defer func() {
	if !copyOK {
		if err := os.Remove(filePath); err != nil {
			logger.Errorf("failed to remove incomplete upload %s: %v", filePath, err)
		}
	}
}()
if _, err := io.Copy(outFile, file); err != nil {
	c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to save file"})
	return
}
copyOK = true
```
```markdown
The CSV batch processor (`internal/csvbatch/`) enables bulk YouTube video transcription:
- Import CSV with YouTube URLs (one per row)
- Sequential processing per rowId
- Downloads video → converts to audio → deletes video → transcribes
```
Documentation incorrectly describes the processing flow. Line 158 states "Downloads video → converts to audio → deletes video → transcribes", but the actual implementation in processor.go uses yt-dlp's --extract-audio flag which directly extracts audio without downloading the full video file first. The documentation should be corrected to match the actual implementation.
Suggested change:

```markdown
- Uses yt-dlp to directly extract audio from each URL → transcribes
```
```go
		UpdateColumn("success_rows", database.DB.Raw("success_rows + 1")).Error; err != nil {
		logger.Error("Failed to update success_rows", "batch", batchID, "error", err)
	}
} else {
	if err := database.DB.Model(&models.CSVBatch{}).Where("id = ?", batchID).
		UpdateColumn("failed_rows", database.DB.Raw("failed_rows + 1")).Error; err != nil {
```
Incorrect use of database.DB.Raw for incrementing counters. Lines 220 and 225 attempt to increment success_rows and failed_rows using UpdateColumn with database.DB.Raw, but this syntax is incorrect. The correct syntax should use gorm.Expr, e.g., UpdateColumn("success_rows", gorm.Expr("success_rows + ?", 1)). The current code will likely fail or not increment the counters correctly.
Suggested change:

```go
		UpdateColumn("success_rows", gorm.Expr("success_rows + ?", 1)).Error; err != nil {
		logger.Error("Failed to update success_rows", "batch", batchID, "error", err)
	}
} else {
	if err := database.DB.Model(&models.CSVBatch{}).Where("id = ?", batchID).
		UpdateColumn("failed_rows", gorm.Expr("failed_rows + ?", 1)).Error; err != nil {
```
```go
unifiedProcessor:    unifiedProcessor,
quickTranscription:  quickTranscription,
multiTrackProcessor: processing.NewMultiTrackProcessor(),
csvBatchProcessor:   csvProcessor,
```
Undefined variable csvProcessor. Line 108 references csvProcessor which was defined in the incomplete/duplicate NewHandler function at line 62, but this variable is not in scope for the actual NewHandler function. The csvBatchProcessor should be initialized within this function body.
```go
	return err
}

p.Stop(batchID)
```
Potential error ignored when stopping batch. The Stop method on line 163 may return an error if the batch is not running, but this error is ignored. While this might be intentional during cleanup, it could mask issues if the Stop operation fails for other reasons.
Suggested change:

```go
if err := p.Stop(batchID); err != nil {
	logger.Errorf("failed to stop batch %s during deletion: %v", batchID, err)
}
```
```go
// NewHandler creates a new handler
func NewHandler(cfg *config.Config, authService *auth.AuthService, taskQueue *queue.TaskQueue, unifiedProcessor *transcription.UnifiedJobProcessor, quickTranscription *transcription.QuickTranscriptionService) *Handler {
	// Create CSV batch processor
	csvProcessor := csvbatch.New(cfg)

	broadcaster *sse.Broadcaster
}
```
Duplicate NewHandler function definition. There are two NewHandler functions defined (lines 60-65 incomplete, and lines 68-112 complete). The first one at line 60 appears to be incomplete code that should be removed.
```markdown
1. Download the YouTube video
2. Extract and convert audio to compatible format
3. Delete the video file (saves disk space)
4. Transcribe the audio using configured settings
5. Save JSON output as `{rowId}-{videoFilename}.json`
6. Mark row as completed and proceed to next
```
Documentation incorrectly states that video file is deleted. According to line 349, the documentation claims "Delete the video file (saves disk space)", but examining the processor code (internal/csvbatch/processor.go), the downloadAudio function uses yt-dlp with --extract-audio flag, which extracts audio directly without downloading the full video file. This documentation should be corrected to accurately reflect the implementation.
Suggested change:

```markdown
1. Download and extract audio from the YouTube URL (no full video file is stored)
2. Convert audio to a compatible format
3. Transcribe the audio using configured settings
4. Save JSON output as `{rowId}-{videoFilename}.json`
5. Mark row as completed and proceed to next
```
This isn't a proper PR; it's messy but functional. A proof of concept / food for thought.
I had to process a few hundred hours of untranscribed videos, and this was my solution.
Do not merge this (obviously lol)