Conversation
…lity
- Introduced a new Dockerfile for building a server with CUDA development tools.
- Updated the main server Dockerfile to improve the build process.
- Added support for image embeddings, including new request and response structures for handling base64 images.
- Enhanced the embedding logic to differentiate between text and image inputs, ensuring proper error handling for mixed input types.
- Updated dependencies in Cargo.toml and Cargo.lock to include base64 and image libraries.
…nagement
- Updated the Dockerfile to use a base image with CUDA 12.2.0 and streamlined the build stages.
- Introduced sccache for caching Rust builds and improved the installation of Rust and cargo-chef.
- Enhanced the build process by separating the planner and builder stages, ensuring better organization and efficiency.
- Removed unnecessary mock scripts and optimized runtime dependencies for a cleaner image.
feature: Added video embedding support and guide
Cursor Bugbot has reviewed your changes and found 3 potential issues.
This PR is being reviewed by Cursor Bugbot
    Returns:
        A list of EmbedData objects, or None if an adapter is used.
    """
Video function stubs placed inside audio docstring
High Severity
The embed_video_file and embed_video_directory function definitions are inserted between the opening """ of embed_audio_file's docstring (line 270) and its actual docstring content (line 307). The """ on line 270 opens the docstring, and the """ on line 277 (intended as embed_video_file's docstring opener) closes it, leaving the rest as invalid syntax inside embed_audio_file's body. This means the video functions don't exist as actual type stubs, embed_audio_file's docstring is garbled, and IDEs/type checkers will not recognize the new video APIs.
    .is_ok()
    {
        return true;
    }
Base64 image detection always returns true for valid base64
High Severity
is_base64_image checks with_guessed_format().is_ok() on a Cursor, but with_guessed_format() returns Ok even when no image format is detected (it only fails on I/O errors, which don't occur with Cursor). This means any valid base64 string ≥100 characters is classified as an image. The check needs to verify .format().is_some() instead. This causes the /v1/embeddings endpoint to misroute long base64-valid text strings to the image embedding path, leading to failures or wrong results.
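The difference between the two checks can be illustrated with a standard-library-only sketch. Everything here is hypothetical: `Format`, `guess_format`, `buggy_is_image`, and `fixed_is_image` are stand-ins that only mimic the behaviour described in the comment (success with no detected format), not the project's code or the image library's API, and the magic-byte sniffing is a deliberately simplified substitute for real format detection.

```rust
use std::io;

/// Hypothetical stand-in for a detected image format.
#[derive(Debug, PartialEq)]
enum Format {
    Png,
    Jpeg,
}

/// Hypothetical sniffer that mimics the behaviour described above:
/// it returns Ok(None) when no format matches and would only return
/// Err on an I/O failure, which cannot happen for an in-memory slice.
fn guess_format(bytes: &[u8]) -> io::Result<Option<Format>> {
    const PNG_MAGIC: &[u8] = &[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
    const JPEG_MAGIC: &[u8] = &[0xFF, 0xD8, 0xFF];

    if bytes.starts_with(PNG_MAGIC) {
        Ok(Some(Format::Png))
    } else if bytes.starts_with(JPEG_MAGIC) {
        Ok(Some(Format::Jpeg))
    } else {
        Ok(None)
    }
}

/// Buggy pattern: only asks whether sniffing succeeded, so every
/// in-memory input is classified as an image.
fn buggy_is_image(bytes: &[u8]) -> bool {
    guess_format(bytes).is_ok()
}

/// Fixed pattern: additionally requires that a format was detected.
fn fixed_is_image(bytes: &[u8]) -> bool {
    matches!(guess_format(bytes), Ok(Some(_)))
}
```

With these stand-ins, `buggy_is_image(b"just some text")` is true while `fixed_is_image` rejects the same bytes, which is the misrouting the comment describes.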
    EmbeddingResult::MultiVector(_) => {
        // For multi-vector embeddings, return empty (or handle differently)
        vec![]
    }
Image endpoint silently returns empty multi-vector embeddings
Medium Severity
The new create_image_embeddings endpoint returns vec![] for MultiVector embeddings, silently producing zero-dimensional embedding vectors with no error. Users of multi-vector vision models (e.g., ColPali) would receive response objects where embedding is an empty array, which is indistinguishable from a successful result but contains no usable data. This is a silent data loss scenario.
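One way to avoid the silent empty result is to surface the unsupported case as an error. The sketch below is only an illustration: the `EmbeddingResult` enum is a local stand-in with an assumed shape, and `into_dense` is a hypothetical helper, not a function from the PR. How the error should map to an HTTP response is left open.

```rust
/// Local stand-in for the embedding result type discussed above
/// (assumed shape, defined here so the sketch is self-contained).
#[derive(Debug)]
enum EmbeddingResult {
    DenseVector(Vec<f32>),
    MultiVector(Vec<Vec<f32>>),
}

/// Hypothetical helper: return the dense vector, or an explicit error
/// for multi-vector results instead of an empty vector.
fn into_dense(result: EmbeddingResult) -> Result<Vec<f32>, String> {
    match result {
        EmbeddingResult::DenseVector(v) => Ok(v),
        EmbeddingResult::MultiVector(rows) => Err(format!(
            "multi-vector embedding with {} vectors cannot be returned as a single dense embedding",
            rows.len()
        )),
    }
}
```

A caller can then turn the `Err` into an error response, so a client never receives an empty embedding array that looks like success.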


Note
Medium Risk
Introduces new multimedia processing paths (external ffmpeg execution, base64 decoding, temp file IO) and expands the public server API surface, which could impact reliability and resource usage if inputs are malformed or large.

Overview
Adds opt-in video embedding support behind a new video Cargo feature: videos are processed via an ffmpeg-based frame sampler (VideoProcessor), embedded in batches with a vision model, and annotated with video_path/frame_index metadata; this is wired through Rust (embed_video_file, embed_video_directory), Python bindings (VideoEmbedConfig, embed_video_*), and new docs/examples.

Extends the Actix server to support base64 image inputs: /v1/embeddings now auto-detects text vs base64 images (rejects mixed), and a new /v1/image_embeddings endpoint decodes/validates images, writes temp files, and runs embed_image_batch; server deps add base64 + image. Also adds a CUDA server Dockerfile and minor Docker build tweaks, plus documentation updates and navigation for the new video guide.

Written by Cursor Bugbot for commit a29c2a8. This will update automatically on new commits.