FEAT: Sora target: support remix, image-to-video #1341

varunj-msft · 2026-01-30T20:27:16Z

Description

Adds image-to-video and remix modes to OpenAIVideoTarget.py (Sora-2)

New capabilities:

Image-to-video: Include image_path piece to use image as first frame
Remix: Set video_id in prompt_metadata to create variation of existing video
Response includes video_id in metadata for chaining workflows
Bug fix: Fixed MIME type handling for image uploads—now passes (filename, bytes, content_type) tuple instead of raw bytes.

Tests and Documentation

Added 7 new unit tests for image-to-video, remix, metadata, and edge cases; updated 2 existing tests
34 unit tests passing in test_video_target.py

ruff check and ruff format pass
No notebook changes

bashirpartovi · 2026-01-30T22:44:06Z

pyrit/prompt_target/openai/openai_video_target.py

+        text_piece = next(p for p in pieces if p.converted_value_data_type == "text")
+        image_piece = next((p for p in pieces if p.converted_value_data_type == "image_path"), None)


You can make this a function inside the Message class, something like this:

def get_pieces_by_type( self, *, data_type: Optional[str] = None, exclude_types: Optional[set[str]] = None ) -> list[MessagePiece]: if data_type is not None and exclude_types is not None: raise ValueError("Cannot specify both data_type and exclude_types.") if data_type is not None: return [p for p in self.message_pieces if p.converted_value_data_type == data_type] if exclude_types is not None: return [p for p in self.message_pieces if p.converted_value_data_type not in exclude_types] return list(self.message_pieces) def get_piece_by_type( self, *, data_type: str, required: bool = True ) -> Optional[MessagePiece]: pieces = self.get_pieces_by_type(data_type=data_type) if not pieces: if required: raise ValueError(f"No message piece with data type '{data_type}' found.") return None return pieces[0]

or if you want it to be very efficient (don't mind a bit of code duplication), you can use the next for get_piece_by_type:

def get_piece_by_type( self, *, data_type: str, required: bool = True ) -> Optional[MessagePiece]: piece = next( (p for p in self.message_pieces if p.converted_value_data_type == data_type), None ) if piece is None and required: raise ValueError(f"No message piece with data type '{data_type}' found.") return piece def get_pieces_by_type( self, *, data_type: Optional[str] = None, exclude_types: Optional[set[str]] = None ) -> list[MessagePiece]: if data_type is not None and exclude_types is not None: raise ValueError("Cannot specify both data_type and exclude_types.") if data_type is not None: return [p for p in self.message_pieces if p.converted_value_data_type == data_type] if exclude_types is not None: return [p for p in self.message_pieces if p.converted_value_data_type not in exclude_types] return list(self.message_pieces)

Then your code will become:

text_piece = message.get_piece_by_type(data_type="text") image_piece = message.get_piece_by_type(data_type="image_path", required=False)

bashirpartovi · 2026-01-30T23:04:43Z

pyrit/prompt_target/openai/openai_video_target.py

+                    size=self._size,  # type: ignore[arg-type]
+                    seconds=str(self._n_seconds),  # type: ignore[arg-type]


Please do not use #type: ignore. This is usually a code-smell that is being masked. Looking at OpenAI SDK it defines (VideoSeconds):

VideoSize: Literal["720x1280", "1280x720", "1024x1792", "1792x1024"] VideoSeconds: Literal["4", "8", "12"] # these are strings

This code has 2 issues:

self._size is typed as str, which is too broad

self._n_seconds is an int, but the SDK expects Literal["4", "8", "12"] (string literals)

I think you should follow the established patterns we have, e.g. openai_tts_target.py and openai_chat_audio_config.py.

VideoSize = Literal["720x1280", "1280x720", "1024x1792", "1792x1024"] VideoSeconds = Literal["4", "8", "12"] class OpenAIVideoTarget(OpenAITarget): SUPPORTED_RESOLUTIONS: list[VideoSize] = ["720x1280", "1280x720", "1024x1792", "1792x1024"] SUPPORTED_DURATIONS: list[VideoSeconds] = ["4", "8", "12"] def __init__( self, *, resolution_dimensions: VideoSize = "1280x720", n_seconds: VideoSeconds = "4", **kwargs: Any, ) -> None: ... self._n_seconds: VideoSeconds = n_seconds self._size: VideoSize = resolution_dimensions

Then you don't need # type: ignore for your SDK call. Although I understand that this is a breaking API change. If we want it to be non-breaking change, then you'd need to do something like this:

def __init__( self, *, resolution_dimensions: VideoSize = "1280x720", n_seconds: int | VideoSeconds = 4, # you can accept both here **kwargs: Any, ) -> None: ... self._n_seconds: VideoSeconds = str(n_seconds) if isinstance(n_seconds, int) else n_seconds self._validate_duration()

bashirpartovi · 2026-01-30T23:07:11Z

pyrit/prompt_target/openai/openai_video_target.py

+            # IMAGE-TO-VIDEO MODE: Use image as first frame
+            logger.info("Image-to-video mode: Using image as first frame")
+            image_path = image_piece.converted_value
+            image_serializer = data_serializer_factory(
+                value=image_path, data_type="image_path", category="prompt-memory-entries"
+            )
+            image_bytes = await image_serializer.read_data()
+
+            # Get MIME type for proper file upload (API requires content-type)
+            mime_type = DataTypeSerializer.get_mime_type(image_path)
+            if not mime_type:
+                # Default to PNG if MIME type cannot be determined
+                mime_type = "image/png"
+
+            # Create file tuple with filename and MIME type for OpenAI SDK
+            # Format: (filename, content, content_type)
+            filename = os.path.basename(image_path)
+            input_file = (filename, image_bytes, mime_type)


This function is a bit too long, way more than 20 lines. Can we split this branch and move some of the logic out to specialized functions? e.g.

async def _prepare_image_input_async(self, *, image_path: str) -> tuple[str, bytes, str]: image_serializer = data_serializer_factory( value=image_path, data_type="image_path", category="prompt-memory-entries" ) image_bytes = await image_serializer.read_data() mime_type = DataTypeSerializer.get_mime_type(image_path) or "image/png" filename = os.path.basename(image_path) return (filename, image_bytes, mime_type)

bashirpartovi · 2026-01-30T23:16:55Z

pyrit/prompt_target/openai/openai_video_target.py

+        text_pieces = [p for p in pieces if p.converted_value_data_type == "text"]
+        image_pieces = [p for p in pieces if p.converted_value_data_type == "image_path"]
+        other_pieces = [p for p in pieces if p.converted_value_data_type not in ("text", "image_path")]


You can use the function I suggested above here:

text_pieces = message.get_pieces_by_type(data_type="text") image_pieces = message.get_pieces_by_type(data_type="image_path") other_pieces = message.get_pieces_by_type(exclude_types={"text", "image_path"})

Sora target: support remix, image-to-video

0de7a0a

varunj-msft changed the title ~~FEAT: Add image-to-video and remix modes to OpenAIVideoTarget~~ FEAT: Sora target: support remix, image-to-video Jan 30, 2026

bashirpartovi reviewed Jan 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FEAT: Sora target: support remix, image-to-video #1341

FEAT: Sora target: support remix, image-to-video #1341

varunj-msft commented Jan 30, 2026 •

edited

Loading

Uh oh!

bashirpartovi Jan 30, 2026

Uh oh!

bashirpartovi Jan 30, 2026 •

edited

Loading

Uh oh!

bashirpartovi Jan 30, 2026

Uh oh!

bashirpartovi Jan 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		text_piece = next(p for p in pieces if p.converted_value_data_type == "text")
		image_piece = next((p for p in pieces if p.converted_value_data_type == "image_path"), None)

		size=self._size, # type: ignore[arg-type]
		seconds=str(self._n_seconds), # type: ignore[arg-type]

FEAT: Sora target: support remix, image-to-video #1341

Are you sure you want to change the base?

FEAT: Sora target: support remix, image-to-video #1341

Conversation

varunj-msft commented Jan 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Tests and Documentation

Uh oh!

bashirpartovi Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

bashirpartovi Jan 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bashirpartovi Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

bashirpartovi Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

varunj-msft commented Jan 30, 2026 •

edited

Loading

bashirpartovi Jan 30, 2026 •

edited

Loading