Skip to content

Conversation

@varunj-msft
Copy link
Contributor

@varunj-msft varunj-msft commented Jan 30, 2026

Description

Adds image-to-video and remix modes to OpenAIVideoTarget.py (Sora-2)

New capabilities:

Image-to-video: Include image_path piece to use image as first frame
Remix: Set video_id in prompt_metadata to create variation of existing video
Response includes video_id in metadata for chaining workflows
Bug fix: Fixed MIME type handling for image uploads—now passes (filename, bytes, content_type) tuple instead of raw bytes.

Tests and Documentation

Added 7 new unit tests for image-to-video, remix, metadata, and edge cases; updated 2 existing tests
34 unit tests passing in test_video_target.py

ruff check and ruff format pass
No notebook changes

@varunj-msft varunj-msft changed the title FEAT: Add image-to-video and remix modes to OpenAIVideoTarget FEAT: Sora target: support remix, image-to-video Jan 30, 2026
Comment on lines +165 to +166
text_piece = next(p for p in pieces if p.converted_value_data_type == "text")
image_piece = next((p for p in pieces if p.converted_value_data_type == "image_path"), None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make this a function inside the Message class, something like this:

def get_pieces_by_type(
    self,
    *,
    data_type: Optional[str] = None,
    exclude_types: Optional[set[str]] = None
) -> list[MessagePiece]:
    if data_type is not None and exclude_types is not None:
        raise ValueError("Cannot specify both data_type and exclude_types.")

    if data_type is not None:
        return [p for p in self.message_pieces if p.converted_value_data_type == data_type]

    if exclude_types is not None:
        return [p for p in self.message_pieces if p.converted_value_data_type not in exclude_types]

    return list(self.message_pieces)

def get_piece_by_type(
    self,
    *,
    data_type: str,
    required: bool = True
) -> Optional[MessagePiece]:
    pieces = self.get_pieces_by_type(data_type=data_type)
    
    if not pieces:
        if required:
            raise ValueError(f"No message piece with data type '{data_type}' found.")
        return None
    
    return pieces[0]

or if you want it to be very efficient (don't mind a bit of code duplication), you can use the next for get_piece_by_type:

def get_piece_by_type(
    self,
    *,
    data_type: str,
    required: bool = True
) -> Optional[MessagePiece]:
    piece = next(
        (p for p in self.message_pieces if p.converted_value_data_type == data_type),
        None
    )
    if piece is None and required:
        raise ValueError(f"No message piece with data type '{data_type}' found.")
    return piece

def get_pieces_by_type(
    self,
    *,
    data_type: Optional[str] = None,
    exclude_types: Optional[set[str]] = None
) -> list[MessagePiece]:
    if data_type is not None and exclude_types is not None:
        raise ValueError("Cannot specify both data_type and exclude_types.")

    if data_type is not None:
        return [p for p in self.message_pieces if p.converted_value_data_type == data_type]
    
    if exclude_types is not None:
        return [p for p in self.message_pieces if p.converted_value_data_type not in exclude_types]
    
    return list(self.message_pieces)

Then your code will become:

text_piece = message.get_piece_by_type(data_type="text")
image_piece = message.get_piece_by_type(data_type="image_path", required=False)

Comment on lines +205 to +206
size=self._size, # type: ignore[arg-type]
seconds=str(self._n_seconds), # type: ignore[arg-type]
Copy link
Contributor

@bashirpartovi bashirpartovi Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not use #type: ignore. This is usually a code-smell that is being masked. Looking at OpenAI SDK it defines (VideoSeconds):

VideoSize: Literal["720x1280", "1280x720", "1024x1792", "1792x1024"]
VideoSeconds: Literal["4", "8", "12"] # these are strings

This code has 2 issues:

  1. self._size is typed as str, which is too broad
  2. self._n_seconds is an int, but the SDK expects Literal["4", "8", "12"] (string literals)

I think you should follow the established patterns we have, e.g. openai_tts_target.py and openai_chat_audio_config.py.

VideoSize = Literal["720x1280", "1280x720", "1024x1792", "1792x1024"]
VideoSeconds = Literal["4", "8", "12"]


class OpenAIVideoTarget(OpenAITarget):
    
    SUPPORTED_RESOLUTIONS: list[VideoSize] = ["720x1280", "1280x720", "1024x1792", "1792x1024"]
    SUPPORTED_DURATIONS: list[VideoSeconds] = ["4", "8", "12"]

    def __init__(
        self,
        *,
        resolution_dimensions: VideoSize = "1280x720",
        n_seconds: VideoSeconds = "4",  
        **kwargs: Any,
    ) -> None:
        ...
        self._n_seconds: VideoSeconds = n_seconds
        self._size: VideoSize = resolution_dimensions

Then you don't need # type: ignore for your SDK call. Although I understand that this is a breaking API change. If we want it to be non-breaking change, then you'd need to do something like this:

def __init__(
    self,
    *,
    resolution_dimensions: VideoSize = "1280x720",
    n_seconds: int | VideoSeconds = 4,  # you can accept both here
    **kwargs: Any,
) -> None:
    ...
    self._n_seconds: VideoSeconds = str(n_seconds) if isinstance(n_seconds, int) else n_seconds
    self._validate_duration() 

Comment on lines +182 to +199
# IMAGE-TO-VIDEO MODE: Use image as first frame
logger.info("Image-to-video mode: Using image as first frame")
image_path = image_piece.converted_value
image_serializer = data_serializer_factory(
value=image_path, data_type="image_path", category="prompt-memory-entries"
)
image_bytes = await image_serializer.read_data()

# Get MIME type for proper file upload (API requires content-type)
mime_type = DataTypeSerializer.get_mime_type(image_path)
if not mime_type:
# Default to PNG if MIME type cannot be determined
mime_type = "image/png"

# Create file tuple with filename and MIME type for OpenAI SDK
# Format: (filename, content, content_type)
filename = os.path.basename(image_path)
input_file = (filename, image_bytes, mime_type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is a bit too long, way more than 20 lines. Can we split this branch and move some of the logic out to specialized functions? e.g.

async def _prepare_image_input_async(self, *, image_path: str) -> tuple[str, bytes, str]:
    image_serializer = data_serializer_factory(
        value=image_path, data_type="image_path", category="prompt-memory-entries"
    )
    image_bytes = await image_serializer.read_data()
    
    mime_type = DataTypeSerializer.get_mime_type(image_path) or "image/png"
    filename = os.path.basename(image_path)
    
    return (filename, image_bytes, mime_type)

Comment on lines +378 to +380
text_pieces = [p for p in pieces if p.converted_value_data_type == "text"]
image_pieces = [p for p in pieces if p.converted_value_data_type == "image_path"]
other_pieces = [p for p in pieces if p.converted_value_data_type not in ("text", "image_path")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the function I suggested above here:

text_pieces = message.get_pieces_by_type(data_type="text")
image_pieces = message.get_pieces_by_type(data_type="image_path")
other_pieces = message.get_pieces_by_type(exclude_types={"text", "image_path"})

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants