FEAT: Sora target: support remix, image-to-video #1341
Conversation
```python
text_piece = next(p for p in pieces if p.converted_value_data_type == "text")
image_piece = next((p for p in pieces if p.converted_value_data_type == "image_path"), None)
```
You can make this a function inside the `Message` class, something like this:

```python
def get_pieces_by_type(
    self,
    *,
    data_type: Optional[str] = None,
    exclude_types: Optional[set[str]] = None,
) -> list[MessagePiece]:
    if data_type is not None and exclude_types is not None:
        raise ValueError("Cannot specify both data_type and exclude_types.")
    if data_type is not None:
        return [p for p in self.message_pieces if p.converted_value_data_type == data_type]
    if exclude_types is not None:
        return [p for p in self.message_pieces if p.converted_value_data_type not in exclude_types]
    return list(self.message_pieces)

def get_piece_by_type(
    self,
    *,
    data_type: str,
    required: bool = True,
) -> Optional[MessagePiece]:
    pieces = self.get_pieces_by_type(data_type=data_type)
    if not pieces:
        if required:
            raise ValueError(f"No message piece with data type '{data_type}' found.")
        return None
    return pieces[0]
```

Or, if you want it to be very efficient (and don't mind a bit of code duplication), you can use `next` for `get_piece_by_type`:
```python
def get_piece_by_type(
    self,
    *,
    data_type: str,
    required: bool = True,
) -> Optional[MessagePiece]:
    piece = next(
        (p for p in self.message_pieces if p.converted_value_data_type == data_type),
        None,
    )
    if piece is None and required:
        raise ValueError(f"No message piece with data type '{data_type}' found.")
    return piece

def get_pieces_by_type(
    self,
    *,
    data_type: Optional[str] = None,
    exclude_types: Optional[set[str]] = None,
) -> list[MessagePiece]:
    if data_type is not None and exclude_types is not None:
        raise ValueError("Cannot specify both data_type and exclude_types.")
    if data_type is not None:
        return [p for p in self.message_pieces if p.converted_value_data_type == data_type]
    if exclude_types is not None:
        return [p for p in self.message_pieces if p.converted_value_data_type not in exclude_types]
    return list(self.message_pieces)
```

Then your code will become:
```python
text_piece = message.get_piece_by_type(data_type="text")
image_piece = message.get_piece_by_type(data_type="image_path", required=False)
```

```python
size=self._size,  # type: ignore[arg-type]
seconds=str(self._n_seconds),  # type: ignore[arg-type]
```
Please do not use `# type: ignore`. This is usually a code smell that is being masked. Looking at the OpenAI SDK, it defines:

```python
VideoSize: Literal["720x1280", "1280x720", "1024x1792", "1792x1024"]
VideoSeconds: Literal["4", "8", "12"]  # these are strings
```

This code has 2 issues:
- `self._size` is typed as `str`, which is too broad
- `self._n_seconds` is an `int`, but the SDK expects `Literal["4", "8", "12"]` (string literals)
I think you should follow the established patterns we have, e.g. openai_tts_target.py and openai_chat_audio_config.py.
```python
VideoSize = Literal["720x1280", "1280x720", "1024x1792", "1792x1024"]
VideoSeconds = Literal["4", "8", "12"]

class OpenAIVideoTarget(OpenAITarget):
    SUPPORTED_RESOLUTIONS: list[VideoSize] = ["720x1280", "1280x720", "1024x1792", "1792x1024"]
    SUPPORTED_DURATIONS: list[VideoSeconds] = ["4", "8", "12"]

    def __init__(
        self,
        *,
        resolution_dimensions: VideoSize = "1280x720",
        n_seconds: VideoSeconds = "4",
        **kwargs: Any,
    ) -> None:
        ...
        self._n_seconds: VideoSeconds = n_seconds
        self._size: VideoSize = resolution_dimensions
```

Then you don't need `# type: ignore` for your SDK call. Although I understand that this is a breaking API change. If we want it to be a non-breaking change, then you'd need to do something like this:
```python
def __init__(
    self,
    *,
    resolution_dimensions: VideoSize = "1280x720",
    n_seconds: int | VideoSeconds = 4,  # you can accept both here
    **kwargs: Any,
) -> None:
    ...
    self._n_seconds: VideoSeconds = str(n_seconds) if isinstance(n_seconds, int) else n_seconds
    self._validate_duration()
```

```python
# IMAGE-TO-VIDEO MODE: Use image as first frame
logger.info("Image-to-video mode: Using image as first frame")
image_path = image_piece.converted_value
image_serializer = data_serializer_factory(
    value=image_path, data_type="image_path", category="prompt-memory-entries"
)
image_bytes = await image_serializer.read_data()

# Get MIME type for proper file upload (API requires content-type)
mime_type = DataTypeSerializer.get_mime_type(image_path)
if not mime_type:
    # Default to PNG if MIME type cannot be determined
    mime_type = "image/png"

# Create file tuple with filename and MIME type for OpenAI SDK
# Format: (filename, content, content_type)
filename = os.path.basename(image_path)
input_file = (filename, image_bytes, mime_type)
```
This function is a bit too long, way more than 20 lines. Can we split this branch and move some of the logic out to specialized functions? e.g.
```python
async def _prepare_image_input_async(self, *, image_path: str) -> tuple[str, bytes, str]:
    image_serializer = data_serializer_factory(
        value=image_path, data_type="image_path", category="prompt-memory-entries"
    )
    image_bytes = await image_serializer.read_data()
    mime_type = DataTypeSerializer.get_mime_type(image_path) or "image/png"
    filename = os.path.basename(image_path)
    return (filename, image_bytes, mime_type)
```

```python
text_pieces = [p for p in pieces if p.converted_value_data_type == "text"]
image_pieces = [p for p in pieces if p.converted_value_data_type == "image_path"]
other_pieces = [p for p in pieces if p.converted_value_data_type not in ("text", "image_path")]
```
You can use the function I suggested above here:
```python
text_pieces = message.get_pieces_by_type(data_type="text")
image_pieces = message.get_pieces_by_type(data_type="image_path")
other_pieces = message.get_pieces_by_type(exclude_types={"text", "image_path"})
```
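Taken together, the suggested helpers behave like this minimal, self-contained sketch. Note that `Message` and `MessagePiece` below are illustrative stand-ins (carrying only the fields used here), not PyRIT's actual classes:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MessagePiece:
    # Stand-in for PyRIT's MessagePiece; only the field used by the helpers.
    converted_value_data_type: str

class Message:
    # Stand-in Message carrying a list of pieces.
    def __init__(self, pieces: list[MessagePiece]) -> None:
        self.message_pieces = pieces

    def get_pieces_by_type(
        self,
        *,
        data_type: Optional[str] = None,
        exclude_types: Optional[set[str]] = None,
    ) -> list[MessagePiece]:
        if data_type is not None and exclude_types is not None:
            raise ValueError("Cannot specify both data_type and exclude_types.")
        if data_type is not None:
            return [p for p in self.message_pieces if p.converted_value_data_type == data_type]
        if exclude_types is not None:
            return [p for p in self.message_pieces if p.converted_value_data_type not in exclude_types]
        return list(self.message_pieces)

    def get_piece_by_type(self, *, data_type: str, required: bool = True) -> Optional[MessagePiece]:
        # Short-circuits on the first match instead of building a full list.
        piece = next((p for p in self.message_pieces if p.converted_value_data_type == data_type), None)
        if piece is None and required:
            raise ValueError(f"No message piece with data type '{data_type}' found.")
        return piece

message = Message([MessagePiece("text"), MessagePiece("image_path"), MessagePiece("audio_path")])
text_piece = message.get_piece_by_type(data_type="text")
video_piece = message.get_piece_by_type(data_type="video_path", required=False)  # no match -> None
other_pieces = message.get_pieces_by_type(exclude_types={"text", "image_path"})
```

With `required=False` a missing piece comes back as `None` rather than raising, which matches the optional-image lookup in the reviewed code.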
Description
Adds image-to-video and remix modes to `OpenAIVideoTarget` (Sora-2).

New capabilities:
- Image-to-video: include an `image_path` piece to use the image as the first frame
- Remix: set `video_id` in `prompt_metadata` to create a variation of an existing video
- The response includes `video_id` in its metadata for chaining workflows

Bug fix: fixed MIME type handling for image uploads; the target now passes a `(filename, bytes, content_type)` tuple instead of raw bytes.
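The shape of the fixed upload tuple can be illustrated with the standard library alone. This is a minimal sketch: `mimetypes.guess_type` stands in for `DataTypeSerializer.get_mime_type`, and `build_input_file` is a hypothetical helper name, not part of the PR:

```python
import mimetypes
import os

def build_input_file(image_path: str, image_bytes: bytes) -> tuple[str, bytes, str]:
    # Guess the content type from the file extension; fall back to PNG when unknown.
    mime_type, _ = mimetypes.guess_type(image_path)
    if not mime_type:
        mime_type = "image/png"
    # (filename, content, content_type) is the file-tuple form the OpenAI SDK accepts,
    # so the API receives an explicit content-type instead of raw bytes.
    filename = os.path.basename(image_path)
    return (filename, image_bytes, mime_type)

f1 = build_input_file("/tmp/frame.jpg", b"\xff\xd8")
f2 = build_input_file("/tmp/frame.unknown_ext", b"...")
```

Passing the tuple rather than bare `bytes` is what lets the server see a correct `content-type` header for the uploaded first frame.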
Tests and Documentation
- Added 7 new unit tests for image-to-video, remix, metadata, and edge cases; updated 2 existing tests
- 34 unit tests passing in test_video_target.py
- ruff check and ruff format pass
- No notebook changes