HunyuanVideo 1.5 I2V image conditioning preprocessed at latent resolution instead of pixel resolution #13439

@akshan-main

Describe the bug

While working on the modular pipeline for HunyuanVideo 1.5 (#13389), I found a bug in prepare_cond_latents_and_mask in pipeline_hunyuan_video1_5_image2video.py.

Line 614 shadows the pixel height/width parameters with latent dims from latents.shape:

def prepare_cond_latents_and_mask(self, latents, image, batch_size, height, width, dtype, device):
    batch, channels, frames, height, width = latents.shape  # overwrites pixel h/w with latent h/w
    image_latents = self._get_image_latents(..., height=height, width=width)

_get_image_latents then calls image_processor.preprocess(image, height=height, width=width) with the latent dims (e.g. 30x44 instead of 480x704). After rounding down to a multiple of vae_scale_factor (16), i.e. 30 -> 16 and 44 -> 32, the image gets resized to ~16x32 pixels before VAE encoding, producing a ~1x2 latent instead of the expected 30x44.
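The numbers above can be checked with a small standalone sketch. It assumes the preprocessing step rounds each dimension down to a multiple of vae_scale_factor (which is what the 16x32 figure implies) and that the VAE compresses each spatial dimension by the same factor of 16. The helper names floor_to_multiple and encoded_latent_size are hypothetical and are not part of diffusers.

```python
VAE_SCALE_FACTOR = 16  # spatial factor quoted in this issue


def floor_to_multiple(value: int, multiple: int) -> int:
    """Round value down to a multiple of `multiple` (assumed preprocessing behaviour)."""
    return value - value % multiple


def encoded_latent_size(height: int, width: int, scale: int = VAE_SCALE_FACTOR):
    """Return ((resized_h, resized_w), (latent_h, latent_w)) for a requested size."""
    resized_h = floor_to_multiple(height, scale)
    resized_w = floor_to_multiple(width, scale)
    return (resized_h, resized_w), (resized_h // scale, resized_w // scale)


pixel_hw = (480, 704)
latent_hw = (pixel_hw[0] // VAE_SCALE_FACTOR, pixel_hw[1] // VAE_SCALE_FACTOR)  # (30, 44)

# Intended call: pixel dims go in, a 30x44 latent comes out.
print(encoded_latent_size(*pixel_hw))   # ((480, 704), (30, 44))

# Buggy call: latent dims are passed as if they were pixel dims.
print(encoded_latent_size(*latent_hw))  # ((16, 32), (1, 2))
```

With pixel dims the conditioning latent matches the 30x44 video latent; with latent dims it collapses to 1x2.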

The original Tencent implementation (HunyuanVideo-1.5) resizes at pixel resolution before encoding.
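A minimal sketch of the shadowing pattern and one possible way to avoid it, written as standalone functions rather than the real pipeline method. The function names and the latent_height/latent_width variable names are assumptions for illustration only; the actual fix in the PR may look different.

```python
def dims_passed_to_preprocess_buggy(latents_shape, height, width):
    """Mimics the current behaviour: unpacking reuses the parameter names."""
    batch, channels, frames, height, width = latents_shape  # shadows pixel h/w
    return height, width  # these are now latent dims


def dims_passed_to_preprocess_fixed(latents_shape, height, width):
    """Unpack latent dims into distinct names so pixel h/w survive."""
    batch, channels, frames, latent_height, latent_width = latents_shape
    return height, width  # still the pixel dims passed by the caller


# Example latent shape (batch, channels, frames, latent_h, latent_w); the first
# three values are placeholders, only the last two matter here.
latents_shape = (1, 32, 8, 30, 44)

print(dims_passed_to_preprocess_buggy(latents_shape, 480, 704))  # (30, 44)
print(dims_passed_to_preprocess_fixed(latents_shape, 480, 704))  # (480, 704)
```

Renaming only the unpacked variables keeps the method signature unchanged, so callers would not need to be updated.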

I will open a PR for this.

Reproduction

from diffusers.pipelines.hunyuan_video1_5.pipeline_hunyuan_video1_5_image2video import HunyuanVideo15ImageToVideoPipeline
import inspect

# Line 614 of prepare_cond_latents_and_mask shadows pixel height/width with latent dims
source = inspect.getsource(HunyuanVideo15ImageToVideoPipeline.prepare_cond_latents_and_mask)
# "batch, channels, frames, height, width = latents.shape" overwrites the pixel h/w params
assert "batch, channels, frames, height, width = latents.shape" in source
print("height/width parameters are shadowed by latent dims from latents.shape")

System Info

diffusers 0.38.0.dev0, Python 3.12, PyTorch 2.6

Who can help?

@sayakpaul @DN6 @yiyixuxu
