Why does calculate_dimensions in qwen-image-edit require image height/width to be multiples of 32? #12997

@lucky-liuzhihong

Hi, thanks for the great work!
I have a question regarding the logic in calculate_dimensions.
Currently, the image height and width are constrained to be multiples of 32.
From my understanding:
The VAE has a downsampling factor of 8, so the input dimensions need to be multiples of 8 for the latent spatial size to come out as a whole number.
Before entering the DiT, the latent is passed through a Patch Embedding layer with patch_size = 2.
That would further imply a total factor of 8 × 2 = 16.
Based on this, it seems that constraining the image dimensions to be multiples of 16 should already be sufficient.
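
To make the arithmetic above concrete, here is a minimal sketch. It assumes only the two factors stated in this question (a VAE downsampling factor of 8 and patch_size = 2); the helper `token_grid` is hypothetical and is not taken from the library.

```python
VAE_SCALE_FACTOR = 8  # assumed VAE spatial downsampling factor (from the question)
PATCH_SIZE = 2        # assumed patch size of the patch-embedding layer (from the question)


def token_grid(height, width, vae_scale=VAE_SCALE_FACTOR, patch=PATCH_SIZE):
    """Return (rows, cols) of patch tokens for an image of the given size.

    Hypothetical helper: raises ValueError if the image size is not
    divisible by the combined factor vae_scale * patch.
    """
    factor = vae_scale * patch  # 8 * 2 = 16 under the stated assumptions
    if height % factor != 0 or width % factor != 0:
        raise ValueError(
            f"height and width must be multiples of {factor}, got {height}x{width}"
        )
    return height // factor, width // factor


# 1008 is a multiple of 16 but not of 32, so under these assumptions
# it would still map to a whole number of tokens per side.
rows, cols = token_grid(1008, 1008)
print(rows, cols, 1008 % 32)
```

If only these two factors were involved, a size like 1008 × 1008 would be valid even though it is rejected by a multiple-of-32 constraint, which is the gap this question is asking about.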

Could you clarify why a multiple of 32 is required here?
Is there an additional downsampling stage, architectural constraint, or implementation detail that I might be missing?
Thanks in advance for the clarification!
