Why does calculate_dimensions in qwen-image-edit require image height/width to be multiples of 32?

Hi, thanks for the great work!
I have a question regarding the logic in calculate_dimensions.
Currently, the image height and width are constrained to be multiples of 32.
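For reference, here is a sketch of the rounding described above. It is a simplified, hypothetical stand-in for `calculate_dimensions`. It assumes the function picks a width/height that matches a target pixel area and aspect ratio, then snaps each side to the nearest multiple. The `multiple` parameter exists only so 32 and 16 can be compared; it is not claimed to be part of the real function.

```python
import math


def calculate_dimensions_sketch(target_area: float, ratio: float, multiple: int = 32) -> tuple[int, int]:
    """Hypothetical sketch: pick (width, height) with width / height ~= ratio and
    width * height ~= target_area, then round each side to the nearest `multiple`."""
    width = math.sqrt(target_area * ratio)
    height = width / ratio
    # Snap both sides to the nearest multiple (32 is the constraint in question).
    width = round(width / multiple) * multiple
    height = round(height / multiple) * multiple
    return int(width), int(height)


# Compare snapping to 32 versus 16 for a ~1024x1024 pixel area at 16:9.
for m in (32, 16):
    print(m, calculate_dimensions_sketch(1024 * 1024, 16 / 9, multiple=m))
```

With `multiple=32` the 16:9 example lands on 1376×768. With `multiple=16` it lands on 1360×768.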
From my understanding:
The VAE has a downsampling factor of 8, so the input dimensions must be multiples of 8 for the latent to have an integer spatial size.
Before entering the DiT, the latent is passed through a Patch Embedding layer with patch_size = 2.
Together, that implies a total factor of 8 × 2 = 16.
Based on this, constraining the image dimensions to multiples of 16 should already be sufficient.
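The arithmetic in this reasoning can be checked with a small sketch. It maps pixel dimensions to the latent grid and then to the DiT token grid. The sketch assumes only the two stages described above: a VAE downsampling factor of 8 followed by patchification with patch size 2. The function name `token_grid` is hypothetical.

```python
def token_grid(height: int, width: int, vae_scale_factor: int = 8, patch_size: int = 2) -> tuple[int, int]:
    """Hypothetical helper: map pixel (height, width) to the DiT token grid, assuming
    only VAE downsampling followed by non-overlapping patchification."""
    total_factor = vae_scale_factor * patch_size  # 8 * 2 = 16 under these assumptions
    if height % total_factor or width % total_factor:
        raise ValueError(
            f"{height}x{width} is not divisible by {total_factor}; "
            "the latent would not split evenly into patches"
        )
    latent_h, latent_w = height // vae_scale_factor, width // vae_scale_factor
    return latent_h // patch_size, latent_w // patch_size


# 1360 is a multiple of 16 but not of 32, yet the token grid is still integral.
print(token_grid(768, 1360))  # (48, 85)
```

Under these two assumptions alone, a factor of 16 is enough. Any additional factor of 2 would have to come from some other part of the pipeline.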

Could you clarify why a multiple of 32 is required here?
Is there an additional downsampling stage, architectural constraint, or implementation detail that I might be missing?
Thanks in advance for the clarification!

Why does calculate_dimensions in qwen-image-edit require image height/width to be multiples of 32? #12997