Fuse LayerNorm modulation into Triton kernel and remove RoPE dtype casts by Jordanyang · Pull Request #133 · thu-ml/TurboDiffusion

Jordanyang · 2026-06-11T08:45:12Z

Summary

Remove unnecessary RoPE dtype conversions in the Wan2.1/Wan2.2 network paths.
Add FastLayerNorm.modulate.
Fuse layernorm + scale + shift into the Triton LayerNorm kernel path.
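For reference, the computation that `FastLayerNorm.modulate` fuses can be written per token row $x \in \mathbb{R}^D$ as below. This assumes the Wan-style adaLN convention, where the row is multiplied by $(1 + \mathrm{scale})$ before the shift is added, and a LayerNorm with no learned weight or bias of its own. The PR does not spell out either detail. $\epsilon$ is the usual stabilizing constant.

```latex
\begin{aligned}
\mu &= \frac{1}{D}\sum_{j=1}^{D} x_j, \qquad
\sigma^2 = \frac{1}{D}\sum_{j=1}^{D} (x_j - \mu)^2, \\
y_i &= \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \,\bigl(1 + \mathrm{scale}_i\bigr) + \mathrm{shift}_i, \qquad i = 1, \dots, D.
\end{aligned}
```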

Motivation

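This reduces dtype-conversion overhead around RoPE. It also moves the modulation step into the fused LayerNorm kernel, so scale and shift are applied in the same pass as normalization rather than as separate operations afterward. This avoids the extra kernel launches and memory round-trips those standalone elementwise ops would incur.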
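The difference between the unfused and fused paths can be sketched in plain Python for a single token row. This illustrates the data flow only and is not the PR's Triton code. The function names are hypothetical, and the sketch assumes the Wan-style `(1 + scale)` modulation and `eps = 1e-6`.

The unfused version materializes two intermediate rows: the LayerNorm output and the scaled row. In eager PyTorch, each of these would be a full tensor written to GPU memory and read back. The fused version computes the row statistics once, then produces each output element in a single loop, which matches the structure of a per-row Triton program.

```python
import math
from typing import List, Sequence


def layernorm_modulate_unfused(
    x: Sequence[float], scale: Sequence[float], shift: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Hypothetical reference: LayerNorm, then a standalone scale op, then a
    standalone shift op, each materializing an intermediate row."""
    d = len(x)
    mean = sum(x) / d
    var = sum((v - mean) ** 2 for v in x) / d
    rstd = 1.0 / math.sqrt(var + eps)
    normed = [(v - mean) * rstd for v in x]                  # pass 1: LayerNorm output
    scaled = [n * (1.0 + s) for n, s in zip(normed, scale)]  # pass 2: scale
    return [v + b for v, b in zip(scaled, shift)]            # pass 3: shift


def layernorm_modulate_fused(
    x: Sequence[float], scale: Sequence[float], shift: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Hypothetical fused form: row statistics first, then normalize, scale,
    and shift each element in the same loop with no intermediate rows."""
    d = len(x)
    mean = sum(x) / d
    var = sum((v - mean) ** 2 for v in x) / d
    rstd = 1.0 / math.sqrt(var + eps)
    out = [0.0] * d
    for i in range(d):
        # scale/shift are per-channel vectors (assumed shared by all tokens of a sample)
        out[i] = (x[i] - mean) * rstd * (1.0 + scale[i]) + shift[i]
    return out
```

Because the fused loop performs the same arithmetic in the same order, both functions return identical values here. On a GPU, a fused kernel may still differ slightly from the unfused sequence because of accumulation precision or reduction order.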

Changes

Updated turbodiffusion/ops/core.py to support fused LayerNorm modulation.
Updated turbodiffusion/rcm/networks/wan2pt1.py to use the fused modulation path and remove RoPE dtype casts.
Updated turbodiffusion/rcm/networks/wan2pt2.py to use the same fused modulation path and remove RoPE dtype casts.
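For context on the RoPE change: RoPE rotates each consecutive channel pair `(x[2i], x[2i+1])` by a position-dependent angle. Implementations that express this as complex multiplication usually upcast to float32 or float64 before viewing the tensor as complex, then cast back afterward. Round-trips like these are the kind of conversion this PR removes.

The PR text does not show the new RoPE code, so the sketch below is only an illustration under stated assumptions:

- It uses a simplified 1D RoPE with the common base `theta = 10000`. Wan's actual RoPE splits channels across the frame, height, and width axes.
- The function names are hypothetical.

The sketch shows that the real-valued cos/sin form matches the complex form. This is one way to apply RoPE without a complex view, not necessarily the approach taken in this PR.

```python
import cmath
import math
from typing import List, Sequence


def rope_angles(pos: int, dim: int, theta: float = 10000.0) -> List[float]:
    """Rotation angle for each channel pair at one position (standard 1D RoPE schedule)."""
    return [pos * theta ** (-2.0 * i / dim) for i in range(dim // 2)]


def rope_real(x: Sequence[float], pos: int, theta: float = 10000.0) -> List[float]:
    """Rotate pairs (x[2i], x[2i+1]) with cos/sin; assumes len(x) is even."""
    out = list(x)
    for i, a in enumerate(rope_angles(pos, len(x), theta)):
        c, s = math.cos(a), math.sin(a)
        x0, x1 = x[2 * i], x[2 * i + 1]
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c
    return out


def rope_complex(x: Sequence[float], pos: int, theta: float = 10000.0) -> List[float]:
    """Same rotation as complex multiplication (x0 + i*x1) * exp(i*angle)."""
    out: List[float] = []
    for i, a in enumerate(rope_angles(pos, len(x), theta)):
        z = complex(x[2 * i], x[2 * i + 1]) * cmath.exp(1j * a)
        out.extend([z.real, z.imag])
    return out
```

In a tensor implementation, the real-valued form only needs `cos`/`sin` tables broadcast against the input. Whether those tables and the input share a dtype is then a separate choice from the rotation itself.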

Validation

Successfully ran Wan2.1 14B inference tests at both 720p and 480p resolutions.

Commit a41a2ed: Remove the RoPE dtype conversions and add FastLayerNorm.modulate, fusing layernorm + scale + shift into the Triton kernel.

Jordanyang wants to merge 1 commit into thu-ml:main from Jordanyang:opt_dtype_kernel_fusion
