Add support for TransformerEngine flash attention in WAN #299
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
@cpersson-amd I've been out on PTO for a month. I'll take a closer look at this next week. Meanwhile, can you update your branch with the latest in main? Thanks.
entrpn left a comment:
In general the PR looks good, but I'm still unsure whether adding another axis, `fsdp_batch`, is really necessary. I would prefer not to add it. The other major thing is switching the `mesh_axes` from `data, fsdp, tensor` to `data, tensor, fsdp`.
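For readers unfamiliar with what this review comment refers to: in JAX-based trainers like this one, the device mesh is built from an ordered tuple of named axes, and the ordering determines how devices are laid out. The sketch below is only an illustration of that idea; the axis sizes and the `ici_parallelism` name are hypothetical, not the PR's actual config.

```python
import jax
from jax.sharding import Mesh, PartitionSpec
from jax.experimental import mesh_utils

# Hypothetical axis sizes; in the real code these come from the run config.
ici_parallelism = {"data": 1, "tensor": 1, "fsdp": jax.device_count()}

# Reviewer's preference: order the axes as data, tensor, fsdp
# (rather than data, fsdp, tensor, and without a fourth fsdp_batch axis).
mesh_axes = ("data", "tensor", "fsdp")
devices = mesh_utils.create_device_mesh([ici_parallelism[a] for a in mesh_axes])
mesh = Mesh(devices, mesh_axes)

# Parameters and activations are then sharded by referring to these names,
# e.g. a hypothetical attention projection sharded over data and tensor axes:
example_spec = PartitionSpec("data", None, "tensor")
```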
@susanbao can you take a quick look at this PR?
@cpersson-amd please review @susanbao's comments above and rebase with main. We tested the PR internally and it looks good. Would you be willing to change the axis `fsdp` to `context`? If not, I can make the change after this PR is merged.
Thanks @cpersson-amd, this looks great. Can you run `ruff check --fix`?
@entrpn Sure, I ran `ruff check --fix` and had to manually fix some bare except statements. It should be good with the latest commit.
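For context, the ruff rule involved here (E722) flags bare `except:` clauses, which `--fix` does not rewrite automatically; the usual manual fix is to catch the specific exception you expect. A hypothetical before/after, not the actual diff from this PR:

```python
# Before: ruff E722 flags the bare except, which also swallows
# KeyboardInterrupt and SystemExit.
try:
    import transformer_engine  # noqa: F401
    has_te = True
except:
    has_te = False

# After: catch only the failure we actually expect.
try:
    import transformer_engine  # noqa: F401
    has_te = True
except ImportError:
    has_te = False
```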
@cpersson-amd Please review my PR (cpersson-amd#1), which fixes some of the unit tests. Once they pass, this can be merged.
@entrpn The PR looks good and is merged. I rebased with main and double-checked for errors with ruff. Hopefully it's good to go now.
Thanks @cpersson-amd, it's been merged.
Commit summary:

* add flash attn te support for wan
* add gpu optimized sharding parallelism
* sharding bugfixes
* generalize across sharding parallelisms
* fix issue with inference using fsdp + te flash attention
* revert fsdp_tpu name change
* update readme with wan2.1 gpu notes
* re-order parallelism axes and revert dynamic context parallel axes selection
* remove unused max_utils imports
* change mesh names to more accurately reflect sharding
* cleanup
* fix lint errors
* update configs for unit tests

Co-authored-by: Juan Acevedo <juancevedo@gmail.com>
This PR implements the changes listed in the commit summary above: TransformerEngine flash attention support for WAN and GPU-optimized sharding parallelism. The code has been tested on WAN 2.1 (training and inference) and Flux (training only) on GPUs.
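As a rough sketch of the overall idea (not the PR's actual code), the attention layer can route to a TransformerEngine kernel when the library is installed and requested, and otherwise fall back to a reference JAX implementation. The import path and the shape of the dispatcher below are assumptions for illustration; the exact TE call depends on the installed TransformerEngine version and is deliberately left as a placeholder.

```python
import jax
import jax.numpy as jnp

try:
    # Assumed import path: TransformerEngine's JAX bindings live under
    # transformer_engine.jax in recent releases.
    import transformer_engine.jax as te  # noqa: F401
    HAS_TE = True
except ImportError:
    HAS_TE = False


def attention(q, k, v, use_te_flash_attention: bool):
    """Toy dispatcher: use a TE fused/flash kernel when requested and
    available, otherwise a reference scaled dot-product attention.

    q, k, v are assumed to be [..., seq, heads, head_dim] arrays.
    """
    if use_te_flash_attention and HAS_TE:
        # The PR routes this branch to TransformerEngine's fused/flash
        # attention; the exact call is omitted here because it varies
        # across TE versions.
        raise NotImplementedError("plug in the TE fused attention call here")
    # Reference path: standard scaled dot-product attention in plain JAX.
    scale = q.shape[-1] ** -0.5
    logits = jnp.einsum("...qhd,...khd->...hqk", q, k) * scale
    weights = jax.nn.softmax(logits, axis=-1)
    return jnp.einsum("...hqk,...khd->...qhd", weights, v)
```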