
[modular] Add LTX Video modular pipeline#13378

Open
akshan-main wants to merge 21 commits into huggingface:main from akshan-main:modular-ltx

Conversation

@akshan-main
Contributor

@akshan-main akshan-main commented Apr 1, 2026

What does this PR do?

Adds modular pipeline support for LTX Video, covering both text-to-video and image-to-video. The implementation follows the same structure as the existing Wan modular pipeline.

Text-to-video

LTXBlocks (SequentialPipelineBlocks)
  text_encoder      LTXTextEncoderStep
  denoise           LTXCoreDenoiseStep
    input               LTXTextInputStep
    set_timesteps       LTXSetTimestepsStep
    prepare_latents     LTXPrepareLatentsStep
    denoise             LTXDenoiseStep (LoopSequentialPipelineBlocks)
      before_denoiser       LTXLoopBeforeDenoiser
      denoiser              LTXLoopDenoiser
      after_denoiser        LTXLoopAfterDenoiser
  decode            LTXVaeDecoderStep

Image-to-video

LTXImage2VideoBlocks (SequentialPipelineBlocks)
  text_encoder      LTXTextEncoderStep
  denoise           LTXImage2VideoCoreDenoiseStep
    input               LTXTextInputStep
    set_timesteps       LTXSetTimestepsStep
    prepare_latents     LTXImage2VideoPrepareLatentsStep
    denoise             LTXImage2VideoDenoiseStep (LoopSequentialPipelineBlocks)
      before_denoiser       LTXImage2VideoLoopBeforeDenoiser
      denoiser              LTXImage2VideoLoopDenoiser
      after_denoiser        LTXImage2VideoLoopAfterDenoiser
  decode            LTXVaeDecoderStep

Verification

Parity tested against standard pipelines with identical parameters (H100, bfloat16, 297 frames, 30 steps, seed 42):

|     | Standard shape        | Modular shape         | MAD      |
| --- | --------------------- | --------------------- | -------- |
| T2V | (1, 297, 512, 704, 3) | (1, 297, 512, 704, 3) | 0.021609 |
| I2V | (1, 297, 512, 704, 3) | (1, 297, 512, 704, 3) | 0.016330 |

T2V - Standard vs Modular:

ltx_standard.mp4
ltx_modular.mp4
T2V reproduction code
import torch
import numpy as np
from diffusers import LTXPipeline, LTXBlocks
from diffusers.utils import export_to_video

model_id = "Lightricks/LTX-Video-0.9.7-dev"
prompt = "A cat walking across a sunlit garden"
height, width, num_frames = 512, 704, 297
steps, cfg, seed = 30, 3.0, 42

# Standard pipeline
std_pipe = LTXPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")
gen = torch.Generator("cuda").manual_seed(seed)
std_result = std_pipe(
    prompt=prompt, height=height, width=width, num_frames=num_frames,
    num_inference_steps=steps, guidance_scale=cfg, generator=gen,
    output_type="np",
).frames
export_to_video(std_result[0], "ltx_standard.mp4", fps=25)

del std_pipe
torch.cuda.empty_cache()

# Modular pipeline
blocks = LTXBlocks()
mod_pipe = blocks.init_pipeline(model_id)
mod_pipe.load_components(torch_dtype=torch.bfloat16)
mod_pipe.to("cuda")
gen = torch.Generator("cuda").manual_seed(seed)
mod_result = mod_pipe(
    prompt=prompt, height=height, width=width, num_frames=num_frames,
    num_inference_steps=steps, guidance_scale=cfg, generator=gen,
    output="videos",
)
export_to_video(mod_result[0], "ltx_modular.mp4", fps=25)

diff = np.abs(np.array(std_result).astype(float) - np.array(mod_result).astype(float)).mean()
print(f"Mean absolute difference: {diff:.6f}")

I2V - Standard vs Modular:

ltx_i2v_standard.mp4
ltx_i2v_modular.mp4
I2V reproduction code
import torch
import numpy as np
from diffusers import LTXImageToVideoPipeline, LTXImage2VideoBlocks
from diffusers.utils import export_to_video, load_image

model_id = "Lightricks/LTX-Video-0.9.7-dev"
image = load_image("https://cdn.pixabay.com/photo/2014/11/30/14/11/cat-551554_640.jpg").resize((704, 512))
prompt = "A cat slowly turns its head"
height, width, num_frames = 512, 704, 297
steps, cfg, seed = 30, 3.0, 42

# Standard pipeline
std_pipe = LTXImageToVideoPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")
gen = torch.Generator("cuda").manual_seed(seed)
std_result = std_pipe(
    image=image, prompt=prompt, height=height, width=width, num_frames=num_frames,
    num_inference_steps=steps, guidance_scale=cfg, generator=gen, output_type="np",
).frames
export_to_video(std_result[0], "ltx_i2v_standard.mp4", fps=25)

del std_pipe
torch.cuda.empty_cache()

# Modular pipeline
blocks = LTXImage2VideoBlocks()
pipe = blocks.init_pipeline(model_id)
pipe.load_components(torch_dtype=torch.bfloat16)
pipe.to("cuda")
gen = torch.Generator("cuda").manual_seed(seed)
mod_result = pipe(
    image=image, prompt=prompt, height=height, width=width, num_frames=num_frames,
    num_inference_steps=steps, guidance_scale=cfg, generator=gen, output="videos",
)
export_to_video(mod_result[0], "ltx_i2v_modular.mp4", fps=25)

diff = np.abs(np.array(std_result).astype(float) - np.array(mod_result).astype(float)).mean()
print(f"Mean absolute difference: {diff:.6f}")

Files added

src/diffusers/modular_pipelines/ltx/
  __init__.py
  encoders.py              LTXTextEncoderStep
  before_denoise.py        LTXTextInputStep, LTXSetTimestepsStep, LTXPrepareLatentsStep, LTXImage2VideoPrepareLatentsStep
  denoise.py               T2V and I2V denoise loop blocks
  decoders.py              LTXVaeDecoderStep
  modular_blocks_ltx.py    LTXBlocks, LTXImage2VideoBlocks
  modular_pipeline.py      LTXModularPipeline, LTXImage2VideoModularPipeline

tests/modular_pipelines/ltx/
  test_modular_pipeline_ltx.py

Files modified

  • src/diffusers/__init__.py
  • src/diffusers/modular_pipelines/__init__.py
  • src/diffusers/modular_pipelines/modular_pipeline.py

Note: the tiny test model lives at akshan-main/tiny-ltx-modular-pipe on the Hub; it will have to be moved to hf-internal-testing/ before merge if this is approved.

Contribution to #13295

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you read our philosophy doc (important for complex PRs)?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. Modular Diffusers 🧨 #13295
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@sayakpaul @yiyixuxu @asomoza

@akshan-main akshan-main marked this pull request as ready for review April 1, 2026 10:58
@yiyixuxu
Collaborator

yiyixuxu commented Apr 1, 2026

cc @asomoza
can you help check if our current LTX (0.9.7) is broken? the output does not seem right, especially the T2V one

@akshan-main
Contributor Author

akshan-main commented Apr 1, 2026

Reran with the official example params (Lightricks/LTX-Video instead of 0.9.7): 480x704, 161 frames, 50 steps, and a negative prompt. Updated videos:

T2V standard:

ltx_t2v_standard.mp4

T2V modular:

ltx_t2v_modular.mp4
T2V code
import torch
import numpy as np
from diffusers import LTXPipeline, LTXBlocks
from diffusers.utils import export_to_video

model_id = "Lightricks/LTX-Video"
prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The scene appears to be real-life footage"
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
height, width, num_frames = 480, 704, 161
steps, cfg, seed = 50, 3.0, 42

print("=== Standard T2V ===")
std_pipe = LTXPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")
gen = torch.Generator("cuda").manual_seed(seed)
std_result = std_pipe(
    prompt=prompt, negative_prompt=negative_prompt,
    height=height, width=width, num_frames=num_frames,
    num_inference_steps=steps, guidance_scale=cfg, generator=gen,
    output_type="np",
).frames
export_to_video(std_result[0], "/content/ltx_t2v_standard.mp4", fps=24)
print(f"Standard shape: {np.array(std_result).shape}")

del std_pipe
torch.cuda.empty_cache()

print("\n=== Modular T2V ===")
blocks = LTXBlocks()
mod_pipe = blocks.init_pipeline(model_id)
mod_pipe.load_components(torch_dtype=torch.bfloat16)
mod_pipe.to("cuda")
gen = torch.Generator("cuda").manual_seed(seed)
mod_result = mod_pipe(
    prompt=prompt, negative_prompt=negative_prompt,
    height=height, width=width, num_frames=num_frames,
    num_inference_steps=steps, guidance_scale=cfg, generator=gen,
    output="videos",
)
export_to_video(mod_result[0], "/content/ltx_t2v_modular.mp4", fps=24)
print(f"Modular shape: {np.array(mod_result).shape}")

diff = np.abs(np.array(std_result).astype(float) - np.array(mod_result).astype(float)).mean()
print(f"\nT2V MAD: {diff:.6f}")
print("T2V PARITY:", "PASS" if diff < 1.0 else "FAIL")

del mod_pipe, blocks
torch.cuda.empty_cache()

I2V standard:

ltx_i2v_standard.mp4

I2V modular:

ltx_i2v_modular.mp4
I2V code
from diffusers import LTXImageToVideoPipeline, LTXImage2VideoBlocks
from diffusers.utils import load_image

image = load_image("https://cdn.pixabay.com/photo/2014/11/30/14/11/cat-551554_640.jpg").resize((704, 480))
i2v_prompt = "A cat slowly turns its head and looks around"

print("=== Standard I2V ===")
std_pipe = LTXImageToVideoPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")
gen = torch.Generator("cuda").manual_seed(seed)
std_result = std_pipe(
    image=image, prompt=i2v_prompt, negative_prompt=negative_prompt,
    height=height, width=width, num_frames=num_frames,
    num_inference_steps=steps, guidance_scale=cfg, generator=gen,
    output_type="np",
).frames
export_to_video(std_result[0], "/content/ltx_i2v_standard.mp4", fps=24)
print(f"Standard shape: {np.array(std_result).shape}")

del std_pipe
torch.cuda.empty_cache()

print("\n=== Modular I2V ===")
blocks = LTXImage2VideoBlocks()
pipe = blocks.init_pipeline(model_id)
pipe.load_components(torch_dtype=torch.bfloat16)
pipe.to("cuda")
gen = torch.Generator("cuda").manual_seed(seed)
mod_result = pipe(
    image=image, prompt=i2v_prompt, negative_prompt=negative_prompt,
    height=height, width=width, num_frames=num_frames,
    num_inference_steps=steps, guidance_scale=cfg, generator=gen,
    output="videos",
)
export_to_video(mod_result[0], "/content/ltx_i2v_modular.mp4", fps=24)
print(f"Modular shape: {np.array(mod_result).shape}")

diff = np.abs(np.array(std_result).astype(float) - np.array(mod_result).astype(float)).mean()
print(f"\nI2V MAD: {diff:.6f}")
print("I2V PARITY:", "PASS" if diff < 1.0 else "FAIL")

print("\n=== Done ===")
print("Videos saved: ltx_t2v_standard.mp4, ltx_t2v_modular.mp4, ltx_i2v_standard.mp4, ltx_i2v_modular.mp4")

Also verified that without CFG (guidance_scale=1.0), MAD drops to 0.008. The small visual difference with CFG enabled comes from the guider running cond/uncond as separate batches vs the standard pipeline's single concatenated batch. This is the same behavior as the Wan modular pipeline.

No CFG code
import torch
import numpy as np
from diffusers import LTXPipeline, LTXBlocks

model_id = "Lightricks/LTX-Video"
prompt = "A woman with long brown hair smiles"
height, width, num_frames = 480, 704, 41
steps, seed = 20, 42

# Standard - no CFG
std_pipe = LTXPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")
gen = torch.Generator("cuda").manual_seed(seed)
std_result = std_pipe(
    prompt=prompt, height=height, width=width, num_frames=num_frames,
    num_inference_steps=steps, guidance_scale=1.0, generator=gen,
    output_type="np",
).frames

del std_pipe; torch.cuda.empty_cache()

# Modular - no CFG
blocks = LTXBlocks()
pipe = blocks.init_pipeline(model_id)
pipe.load_components(torch_dtype=torch.bfloat16)
pipe.to("cuda")
gen = torch.Generator("cuda").manual_seed(seed)
mod_result = pipe(
    prompt=prompt, height=height, width=width, num_frames=num_frames,
    num_inference_steps=steps, guidance_scale=1.0, generator=gen,
    output="videos",
)

diff = np.abs(np.array(std_result).astype(float) - np.array(mod_result).astype(float)).mean()
print(f"No-CFG MAD: {diff:.6f}")
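The cond/uncond batching difference is purely a floating-point reduction-order effect: the CFG combine is algebraically identical either way, but separate and concatenated forward passes hit different accumulation orders inside the kernels. A self-contained toy (values chosen deliberately to force the rounding) shows how summation order alone changes a float32 result:

```python
import numpy as np

# The CFG combine, eps = eps_uncond + g * (eps_cond - eps_uncond), is the
# same math whether cond/uncond run as one concatenated batch or as two
# separate passes; only the reduction order inside the kernels differs.
# Toy float32 sums with values chosen to expose order dependence:
a, b, c = np.float32(1e8), np.float32(1.0), np.float32(-1e8)

left = (a + c) + b   # large terms cancel first, small term survives -> 1.0
right = a + (b + c)  # small term is absorbed into -1e8 by rounding -> 0.0

print(left, right)   # 1.0 0.0
```

The per-pixel discrepancies this causes are tiny, but over 30 denoising steps they compound into the ~0.02 MAD observed with CFG on.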

Collaborator

@yiyixuxu yiyixuxu left a comment


thanks, I left some feedback!

return False


class LTXImage2VideoModularPipeline(LTXModularPipeline):
Collaborator


IMO we don't need a modular pipeline for I2V
for wan, we have T2V and I2V pipelines because they are different checkpoints/models
for LTX, it's the same checkpoint supporting multiple workflows, no?

Contributor Author


Makes sense! Removed LTXImage2VideoModularPipeline. Both workflows use LTXModularPipeline now.

logger = logging.get_logger(__name__)


class LTXCoreDenoiseStep(SequentialPipelineBlocks):
Collaborator


can you add docstrings to the assembled blocks using #auto_docstring? see the PR that introduced it: #12958
@claude are you there? can you help explain a bit more on #auto_docstring?

Contributor Author


done

@yiyixuxu
Collaborator

yiyixuxu commented Apr 1, 2026

@claude can you do a review here?

@github-actions
Contributor

github-actions bot commented Apr 1, 2026

Claude finished @yiyixuxu's task in 4m 1s —— View job


Code Review: LTX Video Modular Pipeline

  • Read PR diff and source files
  • Review correctness issues
  • Answer #auto_docstring question
  • Post final review

On #auto_docstring

@yiyixuxu, @akshan-main: #auto_docstring is a doc-generation marker introduced in PR #12958. When placed before an assembled block class (like LTXBlocks or LTXImage2VideoBlocks), the documentation tooling introspects that class's inputs, intermediate_outputs, outputs, and description properties and automatically generates a comprehensive API docstring. This means developers don't have to manually maintain the docstring as the pipeline evolves — it stays in sync with the declared parameter specs automatically.

Usage looks like:

# #auto_docstring
class LTXBlocks(SequentialPipelineBlocks):
    ...
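For illustration only, a toy generator in this spirit just reads declared specs off a class and renders them; the class and attribute names below are hypothetical stand-ins, not the diffusers API:

```python
# Toy sketch of introspection-based docstring generation. `FakeBlock`
# and its attributes are hypothetical stand-ins for a block's declared
# `description` and `inputs` specs.
class FakeBlock:
    description = "Denoise step for LTX."
    inputs = [("prompt", "str"), ("num_inference_steps", "int")]

def render_docstring(block_cls):
    # Build the docstring from the class's declared specs, so it can
    # never drift out of sync with the actual parameters.
    lines = [block_cls.description, "", "Inputs:"]
    for name, typ in block_cls.inputs:
        lines.append(f"    {name} (`{typ}`)")
    return "\n".join(lines)

print(render_docstring(FakeBlock))
```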

Correctness Issues

src/diffusers/__init__.py

[src/diffusers/__init__.py:1214] LTXImage2VideoBlocks and LTXImage2VideoModularPipeline are missing from the TYPE_CHECKING block — they appear in the lazy _import_structure but not in the static import branch, so type-checkers (mypy/pyright) will not resolve them. Fix this →

Also, the four LTX entries are inserted between HeliosPyramidDistilledModularPipeline and HeliosPyramidModularPipeline, splitting the Helios group. Since H < L alphabetically, all LTX entries should be placed after HeliosPyramidModularPipeline.


src/diffusers/modular_pipelines/ltx/before_denoise.py

[before_denoise.py:484] LTXImage2VideoPrepareLatentsStep.__call__ — when generator is a list, the loop iterates range(batch_size) where batch_size = block_state.batch_size * block_state.num_videos_per_prompt. But image only contains block_state.batch_size frames, so image[i] will raise IndexError when num_videos_per_prompt > 1. Fix: iterate over range(block_state.batch_size) and handle video repetition after encoding. Fix this →
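A minimal sketch of the suggested fix, in plain NumPy with a stand-in `encode` in place of the VAE call: encode exactly one latent per input image, then repeat along the batch dimension for num_videos_per_prompt:

```python
import numpy as np

def prepare_image_latents(images, generators, num_videos_per_prompt, encode):
    # One generator and one encode call per *input image*, never per
    # expanded batch entry -- this avoids image[i] going out of range.
    batch_size = len(images)
    assert len(generators) == batch_size, "one generator per input image"
    latents = np.stack(
        [encode(images[i], generators[i]) for i in range(batch_size)]
    )
    # Expand for num_videos_per_prompt only after encoding.
    return np.repeat(latents, num_videos_per_prompt, axis=0)

images = [np.ones((4, 4)), np.zeros((4, 4))]
out = prepare_image_latents(images, generators=[0, 1],
                            num_videos_per_prompt=3,
                            encode=lambda img, gen: img * 0.5)
print(out.shape)  # (6, 4, 4)
```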


src/diffusers/modular_pipelines/ltx/encoders.py

[encoders.py:1330] LTXTextEncoderStep.__call__ — the early return when prompt_embeds is pre-provided skips generating negative_prompt_embeds, even if CFG is active. If a user passes prompt_embeds without negative_prompt_embeds, LTXLoopDenoiser will fail because it declares negative_prompt_embeds as required=True. At minimum, the guard should also check negative_prompt_embeds when requires_unconditional_embeds is true. Fix this →
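A hedged sketch of what that guard could look like; the function name and arguments are illustrative, not the PR's actual code:

```python
def needs_text_encoding(prompt_embeds, negative_prompt_embeds,
                        requires_unconditional_embeds):
    """Return True while the text encoder step still has work to do."""
    if prompt_embeds is None:
        return True
    # Pre-provided prompt_embeds are not enough on their own: with CFG
    # active the denoiser also requires negative_prompt_embeds.
    if requires_unconditional_embeds and negative_prompt_embeds is None:
        return True
    return False

# Conditional embeds provided, CFG on, no unconditional embeds: must encode.
print(needs_text_encoding("embeds", None, True))   # True
# Everything provided, or CFG off: encoding can be skipped safely.
print(needs_text_encoding("embeds", "neg", True))  # False
print(needs_text_encoding("embeds", None, False))  # False
```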

[encoders.py:1300] The type mismatch check type(prompt) is not type(negative_prompt) is dead code: by this point prompt has already been converted to a list (line 1283) and negative_prompt has also been converted to a list (line 1298), so they will always have the same type.


src/diffusers/modular_pipelines/ltx/denoise.py

[denoise.py:739,965] LTXLoopDenoiser and LTXImage2VideoLoopDenoiser both use a mutable dict as a default argument for guider_input_fields. This is a Python anti-pattern — if the dict is mutated by any caller, it affects all future calls using the default. Use None as the default and set the dict inside the function body instead. Fix this →
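The pitfall is easy to demonstrate in isolation; the sketch below is generic Python, not the PR's code:

```python
def bad(guider_input_fields={}):          # ONE dict shared by all calls
    guider_input_fields["seen"] = guider_input_fields.get("seen", 0) + 1
    return guider_input_fields

def good(guider_input_fields=None):       # fresh dict per call
    if guider_input_fields is None:
        guider_input_fields = {}
    guider_input_fields["seen"] = guider_input_fields.get("seen", 0) + 1
    return guider_input_fields

print(bad()["seen"], bad()["seen"])    # 1 2  <- mutation leaks across calls
print(good()["seen"], good()["seen"])  # 1 1
```

Default argument values are evaluated once at function definition time, so every call that omits the argument sees (and can mutate) the same dict object.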

[denoise.py:709,960] LTXLoopDenoiser and LTXImage2VideoLoopDenoiser are ~90% identical — the only runtime difference is that I2V uses block_state.timestep_adjusted instead of t.expand(...) for the timestep argument. Consider refactoring into a base class with a _get_timestep(block_state, t) hook to avoid duplicated maintenance surface.
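One possible shape for that refactor; class names, the dict-based state, and the timestep values are illustrative stand-ins for the real blocks:

```python
class LoopDenoiserBase:
    """Shared denoise-loop logic; subclasses override only the timestep."""

    def _get_timestep(self, block_state, t):
        # T2V default: broadcast the scalar timestep across the batch.
        return [t] * block_state["batch_size"]

    def step(self, block_state, t):
        timestep = self._get_timestep(block_state, t)
        # ... the ~90%-shared denoiser call would go here ...
        return timestep

class I2VLoopDenoiser(LoopDenoiserBase):
    def _get_timestep(self, block_state, t):
        # I2V: per-token timesteps adjusted for the conditioning frame.
        return block_state["timestep_adjusted"]

state = {"batch_size": 2, "timestep_adjusted": [999, 0]}
print(LoopDenoiserBase().step(state, 999))  # [999, 999]
print(I2VLoopDenoiser().step(state, 999))   # [999, 0]
```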


src/diffusers/modular_pipelines/ltx/decoders.py

[decoders.py:637] latents.to(block_state.dtype if hasattr(block_state, 'dtype') else torch.float32): dtype is set by LTXTextInputStep and should always be present by the time the decoder runs. The hasattr fallback silently hides a pipeline state management bug if dtype is somehow missing; better to let it fail clearly, or assert its presence.


src/diffusers/modular_pipelines/ltx/modular_blocks_ltx.py

[modular_blocks_ltx.py:32] Missing #auto_docstring on LTXBlocks and LTXImage2VideoBlocks as noted by @yiyixuxu.


Architecture Note

Echoing @yiyixuxu's comment: since LTX uses a single checkpoint for both T2V and I2V (unlike Wan where they're separate model families), LTXImage2VideoModularPipeline as a distinct pipeline class adds minimal value. The two workflows are differentiated entirely by the block set (LTXBlocks vs LTXImage2VideoBlocks) which is already the modular pattern's intended separation point. The ltx-i2v key in modular_pipeline.py's registry (and LTXImage2VideoModularPipeline class) could be dropped.

@akshan-main
Contributor Author

@yiyixuxu Would you like me to add the condition and latent upsampler workflows to this PR as well, or keep those for a follow-up? Also happy to do LTX2 modular as a next PR if that's useful.

The condition workflow uses the same checkpoint; the upsampler is a different one.

@yiyixuxu
Collaborator

yiyixuxu commented Apr 2, 2026

@akshan-main let's merge this one first and follow up with conditional + upsample
LTX2.3 I'm working on it already :)

@akshan-main
Contributor Author

Sounds good! I'll work on the conditional + upsampler follow-up PR next, then.

Meanwhile, I also opened a PR for HunyuanVideo 1.5 modular #13389. Both have MAD 0.000000 parity since the standard HunyuanVideo 1.5 pipeline already uses the guider abstraction, so there's no batching difference like with LTX's manual CFG concat. Would love your feedback on it when you get a chance!

@akshan-main
Contributor Author

added docstrings to assembled blocks using #auto_docstring


Inputs:
num_videos_per_prompt (`None`, *optional*, defaults to 1):
TODO: Add description.
Collaborator


@yiyixuxu
Collaborator

yiyixuxu commented Apr 3, 2026

@claude are you here? can you look at the PR and explain what steps need to be done for this to work

modular_ltx = ModularPipeline.from_pretrained("Lightricks/LTX-Video")
modular_ltx.load_components(torch_dtype=torch.bfloat16)

@github-actions
Contributor

github-actions bot commented Apr 3, 2026

Claude finished @yiyixuxu's task in 2m 27s —— View job


What's needed for ModularPipeline.from_pretrained("Lightricks/LTX-Video")

  • Read PR code and modular pipeline registry
  • Understand current from_pretrained dispatch mechanism
  • Explain what's needed

Here's what ModularPipeline.from_pretrained(repo) actually does (from modular_pipeline.py:2062):

# Step 1: try to load blocks from the repo's auto_map
blocks = ModularPipelineBlocks.from_pretrained(repo, ...)  # needs auto_map in config.json
# Step 2: load config from modular_model_index.json
config_dict = cls.load_config(repo)                        # needs modular_model_index.json
pipeline_class = _get_pipeline_class(cls, config=config_dict)
# Step 3: construct
pipeline = pipeline_class(blocks=blocks, pretrained_model_name_or_path=repo)

Lightricks/LTX-Video is a standard model repo — it has a model_index.json but no modular_model_index.json and no auto_map for blocks. So both steps above fail gracefully and you end up with blocks=None and an empty ModularPipeline.

Three things need to be in place:

1. A hub repo with modular_model_index.json

This JSON tells the pipeline where to load each component from:

{
  "scheduler": ["diffusers", "FlowMatchEulerDiscreteScheduler", {"repo": "Lightricks/LTX-Video", "subfolder": "scheduler"}],
  "vae":        ["diffusers", "AutoencoderKLLTXVideo",          {"repo": "Lightricks/LTX-Video", "subfolder": "vae"}],
  "text_encoder": [...],
  "tokenizer":    [...],
  "transformer":  [...]
}

This can either live in the official Lightricks/LTX-Video repo, or in a separate modular pipeline repo (e.g. hf-internal-testing/ltx-modular-pipe) that points back to the weights. The test already uses the latter pattern (akshan-main/tiny-ltx-modular-pipe).

2. A config.json with auto_map pointing to LTXBlocks

{ "auto_map": { "ModularPipelineBlocks": "modular_blocks_ltx.LTXBlocks" } }

This lives in the same repo as the modular_model_index.json. Without it, blocks=None and no pipeline logic is wired up.

3. A ("LTXModularPipeline", "LTXBlocks") entry in MODULAR_PIPELINE_BLOCKS_MAPPING

In modular_pipeline.py:66, the current mapping is:

MODULAR_PIPELINE_BLOCKS_MAPPING = OrderedDict([
    ("StableDiffusionXLModularPipeline", "StableDiffusionXLAutoBlocks"),
])

This is used in ModularPipeline.__init__ (line 1873) as a fallback when blocks=None. Without ("LTXModularPipeline", "LTXBlocks") here, even LTXModularPipeline(pretrained_model_name_or_path=...) won't wire up the right blocks automatically. This entry is missing from the PR.


Summary — current PR state vs. what's needed:

| Requirement | Status |
| --- | --- |
| ("ltx", _create_default_map_fn("LTXModularPipeline")) in MODULAR_PIPELINE_MAPPING | ✅ Added by PR (modular_pipeline.py) |
| ("LTXModularPipeline", "LTXBlocks") in MODULAR_PIPELINE_BLOCKS_MAPPING | ❌ Missing |
| Hub repo with modular_model_index.json + auto_map | ⚠️ Only in akshan-main/tiny-ltx-modular-pipe (needs to move to hf-internal-testing/) |

So the usage would be:

# Option A — using a dedicated modular pipeline repo (current pattern, works today)
pipe = ModularPipeline.from_pretrained("hf-internal-testing/ltx-modular-pipe")
pipe.load_components(torch_dtype=torch.bfloat16)

# Option B — directly from the model repo (requires Lightricks/LTX-Video
#             to have modular_model_index.json + auto_map added)
pipe = ModularPipeline.from_pretrained("Lightricks/LTX-Video")
pipe.load_components(torch_dtype=torch.bfloat16)

The PR as written only supports the explicit LTXBlocks().init_pipeline(model_id) entry point. To support the cleaner ModularPipeline.from_pretrained("Lightricks/LTX-Video") pattern, you'd need MODULAR_PIPELINE_BLOCKS_MAPPING updated and either the official model repo or a sidecar modular-pipeline repo published with the right JSON files.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yiyixuxu
Collaborator

yiyixuxu commented Apr 3, 2026

@claude

I think as long as MODULAR_PIPELINE_MAPPING is updated the pattern would work no?

To support the cleaner ModularPipeline.from_pretrained("Lightricks/LTX-Video") pattern, you'd need MODULAR_PIPELINE_BLOCKS_MAPPING updated and either the official model repo or a sidecar modular-pipeline repo published with the right JSON files.

can you verify and if agree with me, maybe we should add an item to the skill file? https://github.com/huggingface/diffusers/blob/main/.ai/skills/model-integration/modular-conversion.md

@github-actions
Contributor

github-actions bot commented Apr 3, 2026

Claude finished @yiyixuxu's task in 2m 47s —— View job


  • Read modular_pipeline.py to trace the from_pretrained flow
  • Read the skill file
  • Verify the claim and propose skill file update

Verification: is MODULAR_PIPELINE_BLOCKS_MAPPING sufficient?

Yes, @yiyixuxu is correct — my previous comment was wrong to list auto_map as a requirement. Here's why:

ModularPipeline.from_pretrained(repo) has two block-wiring paths:

| Path | Mechanism | When used |
| --- | --- | --- |
| A | ModularPipelineBlocks.from_pretrained(repo) → reads auto_map from the repo's config.json | Only when the hub repo explicitly has auto_map |
| B | MODULAR_PIPELINE_BLOCKS_MAPPING.get(pipeline_class.__name__) inside __init__ | When blocks=None (Path A returned nothing) |

Path B is the MODULAR_PIPELINE_BLOCKS_MAPPING path. It activates when blocks=None, which is the normal case for a standard model repo like Lightricks/LTX-Video. So auto_map is not required.
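A stripped-down model of the two paths in plain Python; the mapping name mirrors the real one, but the resolver logic here is only a sketch of the dispatch, not the diffusers implementation:

```python
from collections import OrderedDict

MODULAR_PIPELINE_BLOCKS_MAPPING = OrderedDict([
    ("LTXModularPipeline", "LTXBlocks"),  # the entry this PR would add
])

def resolve_blocks(pipeline_class_name, blocks_from_auto_map=None):
    # Path A: the hub repo's auto_map already produced blocks.
    if blocks_from_auto_map is not None:
        return blocks_from_auto_map
    # Path B: fall back to the in-library mapping keyed by class name,
    # which is why a plain model repo works without auto_map.
    return MODULAR_PIPELINE_BLOCKS_MAPPING.get(pipeline_class_name)

print(resolve_blocks("LTXModularPipeline"))            # LTXBlocks
print(resolve_blocks("LTXModularPipeline", "Custom"))  # Custom
```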

The corrected requirements for ModularPipeline.from_pretrained("Lightricks/LTX-Video"):

| Requirement | Previous comment | Corrected |
| --- | --- | --- |
| ("LTXModularPipeline", "LTXBlocks") in MODULAR_PIPELINE_BLOCKS_MAPPING | ❌ Missing (noted) | ❌ Missing — needs to be added |
| Hub repo with modular_model_index.json (has _class_name: "LTXModularPipeline" + component specs) | ⚠️ Needed | ⚠️ Still needed — so _get_pipeline_class can resolve LTXModularPipeline and load_components knows where to fetch weights |
| auto_map in hub repo's config.json | ⚠️ Listed as needed | Not required — Path B handles this |

So the minimal change to make ModularPipeline.from_pretrained("Lightricks/LTX-Video") work:

  1. Add ("LTXModularPipeline", "LTXBlocks") to MODULAR_PIPELINE_BLOCKS_MAPPING in modular_pipeline.py (this PR is missing it)
  2. The hub repo (e.g. hf-internal-testing/ltx-modular-pipe) needs modular_model_index.json with correct _class_name and component loading specs — auto_map is optional

Proposed skill file addition

For .ai/skills/model-integration/modular-conversion.md, I'd suggest adding to the conversion checklist:

- [ ] Add `("ltx", "LTXModularPipeline")` to `MODULAR_PIPELINE_MAPPING` in `modular_pipeline.py`
      (enables `blocks.init_pipeline(model_id)` to create the right pipeline class)
- [ ] Add `("LTXModularPipeline", "LTXBlocks")` to `MODULAR_PIPELINE_BLOCKS_MAPPING` in `modular_pipeline.py`
      (enables `ModularPipeline.from_pretrained(repo)` without needing `auto_map` in the hub repo — 
       `auto_map` is only needed if you want `ModularPipelineBlocks.from_pretrained()` to work directly)
- [ ] Publish a sidecar modular repo (e.g. `hf-internal-testing/<model>-modular-pipe`) with 
      `modular_model_index.json` pointing to the model weights; `auto_map` is optional

Since the .ai/ directory doesn't exist on this branch (it lives on main only), and the skill file is outside src/diffusers/, I'll leave the actual edit to you or @akshan-main. But I'd be glad to implement the missing MODULAR_PIPELINE_BLOCKS_MAPPING entry in modular_pipeline.py if you want — just say the word.

@yiyixuxu
Collaborator

yiyixuxu commented Apr 3, 2026

@akshan-main thanks, i will do a review now
just a heads up, we are testing our claude a little bit, so there will be some conversation back and forth with them; hope you don't mind :)

@akshan-main
Contributor Author

MODULAR_PIPELINE_BLOCKS_MAPPING doesn't exist in the codebase

@yiyixuxu
Collaborator

yiyixuxu commented Apr 3, 2026

ohh right, MODULAR_PIPELINE_MAPPING (you already added it, so don't worry about it)

Collaborator

@yiyixuxu yiyixuxu left a comment


thanks!
I left some comments!

@claude, can you look through my comments and put together a summary on the proposed change in the skill file?

return [
InputParam.template("prompt"),
InputParam.template("negative_prompt"),
InputParam.template("prompt_embeds"),
Collaborator


we don't need to list prompt embeds as input. We use this pattern in our standard pipelines to let users skip encoding, etc., but in modular it is not needed: you can just pop the text encoder block out and run it separately.
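The "pop a block out" idea can be modeled with a plain ordered mapping of steps; the code below is a toy illustration of the pattern, not the diffusers API:

```python
from collections import OrderedDict

# Toy stand-ins for pipeline blocks: each step is just a callable that
# reads from and extends a state dict.
blocks = OrderedDict(
    text_encoder=lambda s: {**s, "prompt_embeds": f"emb({s['prompt']})"},
    denoise=lambda s: {**s, "latents": f"denoised({s['prompt_embeds']})"},
)

# Pop the text encoder out and run it standalone...
text_encoder = blocks.pop("text_encoder")
state = text_encoder({"prompt": "a cat"})

# ...then feed its output into the remaining steps, so prompt_embeds
# never needs to be a declared input of the assembled pipeline.
for step in blocks.values():
    state = step(state)
print(state["latents"])  # denoised(emb(a cat))
```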

raise ValueError(f"`prompt` has to be of type `str` or `list` but is {type(block_state.prompt)}")

@staticmethod
def _get_t5_prompt_embeds(
Collaborator


can we make this a regular function? so custom blocks can use it as well

block_state = self.get_block_state(state)

# Set guidance_scale on guider so CFG is configured correctly
guidance_scale = getattr(block_state, "guidance_scale", 3.0)
Collaborator


we don't need to accept guidance_scale in the modular pipeline. Users can configure the guider separately: https://huggingface.co/docs/diffusers/modular_diffusers/guiders#changing-guider-parameters

as we support more guider types, each will have its own set of parameters, and we won't be able to forward all of them through the pipeline inputs.

@property
def intermediate_outputs(self) -> list[OutputParam]:
return [
OutputParam.template("latents"),
Collaborator


we cannot use the template here, because this is not the "denoise latent" as defined in the output param template

import torch

from ...models import LTXVideoTransformer3DModel
from ...pipelines.ltx.pipeline_ltx import LTXPipeline
Collaborator


let's not import the standard pipeline here
the modular and standard pipeline are meant to be parallel.

block_state.latents = randn_tensor(
shape, generator=block_state.generator, device=device, dtype=torch.float32
)
block_state.latents = LTXPipeline._pack_latents(
Collaborator


you can redefine it as a regular function here, or maybe use #Copied from

see example using #Copied from https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/wan/before_denoise.py#L495
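For reference, such a patchify helper can be written as a standalone function. The sketch below is a NumPy illustration of the reshape → transpose → flatten pattern behind latent packing; it is a hedged approximation of the idea, not necessarily the exact diffusers implementation:

```python
import numpy as np

def pack_latents(latents, patch_size=1, patch_size_t=1):
    # (B, C, F, H, W) -> (B, num_patches, C * pt * p * p): each
    # spatio-temporal patch becomes one token with channels last.
    b, c, f, h, w = latents.shape
    x = latents.reshape(b, c, f // patch_size_t, patch_size_t,
                        h // patch_size, patch_size,
                        w // patch_size, patch_size)
    x = x.transpose(0, 2, 4, 6, 1, 3, 5, 7)  # patch grid first, channels last
    num_patches = (f // patch_size_t) * (h // patch_size) * (w // patch_size)
    return x.reshape(b, num_patches, -1)

x = np.zeros((2, 8, 2, 4, 4), dtype=np.float32)  # B=2, C=8, F=2, H=W=4
print(pack_latents(x).shape)  # (2, 32, 8)
```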

if not isinstance(image, torch.Tensor):
from ...video_processor import VideoProcessor

processor = VideoProcessor(vae_scale_factor=components.vae_spatial_compression_ratio)
Collaborator


this should be a component, no?

else:
init_latents = [
retrieve_latents(
components.vae.encode(img.unsqueeze(0).unsqueeze(2).to(vae_dtype)), block_state.generator
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should extract the vae encoding into its own block in encoders.py (e.g. LTXVaeEncoderStep), and here this step should accept image_latents as input instead of raw image. This way users can run the VAE encoder standalone and pass pre-computed latents directly. See https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/wan/encoders.py#L470


from ...configuration_utils import FrozenDict
from ...models import AutoencoderKLLTXVideo
from ...pipelines.ltx.pipeline_ltx import LTXPipeline
Collaborator

same here: let's either redefine or copy the pipeline methods you need


latents = block_state.latents

if block_state.output_type == "latent":
Collaborator

we don't need to accept a latent output_type in modular.
Similar to encode_prompt, we can pop the decoder step out of the pipeline if we don't need the output decoded.

@github-actions github-actions bot added the size/L PR with diff > 200 LOC label Apr 8, 2026
@akshan-main
Contributor Author

@yiyixuxu addressed all in respective comments. Lmk if there are more things I need to put some work on!

@akshan-main
Contributor Author

friendly ping @yiyixuxu. does the current LTX state look good? happy to transfer the applicable changes over to my modular HunyuanVideo 1.5 PR #13389 as well

@akshan-main
Contributor Author

another friendly ping @yiyixuxu. Would be really glad to help the diffusers team ship this today!

Collaborator

@yiyixuxu yiyixuxu left a comment


thanks, i left some small comments

return timesteps, num_inference_steps


def _pack_latents(latents: torch.Tensor, patch_size: int = 1, patch_size_t: int = 1) -> torch.Tensor:
Collaborator

not required, but consider making a pachifier so that you can use it in different places, e.g.

class QwenImagePachifier(ConfigMixin):

Contributor Author

will consider a pachifier as a follow-up PR

Comment on lines +127 to +134
def _normalize_latents(
latents: torch.Tensor, latents_mean: torch.Tensor, latents_std: torch.Tensor, scaling_factor: float = 1.0
) -> torch.Tensor:
# Normalize latents across the channel dimension [B, C, F, H, W]
latents_mean = latents_mean.view(1, -1, 1, 1, 1).to(latents.device, latents.dtype)
latents_std = latents_std.view(1, -1, 1, 1, 1).to(latents.device, latents.dtype)
latents = (latents - latents_mean) * scaling_factor / latents_std
return latents
Collaborator

Suggested change
def _normalize_latents(
latents: torch.Tensor, latents_mean: torch.Tensor, latents_std: torch.Tensor, scaling_factor: float = 1.0
) -> torch.Tensor:
# Normalize latents across the channel dimension [B, C, F, H, W]
latents_mean = latents_mean.view(1, -1, 1, 1, 1).to(latents.device, latents.dtype)
latents_std = latents_std.view(1, -1, 1, 1, 1).to(latents.device, latents.dtype)
latents = (latents - latents_mean) * scaling_factor / latents_std
return latents

i think it is not used here

Contributor Author

removed

Comment on lines +347 to +348
from ...configuration_utils import FrozenDict
from ...guiders import ClassifierFreeGuidance
Collaborator

Suggested change
from ...configuration_utils import FrozenDict
from ...guiders import ClassifierFreeGuidance

Contributor Author

removed

> [!WARNING] > This is an experimental feature and is likely to change in the future.
"""

default_blocks_name = "LTXBlocks"
Collaborator

Contributor Author

done


mask_shape = (batch_size, 1, num_frames, height, width)

if block_state.latents is not None:
Collaborator

so the latents input from the user is the initial noise in T2V, but here it would be noised image latents, no? That's confusing. It might be an inconsistency we always had in our standard pipelines, but it's worse here, so let's fix it here: the latents input should always be initial pure noise regardless of workflow.

I think we can reuse LTXPrepareLatentsStep and have this block focus only on

  1. adding noise to image_latents
  2. creating the conditioning_mask

so for I2V it would use both LTXPrepareLatentsStep -> LTXImage2VideoPrepareLatentsStep
would this work?
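The proposed split could look roughly like this. A minimal numpy sketch of what the I2V block would add on top of pure-noise latents (first-frame conditioning as in the standard LTX I2V pipeline; the function name and shapes are illustrative, not the actual block):

```python
import numpy as np

def add_image_conditioning(noise: np.ndarray, image_latents: np.ndarray):
    """Blend pure-noise latents [B, C, F, H, W] with encoded-image latents
    on frame 0, returning the mixed latents and a conditioning mask that
    is 1.0 on the conditioned first frame and 0.0 elsewhere."""
    b, _, f, h, w = noise.shape
    mask = np.zeros((b, 1, f, h, w), dtype=noise.dtype)
    mask[:, :, 0] = 1.0  # condition on the first frame only
    latents = image_latents * mask + noise * (1.0 - mask)
    return latents, mask

noise = np.ones((1, 4, 3, 2, 2), dtype=np.float32)
image_latents = np.zeros_like(noise)
latents, mask = add_image_conditioning(noise, image_latents)
print(latents[0, 0, 0].max(), latents[0, 0, 1].min())  # 0.0 1.0
```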

Contributor Author

yes that works. implemented and verified (A100, LTX 0.9.1):

I2V latents now chains LTXPrepareLatentsStep (pure noise) -> LTXImage2VideoPrepareLatentsStep (mix with image_latents & conditioning mask)

MAD vs steps (256x256, 9 frames):

  • steps= 4 I2V modular: 0.009282 I2V auto: 0.009273
  • steps=10 I2V modular: 0.021779 I2V auto: 0.021794
  • steps=30 I2V modular: 0.020742 I2V auto: 0.020739

Full quality (480x704, 161 frames, 30 steps):

  • T2V MAD (LTXBlocks vs standard): 0.025949
  • I2V MAD (LTXImage2VideoBlocks vs standard): 0.046978
  • T2V MAD (LTXAutoBlocks vs standard): 0.025949
  • I2V MAD (LTXAutoBlocks vs standard): 0.046980
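The MAD figures above are the mean absolute difference between the decoded frames of the two pipelines, i.e. something like the following (a sketch, not the exact script used for these numbers):

```python
import numpy as np

def mad(a, b) -> float:
    """Mean absolute difference between two decoded video arrays
    of identical shape, e.g. (1, F, H, W, 3) with values in [0, 1]."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    assert a.shape == b.shape, "outputs must have identical shapes"
    return float(np.abs(a - b).mean())

print(mad(np.zeros((1, 2, 2, 2, 3)), np.full((1, 2, 2, 2, 3), 0.5)))  # 0.5
```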


@akshan-main akshan-main requested a review from yiyixuxu April 10, 2026 05:53
@akshan-main akshan-main requested a review from yiyixuxu April 10, 2026 18:23
"""
Auto blocks for LTX Video that support both text-to-video and image-to-video workflows.

Supported workflows:
Collaborator

ok so interesting, did you manually write this docstring?
a _workflow_map is missing here, but the workflows are documented. what happens if you run?

blocks = LTXAutoBlocks()
blocks.available_workflows

see the doc on the workflow map https://huggingface.co/docs/diffusers/modular_diffusers/auto_pipeline_blocks#workflows

this is the new doc on auto docstrings https://huggingface.co/docs/diffusers/main/en/modular_diffusers/auto_docstring

you need to run this after placing the marker:

python utils/modular_auto_docstring.py --fix_and_overwrite

Contributor Author

yup, wrote it manually following the flux2 pattern but missed _workflow_map. Added it and ran the auto docstring tool now.

pipeline_class = LTXModularPipeline
pipeline_blocks_class = LTXAutoBlocks
pretrained_model_name_or_path = "akshan-main/tiny-ltx-modular-pipe"

Collaborator

Contributor Author

@akshan-main akshan-main Apr 10, 2026

I will add this since the test is failing, and the pachifier as well. Will ping you once I'm done. Meanwhile, you might want to take a look at PR #13440, a small thing I noticed

Collaborator

@yiyixuxu yiyixuxu left a comment

thanks @akshan-main, it's a pleasure working with you:)
will merge once CI is green
