Open
Labels: bug (Something isn't working)
Description
Describe the bug
Hi,
I tested Cosmos-Predict2.5 Video2World, but the generated video is quite different from the one generated with NVIDIA's original implementation.
It seems that the pipeline reconstructs the input video rather than generating the frames that follow it.
Could you check whether the pipeline is working as intended, or whether there is a problem somewhere (e.g., in the example code or the implementation)?
I have attached the results generated with the HuggingFace implementation and with NVIDIA's original implementation.
(To generate a video with NVIDIA's implementation, I followed the example script in NVIDIA's official repository.)
Reproduction
diffusers/src/diffusers/pipelines/cosmos/pipeline_cosmos2_5_predict.py
Lines 138 to 160 in 3996788
    >>> # Video2World: condition on an input clip and predict a 93-frame world video.
    >>> prompt = (
    ...     "The video opens with an aerial view of a large-scale sand mining construction operation, showcasing extensive piles "
    ...     "of brown sand meticulously arranged in parallel rows. A central water channel, fed by a water pipe, flows through the "
    ...     "middle of these sand heaps, creating ripples and movement as it cascades down. The surrounding area features dense green "
    ...     "vegetation on the left, contrasting with the sandy terrain, while a body of water is visible in the background on the right. "
    ...     "As the video progresses, a piece of heavy machinery, likely a bulldozer, enters the frame from the right, moving slowly along "
    ...     "the edge of the sand piles. This machinery's presence indicates ongoing construction work in the operation. The final frame "
    ...     "captures the same scene, with the water continuing its flow and the bulldozer still in motion, maintaining the dynamic yet "
    ...     "steady pace of the construction activity."
    ... )
    >>> input_video = load_video(
    ...     "https://github.com/nvidia-cosmos/cosmos-predict2.5/raw/refs/heads/main/assets/base/sand_mining.mp4"
    ... )
    >>> video = pipe(
    ...     image=None,
    ...     video=input_video,
    ...     prompt=prompt,
    ...     negative_prompt=negative_prompt,
    ...     num_frames=93,
    ...     generator=torch.Generator().manual_seed(1),
    ... ).frames[0]
    >>> export_to_video(video, "video2world.mp4", fps=16)

| HuggingFace | NVIDIA |
|---|---|
| https://github.com/user-attachments/assets/10c2a085-519f-46b5-957b-36d3b83955dd | https://github.com/user-attachments/assets/b9bb3f02-c5ad-467e-9dd5-8503507e5f9c |
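
One rough way to put a number on "reconstructs the input rather than predicting new frames" is to compare the input clip and the generated clip frame by frame. The snippet below is only a minimal sketch of that idea, not part of diffusers or the Cosmos code: `mean_abs_diff` and `reconstruction_profile` are hypothetical helper names, and each frame is assumed to have already been flattened to a plain sequence of pixel values at a common resolution. A profile that stays near zero over every shared frame index would suggest the output is copying the input, while a profile that grows would suggest the content is actually changing.

```python
from typing import List, Sequence

# Assumption: one frame is a flat sequence of pixel values (e.g. grayscale),
# and both clips were resized to the same resolution beforehand.
Frame = Sequence[float]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    if len(a) == 0:
        raise ValueError("frames must not be empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def reconstruction_profile(
    input_frames: Sequence[Frame], output_frames: Sequence[Frame]
) -> List[float]:
    """Per-index difference over the frame indices that both clips share."""
    return [mean_abs_diff(i, o) for i, o in zip(input_frames, output_frames)]


# Toy data standing in for real clips (hypothetical values, 2 pixels per frame).
clip = [[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]]
copied = [list(frame) for frame in clip]               # output identical to input
diverging = [[0.0, 0.0], [50.0, 50.0], [90.0, 90.0]]   # output drifts from input

print(reconstruction_profile(clip, copied))     # [0.0, 0.0, 0.0]
print(reconstruction_profile(clip, diverging))  # [0.0, 40.0, 70.0]
```

To apply this to the two attached videos, the frames of each result and of the input clip would first need to be decoded and flattened in the same way. How to do that depends on the output type the pipeline returns, so that step is left out here.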
Logs
System Info
- 🤗 Diffusers version: 0.37.0.dev0
- Platform: Linux-6.8.0-60-generic-x86_64-with-glibc2.35
- Running on Google Colab?: No
- Python version: 3.10.12
- PyTorch version (GPU?): 2.9.1+cu128 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.36.0
- Transformers version: 4.57.3
- Accelerate version: 1.12.0
- PEFT version: 0.18.0
- Bitsandbytes version: not installed
- Safetensors version: 0.7.0
- xFormers version: not installed
- Accelerator: NVIDIA A100 80GB PCIe, 81920 MiB
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
Who can help?