Wan2.1

💜 Wan (https://wan.video) | 🖥️ GitHub (https://github.com/Wan-Video/Wan2.1) | 🤗 Hugging Face (https://huggingface.co/Wan-AI/) | 🤖 ModelScope (https://modelscope.cn/organization/Wan-AI) | 📑 Technical Report (https://arxiv.org/abs/2503.20314) | 📑 Blog (https://wan.video/welcome?spm=a2ty_o02.30011076.0.0.6c9ee41eCcluqg) | 💬 WeChat Group (https://gw.alicdn.com/imgextra/i2/O1CN01tqjWFi1ByuyehkTSB_!!6000000000015-0-tps-611-1279.jpg) | 📖 Discord (https://discord.gg/AKNgpMK4Yj)

Wan: Open and Advanced Large-Scale Video Generative Models

In this repository, we present Wan2.1, a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. Wan2.1 offers these key features:
- 👍 SOTA Performance: Wan2.1 consistently outperforms existing open-source models and state-of-the-art commercial solutions across multiple benchmarks.
- 👍 Supports Consumer-grade GPUs: The T2V-1.3B model requires only 8.19 GB VRAM, making it compatible with almost all consumer-grade GPUs. It can generate a 5-second 480P video on an RTX 4090 in about 4 minutes (without optimization techniques like quantization). Its performance is even comparable to some closed-source models.
- 👍 Multiple Tasks: Wan2.1 excels in Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio, advancing the field of video generation.
- 👍 Visual Text Generation: Wan2.1 is the first video model capable of generating both Chinese and English text, featuring robust text generation that enhances its practical applications.
- 👍 Powerful Video VAE: Wan-VAE delivers exceptional efficiency and performance, encoding and decoding 1080P videos of any length while preserving temporal information, making it an ideal foundation for video and image generation.
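To make the "5-second 480P video" and Wan-VAE claims concrete, the sketch below works out the latent shape and transformer sequence length for a clip. It rests on assumptions that this README does not state: 16 frames per second, a causal VAE that compresses time by 4 and space by 8x8 while encoding the first frame on its own (so valid lengths are 4n + 1), 16 latent channels, and 1x2x2 patches in the diffusion transformer. Verify these values against the repository's configs before relying on them.

```python
from dataclasses import dataclass

# Assumed Wan-VAE / DiT settings (see lead-in); not taken from this README.
VAE_STRIDE = (4, 8, 8)   # temporal, height, width compression
Z_DIM = 16               # latent channels
PATCH_SIZE = (1, 2, 2)   # DiT patch size over (time, height, width)


@dataclass
class LatentInfo:
    shape: tuple  # (channels, latent_frames, latent_height, latent_width)
    tokens: int   # number of transformer tokens after patchifying


def frames_for_duration(seconds: int, fps: int = 16) -> int:
    # The causal VAE keeps the first frame separate, so lengths are 4n + 1.
    return seconds * fps + 1


def latent_info(num_frames: int, width: int, height: int) -> LatentInfo:
    if (num_frames - 1) % VAE_STRIDE[0] != 0:
        raise ValueError("num_frames must be of the form 4n + 1")
    if width % (VAE_STRIDE[2] * PATCH_SIZE[2]) or height % (VAE_STRIDE[1] * PATCH_SIZE[1]):
        raise ValueError("width and height must be multiples of 16")
    t = (num_frames - 1) // VAE_STRIDE[0] + 1  # first frame + groups of 4
    h = height // VAE_STRIDE[1]
    w = width // VAE_STRIDE[2]
    tokens = (t // PATCH_SIZE[0]) * (h // PATCH_SIZE[1]) * (w // PATCH_SIZE[2])
    return LatentInfo(shape=(Z_DIM, t, h, w), tokens=tokens)
```

For example, `latent_info(frames_for_duration(5), 832, 480)` gives a latent of shape `(16, 21, 60, 104)` and 32,760 tokens, while the same clip at 1280*720 gives 75,600 tokens. This roughly 2.3x longer sequence is one reason 720P generation costs considerably more memory and time than 480P.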

🔥 Latest News!!

May 14, 2025: 👋 We introduce Wan2.1 VACE, an all-in-one model for video creation and editing, along with its inference code, weights, and technical report!
Apr 17, 2025: 👋 We introduce Wan2.1 FLF2V with its inference code and weights!
Mar 21, 2025: 👋 We are excited to announce the release of the Wan2.1 technical report. We welcome discussions and feedback!
Mar 3, 2025: 👋 Wan2.1's T2V and I2V have been integrated into Diffusers (T2V | I2V). Feel free to give it a try!
Feb 27, 2025: 👋 Wan2.1 has been integrated into ComfyUI. Enjoy!
Feb 25, 2025: 👋 We've released the inference code and weights of Wan2.1.

Community Works

If your work has improved Wan2.1 and you would like more people to see it, please let us know.
- Helios, a breakthrough video generation model based on Wan2.1, achieves minute-scale, high-quality video synthesis at 19.5 FPS on a single H100 GPU (about 10 FPS on a single Ascend NPU) without relying on conventional long-video anti-drifting strategies or standard video acceleration techniques. Visit their webpage for more details.
- Video-As-Prompt, the first unified semantic-controlled video generation model, is based on Wan2.1-14B-I2V and combines a Mixture-of-Transformers architecture with in-context controls (e.g., concept, style, motion, camera). Refer to the project page for more examples.
- LightX2V, a lightweight and efficient video generation framework that integrates Wan2.1 and Wan2.2, supports multiple engineering acceleration techniques for fast inference and can run on an RTX 5090 or an RTX 4060 (8 GB VRAM).
- DriVerse, an autonomous-driving world model based on Wan2.1-14B-I2V, generates future driving videos conditioned on any scene frame and a given trajectory. Refer to the project page for more examples.
- Training-Free-WAN-Editing, built on Wan2.1-T2V-1.3B, enables training-free video editing using image-based training-free methods such as FlowEdit and FlowAlign.
- Wan-Move, accepted to NeurIPS 2025, is a framework that brings SOTA fine-grained, point-level motion control to Wan2.1-I2V-14B! Refer to their project page for more information.
- EchoShot, a native multi-shot portrait video generation model based on Wan2.1-T2V-1.3B, can generate multiple video clips featuring the same character with highly flexible content control. Refer to their project page for more information.
- AniCrafter, a human-centric animation model based on Wan2.1-14B-I2V, uses 3DGS avatars to control the video diffusion model, inserting anyone into any scene and animating them according to given motion sequences. Refer to the project page for more examples.
- HyperMotion, a human image animation framework based on Wan2.1, addresses the challenge of generating complex human body motions in pose-guided animation. Refer to their website for more examples.
- MagicTryOn, a video virtual try-on framework built upon Wan2.1-14B-I2V, addresses the limitations of existing models in expressing garment details and maintaining dynamic stability during human motion. Refer to their website for more examples.
- ATI, built on Wan2.1-I2V-14B, is a trajectory-based motion-control framework that unifies object, local, and camera movements in video generation. Refer to their website for more examples.
- Phantom has developed a unified video generation framework for single- and multi-subject references based on both Wan2.1-T2V-1.3B and Wan2.1-T2V-14B. Please refer to their examples.
- UniAnimate-DiT, based on Wan2.1-14B-I2V, has trained a human image animation model and open-sourced its inference and training code. Feel free to enjoy it!
- CFG-Zero enhances Wan2.1 (covering both T2V and I2V models) from the perspective of classifier-free guidance (CFG).
- TeaCache now supports Wan2.1 acceleration, speeding up inference by approximately 2x. Feel free to give it a try!
- DiffSynth-Studio provides more support for Wan2.1, including video-to-video, FP8 quantization, VRAM optimization, LoRA training, and more. Please refer to their examples.

📑 Todo List

Wan2.1 Text-to-Video
- [x] Multi-GPU Inference code of the 14B and 1.3B models
- [x] Checkpoints of the 14B and 1.3B models
- [x] Gradio demo
- [x] ComfyUI integration
- [x] Diffusers integration
- [ ] Diffusers + Multi-GPU Inference
Wan2.1 Image-to-Video
- [x] Multi-GPU Inference code of the 14B model
- [x] Checkpoints of the 14B model
- [x] Gradio demo
- [x] ComfyUI integration
- [x] Diffusers integration
- [ ] Diffusers + Multi-GPU Inference
Wan2.1 First-Last-Frame-to-Video
- [x] Multi-GPU Inference code of the 14B model
- [x] Checkpoints of the 14B model
- [x] Gradio demo
- [ ] ComfyUI integration
- [ ] Diffusers integration
- [ ] Diffusers + Multi-GPU Inference
Wan2.1 VACE
- [x] Multi-GPU Inference code of the 14B and 1.3B models
- [x] Checkpoints of the 14B and 1.3B models
- [x] Gradio demo
- [x] ComfyUI integration
- [ ] Diffusers integration
- [ ] Diffusers + Multi-GPU Inference

Quickstart

Installation

Clone the repo:

git clone https://github.com/Wan-Video/Wan2.1.git
cd Wan2.1

Install dependencies:

# Ensure torch >= 2.4.0
pip install -r requirements.txt
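The installation step requires torch >= 2.4.0. A minimal sketch of checking this from Python is shown below; the helper names are hypothetical, and the version parser only handles the common `major.minor[.patch]` prefix (local tags such as `+cu121` and dev suffixes are ignored).

```python
import re


def parse_version(version: str) -> tuple:
    """Turn '2.4.0', '2.5.1+cu121' or '2.6.0.dev20241001' into (major, minor, patch)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component (e.g. '2.4') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def torch_is_recent_enough(version: str, minimum: tuple = (2, 4, 0)) -> bool:
    return parse_version(version) >= minimum


def check_installed_torch(minimum: tuple = (2, 4, 0)) -> str:
    """Raise if the installed PyTorch is older than `minimum`; return its version."""
    import torch  # imported lazily so the helpers above work without PyTorch

    if not torch_is_recent_enough(torch.__version__, minimum):
        raise RuntimeError(f"torch {torch.__version__} found, but >= 2.4.0 is required")
    return torch.__version__
```

Calling `check_installed_torch()` at the top of a launch script fails fast with a readable message instead of an obscure error later in the pipeline.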

Model Download

Models	Download Link	Notes
T2V-14B	🤗 Huggingface 🤖 ModelScope	Supports both 480P and 720P
I2V-14B-720P	🤗 Huggingface 🤖 ModelScope	Supports 720P
I2V-14B-480P	🤗 Huggingface 🤖 ModelScope	Supports 480P
T2V-1.3B	🤗 Huggingface 🤖 ModelScope	Supports 480P
FLF2V-14B	🤗 Huggingface 🤖 ModelScope	Supports 720P
VACE-1.3B	🤗 Huggingface 🤖 ModelScope	Supports 480P
VACE-14B	🤗 Huggingface 🤖 ModelScope	Supports both 480P and 720P

💡Note:
* The 1.3B model can generate videos at 720P resolution. However, due to limited training at this resolution, the results are generally less stable than at 480P. For optimal performance, we recommend using 480P.
* For first-last-frame-to-video generation, we trained our model primarily on Chinese text-video pairs. Therefore, we recommend using Chinese prompts to achieve better results.
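The model table and the note above can be encoded as a small lookup, for instance to validate settings in a batch script before loading a large checkpoint. This is a hypothetical helper, not part of the repository; it assumes the 720P caveat applies to T2V-1.3B specifically.

```python
from typing import List

# Supported resolutions per checkpoint, transcribed from the model table.
SUPPORTED = {
    "T2V-14B": {"480P", "720P"},
    "I2V-14B-720P": {"720P"},
    "I2V-14B-480P": {"480P"},
    "T2V-1.3B": {"480P"},
    "FLF2V-14B": {"720P"},
    "VACE-1.3B": {"480P"},
    "VACE-14B": {"480P", "720P"},
}


def check_resolution(model: str, resolution: str) -> List[str]:
    """Return warnings for a model/resolution pair; raise if it is not supported."""
    if model not in SUPPORTED:
        raise KeyError(f"unknown checkpoint: {model}")
    warnings = []
    if resolution not in SUPPORTED[model]:
        if model == "T2V-1.3B" and resolution == "720P":
            # Possible, but less stable because of limited 720P training.
            warnings.append("T2V-1.3B at 720P is less stable than at 480P")
        else:
            raise ValueError(f"{model} does not list {resolution} as supported")
    if model.startswith("FLF2V"):
        warnings.append("FLF2V was trained mainly on Chinese text-video pairs; "
                        "Chinese prompts are recommended")
    return warnings
```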

Download models using huggingface-cli:

pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir ./Wan2.1-T2V-14B

Download models using modelscope-cli:

pip install modelscope
modelscope download Wan-AI/Wan2.1-T2V-14B --local_dir ./Wan2.1-T2V-14B
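The same download can be scripted from Python with `huggingface_hub.snapshot_download`, the library function behind the `huggingface-cli download` command. In the sketch below, the `Wan-AI/Wan2.1-<Model>` repository naming is an assumption generalised from the single T2V-14B example above and may not hold for every checkpoint; check the Hugging Face organisation page for exact names.

```python
from typing import Optional


def wan_repo_id(model: str) -> str:
    # Assumed naming pattern, generalised from "Wan-AI/Wan2.1-T2V-14B".
    return f"Wan-AI/Wan2.1-{model}"


def default_local_dir(model: str) -> str:
    # Mirrors the --local-dir used in the CLI example above.
    return f"./Wan2.1-{model}"


def download_checkpoint(model: str, local_dir: Optional[str] = None) -> str:
    """Download a checkpoint and return the local directory path."""
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    return snapshot_download(
        repo_id=wan_repo_id(model),
        local_dir=local_dir or default_local_dir(model),
    )
```

`download_checkpoint("T2V-14B")` should produce the same `./Wan2.1-T2V-14B` directory as the CLI command.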

Run Text-to-Video Generation

This repository supports two Text-to-Video models (1.3B and 14B) and two resolutions (480P and 720P). The parameters and configurations for these models are as follows:

Task	480P	720P	Model
t2v-14B	✔️	✔️	Wan2.1-T2V-14B
t2v-1.3B	✔️	❌	Wan2.1-T2V-1.3B

(1) Without Prompt Extension

To facilitate implementation, we will start with a basic version of the inference process that skips the prompt extension step.

Single-GPU inference

python generate.py  --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

If you encounter OOM (Out-of-Memory) issues, you can use the --offload_model True and --t5_cpu options to reduce GPU memory usage. For example, on an RTX 4090 GPU:

python generate.py  --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model True --t5_cpu --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
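When launching many runs, it can help to assemble the `generate.py` arguments programmatically. The sketch below (a hypothetical helper) uses only flags that appear in the commands above, rejects task/resolution combinations not listed in the T2V table, and reproduces the low-VRAM command for the 1.3B model.

```python
from typing import List

# Size strings as used in the commands above, and the T2V task/resolution table.
SIZES = {"480P": "832*480", "720P": "1280*720"}
T2V_RESOLUTIONS = {"t2v-14B": {"480P", "720P"}, "t2v-1.3B": {"480P"}}


def build_t2v_command(task: str, resolution: str, ckpt_dir: str, prompt: str,
                      low_vram: bool = False) -> List[str]:
    if resolution not in T2V_RESOLUTIONS.get(task, set()):
        raise ValueError(f"{task} is not listed for {resolution}")
    cmd = ["python", "generate.py", "--task", task,
           "--size", SIZES[resolution], "--ckpt_dir", ckpt_dir]
    if low_vram:
        # Suggested for OOM errors: offload model weights and run T5 on the CPU.
        cmd += ["--offload_model", "True", "--t5_cpu"]
    if task == "t2v-1.3B":
        # Recommended sampling settings for 1.3B (shift can be tuned in 8 to 12).
        cmd += ["--sample_shift", "8", "--sample_guide_scale", "6"]
    cmd += ["--prompt", prompt]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root; passing a list rather than a shell string avoids quoting problems with prompts that contain spaces or `*` in the size argument.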

💡Note: If you are using the T2V-1.3B model, we recommend setting the parameter --sample_guide_scale 6. The --sample_shift parameter can be tuned within the range of 8 to 12, depending on the results.
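For intuition about these two knobs, here is a sketch of what they typically control in flow-matching samplers. We assume (without checking it against `wan/utils/fm_solvers*.py` here) that `--sample_shift` applies the common timestep shift σ' = shift·σ / (1 + (shift − 1)·σ), which spends more sampling steps at high noise levels, and that `--sample_guide_scale` is the weight in standard classifier-free guidance. Both formulas are the widely used forms, not a transcription of this repository's code.

```python
from typing import List


def shift_sigma(sigma: float, shift: float) -> float:
    # shift > 1 moves every noise level toward 1 (more steps spent at high noise).
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_schedule(num_steps: int, shift: float) -> List[float]:
    # Evenly spaced noise levels from 1 (pure noise) down to 0 (clean), then shifted.
    sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [shift_sigma(s, shift) for s in sigmas]


def cfg(cond: List[float], uncond: List[float], guide_scale: float) -> List[float]:
    # Classifier-free guidance: push the prediction away from the unconditional one.
    return [u + guide_scale * (c - u) for c, u in zip(cond, uncond)]
```

With `shift = 8`, the midpoint σ = 0.5 maps to about 0.89, so most of the step budget is spent establishing global structure. A larger guide scale follows the prompt more strongly at the cost of diversity, which is why these values are worth tuning per model.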

Multi-GPU inference using FSDP +
