
HunyuanVideo-1.5 is a video generation model that delivers top-tier quality with only 8.3B parameters, significantly lowering the barrier to usage. It runs smoothly on consumer-grade GPUs, making it accessible for every developer and creator. This repository provides the implementation and tools needed to generate creative videos.
👏 Join our <a href="https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/raw/main/assets/wechat.png" target="_blank">WeChat</a> and <a href="https://discord.gg/ehjWMqF5wY">Discord</a> |
💻 Official website Try our model!  
`generate.py` with the `--enable_step_distill` parameter. See Usage for detailed usage instructions. 🔥🔥🔥🆕
`train.py`) provides a full training pipeline with support for distributed training, FSDP, context parallel, gradient checkpointing, and more. HunyuanVideo-1.5 is trained using the Muon optimizer, which we have open-sourced in the Training section. If you would like to continue training our model or fine-tune it with LoRA, please use the Muon optimizer. See the Training section for detailed usage instructions. 🔥🔥🔥🆕
If you develop or use HunyuanVideo-1.5 in your projects, please let us know.
Diffusers - HunyuanVideo-1.5 Diffusers: Official Hugging Face Diffusers integration for HunyuanVideo-1.5. Easily use HunyuanVideo-1.5 with the Diffusers library for seamless integration into your projects. See Usage with Diffusers section for details.
ComfyUI - ComfyUI: A powerful and modular diffusion model GUI with a graph/nodes interface. ComfyUI supports HunyuanVideo-1.5 with various engineering optimizations for fast inference. We provide a ComfyUI Usage Guide for HunyuanVideo-1.5.
Community-implemented ComfyUI Plugin - comfyui_hunyuanvideo_1.5_plugin: A community-implemented ComfyUI plugin for HunyuanVideo-1.5, offering both simplified and complete node sets for quick usage or deep workflow customization, with built-in automatic model download support.
LightX2V - LightX2V: A lightweight and efficient video generation framework that integrates HunyuanVideo-1.5, supporting multiple engineering acceleration techniques for fast inference.
Wan2GP v9.62 - Wan2GP: WanGP is a very-low-VRAM app (as little as 6 GB of VRAM for HunyuanVideo-1.5) that supports a LoRA accelerator for 8-step generation and offers tools to facilitate video generation.
ComfyUI-MagCache - ComfyUI-MagCache: MagCache is a training-free caching approach that accelerates video generation by estimating fluctuating differences among model outputs across timesteps. It achieves 1.7x speedup for HunyuanVideo-1.5 with 20 inference steps.
OmniWeaving - OmniWeaving: An omni-level unified video generation model built upon HunyuanVideo-1.5, excelling in free-form multimodal composition and reasoning-augmented generation. Specifically, it seamlessly handles a diverse array of tasks, such as Text-to-Video, First-Frame-to-Video, Key-Frames-to-Video, Video-to-Video Editing, Reference-to-Video, Compositional Multi-Image-to-Video, and Text-Image-Video-to-Video.
We present HunyuanVideo-1.5, a lightweight yet powerful video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs. This achievement is built upon several key components, including meticulous data curation, an advanced DiT architecture with selective and sliding tile attention (SSTA), enhanced bilingual understanding through glyph-aware text encoding, progressive pre-training and post-training, and an efficient video super-resolution network. Leveraging these designs, we developed a unified framework capable of high-quality text-to-video and image-to-video generation across multiple durations and resolutions. Extensive experiments demonstrate that this compact and proficient model establishes a new state-of-the-art among open-source models. By releasing the code and weights of HunyuanVideo-1.5, we provide the community with a high-performance foundation that significantly lowers the cost of video creation and research, making advanced video generation more accessible to all.


Note: The memory requirements above are measured with model offloading enabled. If your GPU has sufficient memory, you may disable offloading for improved inference speed.
```bash
git clone https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5.git
cd HunyuanVideo-1.5
pip install -r requirements.txt
pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-sdk-python
```
Flash Attention: Install Flash Attention for faster inference and reduced GPU memory consumption. Detailed installation instructions are available at Flash Attention.
Flex-Block-Attention:
flex-block-attn is required only when using sparse attention for faster inference. Install it with the following commands:
```bash
git clone https://github.com/Tencent-Hunyuan/flex-block-attn.git
cd flex-block-attn
git submodule update --init --recursive
python3 setup.py install
```
SageAttention: To enable SageAttention for faster inference, install it with the following commands:
Note: Enabling SageAttention will automatically disable Flex-Block-Attention.
```bash
git clone https://github.com/cooper1637/SageAttention.git
cd SageAttention
export EXT_PARALLEL=4 NVCC_APPEND_FLAGS="--threads 8" MAX_JOBS=32 # Optional
python3 setup.py install
```
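The note about SageAttention disabling Flex-Block-Attention describes a simple precedence rule. The following is a minimal Python sketch of that rule only; the function and the flag names `use_sageattn` and `use_sparse_attn` are hypothetical and are not taken from this repository's code.

```python
def resolve_attention_backends(use_sageattn: bool, use_sparse_attn: bool) -> dict:
    """Illustrate the precedence rule from the note.

    Both argument names are hypothetical. SageAttention takes precedence:
    when it is enabled, sparse attention (Flex-Block-Attention) is turned off.
    """
    sparse_active = use_sparse_attn and not use_sageattn
    return {"sageattention": use_sageattn, "flex_block_attn": sparse_active}


if __name__ == "__main__":
    # Requesting both backends leaves only SageAttention active.
    print(resolve_attention_backends(use_sageattn=True, use_sparse_attn=True))
```

Under this reading, requesting both backends is not an error; the sparse path is dropped without raising.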
SGL-Kernel:
To enable FP8 GEMM for the transformer, install sgl-kernel with the following command:
```bash
pip install sgl-kernel==0.3.18
```
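Because all of the packages above are optional, it can help to check which ones are importable in the current environment before choosing inference options. The sketch below uses only the Python standard library. The import names in `OPTIONAL_MODULES` are assumptions about how each package exposes its module and may need adjusting.

```python
import importlib.util

# Assumed import names for the optional packages; verify against each project.
OPTIONAL_MODULES = {
    "Flash Attention": "flash_attn",
    "Flex-Block-Attention": "flex_block_attn",
    "SageAttention": "sageattention",
    "SGL-Kernel": "sgl_kernel",
}


def find_installed(modules: dict) -> dict:
    """Map each label to True if its top-level module can be located."""
    return {
        label: importlib.util.find_spec(name) is not None
        for label, name in modules.items()
    }


if __name__ == "__main__":
    for label, present in find_installed(OPTIONAL_MODULES).items():
        print(f"{label}: {'installed' if present else 'not found'}")
```

`find_spec` only locates the module without importing it, so the check is quick and does not initialize CUDA.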
💡 Distillation models and sparse attention models are still coming soon. Please stay tuned for the latest updates on the Hugging Face Model Card.
Download the pretrained models before generating videos. Detailed instructions are available at checkpoints-download.md.
| ModelName | Download |
|---|---|
| HunyuanVideo-1.5-480P-T2V | 480P-T2V |
| HunyuanVideo-1.5-480P-I2V | 480P-I2V |
| HunyuanVideo-1.5-480P-T2V-cfg-distill | 480P-T2V-cfg-distill |
| HunyuanVideo-1.5-480P-I2V-cfg-distill | 480P-I2V-cfg-distill |
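The model names in the table follow a visible pattern: resolution, then task (T2V for text-to-video, I2V for image-to-video), then an optional `cfg-distill` suffix. As an illustration, the following Python sketch splits a name into those parts. `ModelVariant` and `parse_model_name` are hypothetical helpers, not part of this repository, and the pattern is inferred only from the four names listed above.

```python
from typing import NamedTuple


class ModelVariant(NamedTuple):
    resolution: str   # e.g. "480P"
    task: str         # "T2V" or "I2V"
    distilled: bool   # True when the name carries a distill suffix


def parse_model_name(name: str) -> ModelVariant:
    """Split a name such as 'HunyuanVideo-1.5-480P-T2V-cfg-distill' into parts."""
    prefix = "HunyuanVideo-1.5-"
    if not name.startswith(prefix):
        raise ValueError(f"unexpected model name: {name!r}")
    parts = name[len(prefix):].split("-")
    if len(parts) < 2:
        raise ValueError(f"missing resolution or task in: {name!r}")
    resolution, task = parts[0], parts[1]
    return ModelVariant(resolution, task, "distill" in parts[2:])


if __name__ == "__main__":
    print(parse_model_name("HunyuanVideo-1.5-480P-I2V-cfg-distill"))
```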