github.com/Tencent-Hunyuan/HunyuanVideo-1.5 @main

420 symbols 1,237 edges 39 files 137 documented · 33%

README

HunyuanVideo-1.5

🎬 HunyuanVideo-1.5: A leading lightweight video generation model

HunyuanVideo-1.5 is a video generation model that delivers top-tier quality with only 8.3B parameters, significantly lowering the barrier to usage. It runs smoothly on consumer-grade GPUs, making it accessible for every developer and creator. This repository provides the implementation and tools needed to generate creative videos.

👏 Join our WeChat (https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/raw/main/assets/wechat.png) and Discord (https://discord.gg/ehjWMqF5wY)

💻 Official website: Try our model!

🔥🔥🔥 News

🚀 Dec 23, 2025: FP8 GEMM inference is now supported! 🔥🔥🔥🆕
🚀 Dec 05, 2025: New Release: We now release the 480p I2V step-distilled model, which generates videos in 8 or 12 steps (recommended)! On RTX 4090, end-to-end generation time is reduced by 75%, and a single RTX 4090 can generate videos within 75 seconds. The step-distilled model maintains comparable quality to the original model while achieving significant speedup. See Step Distillation Comparison for detailed quality comparisons. For even faster generation, you can also try 4 steps (faster speed with slightly reduced quality). To enable the step-distilled model, run generate.py with the --enable_step_distill parameter. See Usage for detailed usage instructions. 🔥🔥🔥🆕
📚 Dec 05, 2025: Training Code & LoRA Tuning Script Released: We now open-source the training code for HunyuanVideo-1.5! The training script (train.py) provides a full training pipeline with support for distributed training, FSDP, context parallel, gradient checkpointing, and more. HunyuanVideo-1.5 is trained using the Muon optimizer, which we have open-sourced in the Training section. If you would like to continue training our model or fine-tune it with LoRA, please use the Muon optimizer. See Training section for detailed usage instructions. 🔥🔥🔥🆕
🎉 Diffusers Support: HunyuanVideo-1.5 is now available on Hugging Face Diffusers! Check out Diffusers collection for easy integration. 🔥🔥🔥🆕
🚀 Nov 27, 2025: We now support cache inference (deepcache, teacache, taylorcache), achieving significant speedup! Pull the latest code to try it.
🚀 Nov 24, 2025: We now support deepcache inference.
👋 Nov 20, 2025: We release the inference code and model weights of HunyuanVideo-1.5.
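
The step-distillation figures in the Dec 05 news item (a 75% reduction, and generation within 75 seconds on a single RTX 4090) can be turned into a speedup factor with a little arithmetic. The sketch below is only a worked calculation; the function names are illustrative, and the implied baseline is derived from the two quoted numbers rather than being a measurement reported here.

```python
def speedup_factor(reduction: float) -> float:
    """Speedup implied by a fractional reduction in run time (0.75 means 75% less time)."""
    return 1.0 / (1.0 - reduction)


def implied_baseline_seconds(reduced_seconds: float, reduction: float) -> float:
    """Run time before the reduction, given the run time after it."""
    return reduced_seconds / (1.0 - reduction)


# Figures quoted in the news item: 75% reduction, at most 75 seconds after it.
print(speedup_factor(0.75))                  # 4.0
print(implied_baseline_seconds(75.0, 0.75))  # 300.0 (an upper bound, since 75 s is "within")
```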

🎥 Demo

🧩 Community Contributions

If you develop with or use HunyuanVideo-1.5 in your projects, please let us know.

Diffusers - HunyuanVideo-1.5 Diffusers: Official Hugging Face Diffusers integration for HunyuanVideo-1.5. Easily use HunyuanVideo-1.5 with the Diffusers library for seamless integration into your projects. See Usage with Diffusers section for details.
ComfyUI - ComfyUI: A powerful and modular diffusion model GUI with a graph/nodes interface. ComfyUI supports HunyuanVideo-1.5 with various engineering optimizations for fast inference. We provide a ComfyUI Usage Guide for HunyuanVideo-1.5.
Community-implemented ComfyUI Plugin - comfyui_hunyuanvideo_1.5_plugin: A community-implemented ComfyUI plugin for HunyuanVideo-1.5, offering both simplified and complete node sets for quick usage or deep workflow customization, with built-in automatic model download support.
LightX2V - LightX2V: A lightweight and efficient video generation framework that integrates HunyuanVideo-1.5, supporting multiple engineering acceleration techniques for fast inference.
Wan2GP v9.62 - Wan2GP: WanGP is a very low-VRAM app (as low as 6 GB of VRAM for HunyuanVideo 1.5) that supports a LoRA accelerator for 8-step generation and offers tools to facilitate video generation.
ComfyUI-MagCache - ComfyUI-MagCache: MagCache is a training-free caching approach that accelerates video generation by estimating fluctuating differences among model outputs across timesteps. It achieves 1.7x speedup for HunyuanVideo-1.5 with 20 inference steps.
OmniWeaving - OmniWeaving: An omni-level unified video generation model built upon HunyuanVideo-1.5, excelling in free-form multimodal composition and reasoning-augmented generation. Specifically, it seamlessly handles a diverse array of tasks, such as Text-to-Video, First-Frame-to-Video, Key-Frames-to-Video, Video-to-Video Editing, Reference-to-Video, Compositional Multi-Image-to-Video, and Text-Image-Video-to-Video.

📑 Open-source Plan

HunyuanVideo-1.5 (T2V/I2V)
[x] Inference Code and checkpoints
[x] ComfyUI Support
[x] LightX2V Support
[x] Diffusers Support
[ ] Release all model weights (Sparse attention, distill model, and SR models)

📋 Table of Contents

🔥🔥🔥 News
🎥 Demo
🧩 Community Contributions
📑 Open-source Plan
📖 Introduction
✨ Key Features
📜 System Requirements
🛠️ Dependencies and Installation
🧱 Download Pretrained Models
📝 Prompt Guide
🔑 Inference
Inference with Source Code
Usage with Diffusers
Prompt Enhancement
Text to Video
Image to Video
Command Line Arguments
Optimal Inference Configurations
🎓 Training
🎬 More Examples
📊 Evaluation
📚 Citation
🙏 Acknowledgements
🌟 Github Star History

📖 Introduction

We present HunyuanVideo-1.5, a lightweight yet powerful video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs. This achievement is built upon several key components, including meticulous data curation, an advanced DiT architecture with selective and sliding tile attention (SSTA), enhanced bilingual understanding through glyph-aware text encoding, progressive pre-training and post-training, and an efficient video super-resolution network. Leveraging these designs, we developed a unified framework capable of high-quality text-to-video and image-to-video generation across multiple durations and resolutions. Extensive experiments demonstrate that this compact model establishes a new state of the art among open-source models. By releasing the code and weights of HunyuanVideo-1.5, we provide the community with a high-performance foundation that significantly lowers the cost of video creation and research, making advanced video generation more accessible to all.

✨ Key Features

Lightweight High-Performance Architecture: We propose an efficient architecture that integrates an 8.3B-parameter Diffusion Transformer (DiT) with a 3D causal VAE, achieving compression ratios of 16× in spatial dimensions and 4× along the temporal axis. Additionally, the innovative SSTA (Selective and Sliding Tile Attention) mechanism prunes redundant spatiotemporal KV blocks, significantly reducing computational overhead for long video sequences and accelerating inference, with an end-to-end speedup of $1.87 \times$ in 10-second 720p video synthesis compared to FlashAttention-3.
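
The compression ratios above determine the size of the latent grid the DiT operates on. Here is a minimal sketch of that arithmetic. It assumes the causal VAE keeps the first frame on its own and folds every following group of 4 frames into one latent frame, and that height and width are multiples of 16. The function name and the causal frame formula are illustrative assumptions, not taken from the repository.

```python
def latent_grid(num_frames: int, height: int, width: int,
                spatial_ratio: int = 16, temporal_ratio: int = 4) -> tuple[int, int, int]:
    """Return (latent_frames, latent_height, latent_width) for a video.

    Assumption: a causal VAE encodes the first frame separately and compresses
    each following group of `temporal_ratio` frames into one latent frame.
    """
    if num_frames < 1 or (num_frames - 1) % temporal_ratio != 0:
        raise ValueError("num_frames must be of the form temporal_ratio * k + 1")
    if height % spatial_ratio or width % spatial_ratio:
        raise ValueError("height and width must be multiples of spatial_ratio")
    latent_frames = (num_frames - 1) // temporal_ratio + 1
    return latent_frames, height // spatial_ratio, width // spatial_ratio


# Example input chosen for illustration: 121 frames at 1280x720.
frames, h, w = latent_grid(num_frames=121, height=720, width=1280)
print(frames, h, w)    # 31 45 80
print(frames * h * w)  # 111600 latent positions
```

Under these assumptions, each latent position stands for a 16×16 pixel patch across up to 4 frames, which is why the sequence the transformer attends over stays manageable at 720p.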

HunyuanVideo-1.5 DiT

Video Super-Resolution Enhancement: We develop an efficient few-step super-resolution network that upscales outputs to 1080p. It enhances sharpness while correcting distortions, thereby refining details and overall visual texture.

HunyuanVideo-1.5 VSR

End-to-End Training Optimization: This work employs a multi-stage, progressive training strategy covering the entire pipeline from pre-training to post-training. Combined with the Muon optimizer to accelerate convergence, this approach holistically refines motion coherence, aesthetic quality, and human preference alignment, achieving professional-grade content generation.

📜 System Requirements

Hardware Requirements

GPU: NVIDIA GPU with CUDA support
Minimum GPU Memory: 14 GB (with model offloading enabled)

Note: The memory requirements above are measured with model offloading enabled. If your GPU has sufficient memory, you may disable offloading for improved inference speed.
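
A rough back-of-the-envelope estimate suggests why offloading matters at the 14 GB minimum. The sketch below counts only the 8.3B DiT weights and assumes 2 bytes per parameter for 16-bit weights and 1 byte per parameter for FP8. Both precisions are assumptions made for illustration, and text encoders, the VAE, and activations are ignored, so real usage will be higher.

```python
PARAMS = 8.3e9  # DiT parameter count stated in this README


def weight_gigabytes(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


half_precision_gb = weight_gigabytes(PARAMS, 2)  # assumption: 16-bit weights
fp8_gb = weight_gigabytes(PARAMS, 1)             # assumption: 8-bit weights

print(f"16-bit weights: {half_precision_gb:.1f} GB")  # 16.6 GB
print(f"FP8 weights:    {fp8_gb:.1f} GB")             # 8.3 GB
print(half_precision_gb > 14)                         # True
```

Under these assumptions, the 16-bit weights alone exceed 14 GB, so a GPU at the stated minimum cannot keep the whole model resident and relies on offloading.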

Software Requirements

Operating System: Linux
Python: Python 3.10 or higher
CUDA: Compatible CUDA version for your PyTorch installation
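
The software requirements above can be checked before installing anything. The following is a small standard-library sketch; the function name is hypothetical and not part of the repository. It covers only the operating system and Python version. CUDA compatibility depends on your PyTorch build and is not checked here.

```python
import platform
import sys


def environment_problems(version_info=sys.version_info, system=None) -> list[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    system = platform.system() if system is None else system
    problems = []
    if tuple(version_info[:2]) < (3, 10):
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python 3.10 or higher is required, found {found}")
    if system != "Linux":
        problems.append(f"Linux is the listed operating system, found {system}")
    return problems


# Report on the current interpreter and operating system.
for problem in environment_problems():
    print("WARNING:", problem)
```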

🛠️ Dependencies and Installation

Step 1: Clone the Repository

git clone https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5.git
cd HunyuanVideo-1.5

Step 2: Install Basic Dependencies

pip install -r requirements.txt
pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-sdk-python

Step 3: Install Attention Libraries

Flash Attention: Install Flash Attention for faster inference and reduced GPU memory consumption. Detailed installation instructions are available at Flash Attention.
Flex-Block-Attention: flex-block-attn is only required for sparse attention to achieve faster inference and can be installed by the following command:

git clone https://github.com/Tencent-Hunyuan/flex-block-attn.git
cd flex-block-attn
git submodule update --init --recursive
python3 setup.py install

SageAttention: To enable SageAttention for faster inference, you need to install it by the following command:

Note: Enabling SageAttention will automatically disable Flex-Block-Attention.

git clone https://github.com/cooper1637/SageAttention.git
cd SageAttention
export EXT_PARALLEL=4 NVCC_APPEND_FLAGS="--threads 8" MAX_JOBS=32 # Optional
python3 setup.py install

SGL-Kernel: To enable fp8 gemm for transformer, you need to install it by the following command:

pip install sgl-kernel==0.3.18
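
After Step 3 it can be useful to see which of the optional libraries are importable. The sketch below uses only the standard library. The import names in `OPTIONAL_BACKENDS` are assumptions (check each project's documentation), and `flex_attention_active` merely restates the note above about SageAttention disabling Flex-Block-Attention. It is not the repository's own selection logic.

```python
from importlib.util import find_spec

# Display name -> assumed top-level import name (assumptions, not verified here).
OPTIONAL_BACKENDS = {
    "Flash Attention": "flash_attn",
    "Flex-Block-Attention": "flex_block_attn",
    "SageAttention": "sageattention",
    "SGL-Kernel": "sgl_kernel",
}


def module_installed(module: str) -> bool:
    """True if the top-level module can be located without importing it."""
    return find_spec(module) is not None


def available_backends(is_installed=module_installed) -> dict[str, bool]:
    """Map each optional library to whether it appears to be installed."""
    return {label: is_installed(module) for label, module in OPTIONAL_BACKENDS.items()}


def flex_attention_active(use_sageattention: bool, flex_installed: bool) -> bool:
    """Restates the note: enabling SageAttention disables Flex-Block-Attention."""
    return flex_installed and not use_sageattention


for label, ok in available_backends().items():
    print(f"{label:22s} {'found' if ok else 'not found'}")
```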

🧱 Download Pretrained Models

💡 Distillation models and sparse attention models are still coming soon. Please stay tuned for the latest updates on the Hugging Face Model Card.

Download the pretrained models before generating videos. Detailed instructions are available at checkpoints-download.md.

Model Cards

Model Name	Download
HunyuanVideo-1.5-480P-T2V	480P-T2V
HunyuanVideo-1.5-480P-I2V	480P-I2V
HunyuanVideo-1.5-480P-T2V-cfg-distill	480P-T2V-cfg-distill
HunyuanVideo-1.5-480P-I2V-cfg-distill	480P-I2V-cfg-distill

Core symbols most depended-on inside this repo

get_parallel_state

called by 16

hyvideo/commons/parallel_states.py

device

called by 14

hyvideo/models/text_encoders/__init__.py

auto_offload_model

called by 14

hyvideo/commons/__init__.py

get_activation_layer

called by 13

hyvideo/models/transformers/modules/activation_layers.py

forward_with_checkpointing

called by 12

hyvideo/models/autoencoders/hunyuanvideo_15_vae.py

hyvideo/models/text_encoders/__init__.py

sync_tensor_for_sp

called by 7

train.py

Shape

Method 248

Function 108

Class 64

Languages

Python 100%

Modules by API surface

hyvideo/models/autoencoders/hunyuanvideo_15_vae.py · 62 symbols

train.py · 45 symbols

hyvideo/pipelines/hunyuan_video_pipeline.py · 43 symbols

hyvideo/models/transformers/hunyuanvideo_1_5_transformer.py · 22 symbols

hyvideo/utils/communications.py · 18 symbols

hyvideo/schedulers/scheduling_flow_match_discrete.py · 16 symbols

hyvideo/models/transformers/modules/embed_layers.py · 16 symbols

hyvideo/utils/rewrite/clients.py · 15 symbols

hyvideo/models/text_encoders/__init__.py · 14 symbols

hyvideo/models/transformers/modules/mlp_layers.py · 12 symbols

hyvideo/commons/__init__.py · 12 symbols

hyvideo/models/vision_encoder/__init__.py · 11 symbols

Dependencies from manifests, versioned

angelslim 0.2.2 · 1×

diffusers 0.35.0 · 1×

einops 0.8.0 · 1×

huggingface-hub 0.34.0 · 1×

imageio 2.37.0 · 1×

imageio-ffmpeg 0.6.0 · 1×

loguru 0.7.3 · 1×

numpy 1.26.4 · 1×

omegaconf 2.3.0 · 1×

openai 2.8.0 · 1×

peft 0.17.0 · 1×

pillow 11.3.0 · 1×
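
To compare an existing environment against the versions listed above, a short standard-library script is enough. This is a sketch under stated assumptions. The list does not say whether the manifest pins these versions exactly or treats them as minimums, so the script only reports differences and does not judge them. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Package -> version as listed above.
LISTED_VERSIONS = {
    "angelslim": "0.2.2",
    "diffusers": "0.35.0",
    "einops": "0.8.0",
    "huggingface-hub": "0.34.0",
    "imageio": "2.37.0",
    "imageio-ffmpeg": "0.6.0",
    "loguru": "0.7.3",
    "numpy": "1.26.4",
    "omegaconf": "2.3.0",
    "openai": "2.8.0",
    "peft": "0.17.0",
    "pillow": "11.3.0",
}


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def compare_with_listed(listed: dict, lookup=installed_version) -> dict:
    """Classify each package as 'ok', 'missing', or differing from the listed version."""
    report = {}
    for package, wanted in listed.items():
        found = lookup(package)
        if found is None:
            report[package] = "missing"
        elif found == wanted:
            report[package] = "ok"
        else:
            report[package] = f"differs (installed {found}, listed {wanted})"
    return report


for package, status in compare_with_listed(LISTED_VERSIONS).items():
    print(f"{package:16s} {status}")
```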

For agents

$ claude mcp add HunyuanVideo-1.5 \
  -- python -m otcore.mcp_server <graph>
