
This repository is the official implementation of Helios, a breakthrough video generation model that achieves minute-scale, high-quality video synthesis at 19.5 FPS on a single H100 GPU (about 10 FPS on a single Ascend NPU), without relying on conventional long-video anti-drifting strategies or standard video acceleration techniques.
Without commonly used anti-drifting strategies (e.g., self-forcing, error-banks, keyframe sampling, or inverted sampling), Helios generates minute-scale videos with high quality and strong coherence.
Without standard acceleration techniques (e.g., KV-cache, causal masking, sparse/linear attention, TinyVAE, progressive noise schedules, hidden-state caching, or quantization), Helios achieves 19.5 FPS in end-to-end inference on a single H100 GPU.
We introduce optimizations that improve both training and inference throughput while reducing memory consumption, enabling image-diffusion-scale batch sizes during training while fitting up to four 14B models within 80 GB of GPU memory.
Alternatively, you can click here to get the video. Some of the best prompts are here.
* [2026.03.26] 🔥 Added a summary of FAQ, Tips, and Tutorials: https://github.com/PKU-YuanGroup/Helios/issues/47.
* [2026.03.24] 👋 A community-made, unofficial YouTube tutorial for Helios is available here. It covers installation on a consumer-grade PC and 4K video generation.
* [2026.03.20] 🚀 Helios now supports Ahead-of-Time Compilation (AOTI) on Spaces, with special thanks to the HuggingFace Team! Please refer to this Space for a usage example.
* [2026.03.20] 🔧 Based on issue #38, we've identified several ways to further improve Helios's performance, such as fixing the i2v train-inference inconsistency and fully enabling Easy Anti-Drifting. Please refer to commits and correct.yaml for details.
* [2026.03.12] ⚡️ Please note that real-time generation performance depends not only on the GPU, but also on the CPU, memory, CUDA driver version, etc. As tested by a user on better hardware with a single H100, Helios can reach up to 20.89 FPS!
* [2026.03.08] 🚀 Helios now fully supports Group Offloading and Context Parallelism! These features significantly reduce VRAM usage (to only ~6 GB) and enable inference across multiple GPUs with Ulysses Attention, Ring Attention, Unified Attention, and Ulysses Anything Attention.
* [2026.03.06] 👋 Cache-DiT now supports Helios, offering full cache acceleration and parallelism support! Special thanks to the Cache-DiT Team for their amazing work.
* [2026.03.06] 🔧 We fixed the Parallel Inference logic for Helios and provide an example here.
* [2026.03.06] 🚀 We have officially released the Gradio Demo; you are welcome to try it.
* [2026.03.05] 🔥 We are excited to announce the release of the Helios technical report on arXiv. We welcome discussions and feedback!
* [2026.03.04] 👋 Day-0 support for Ascend-NPU, with sincere gratitude to the Ascend Team for their support.
* [2026.03.04] 👋 Day-0 support for Diffusers, with special thanks to the HuggingFace Team for their support.
* [2026.03.04] 👋 Day-0 support for SGLang-Diffusion, with huge thanks to the SGLang Team for their support.
* [2026.03.04] 👋 Day-0 support for vLLM-Omni, with heartfelt gratitude to the vLLM Team for their support.
* [2026.03.04] 🔥 We've released the training/inference code and weights of Helios-Base, Helios-Mid and Helios-Distilled.

If your work has improved Helios and you would like more people to see it, please inform us.
If you prefer a step-by-step walkthrough, check out this community-made YouTube Tutorial. It covers local installation, 4K video generation, and how to run Helios on a consumer-grade PC, along with other practical usage tips.
# 0. Clone the repo
git clone --depth=1 https://github.com/PKU-YuanGroup/Helios.git
cd Helios
# 1. Create conda environment
conda create -n helios python=3.11.2
conda activate helios
# 2. Install PyTorch (adjust for your CUDA version)
# CUDA 12.6
pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu126
# CUDA 12.8
pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu128
# CUDA 13.0
pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu130
# 3. Install dependencies
bash install.sh
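If you script the setup, the choice of PyTorch wheel index in step 2 can be made explicit. The sketch below only restates the three commands listed above; the names `PYTORCH_INDEX_URLS`, `PYTORCH_PACKAGES`, and `pip_install_command` are hypothetical helpers for illustration and are not part of this repository.

```python
# Wheel index URLs and package pins copied from the install commands above.
# The helper names are hypothetical and not part of the Helios repository.
PYTORCH_INDEX_URLS = {
    "12.6": "https://download.pytorch.org/whl/cu126",
    "12.8": "https://download.pytorch.org/whl/cu128",
    "13.0": "https://download.pytorch.org/whl/cu130",
}
PYTORCH_PACKAGES = ["torch==2.10.0", "torchvision==0.25.0", "torchaudio==2.10.0"]


def pip_install_command(cuda_version: str) -> str:
    """Return the pip command for one of the CUDA versions listed above."""
    try:
        index_url = PYTORCH_INDEX_URLS[cuda_version]
    except KeyError:
        supported = ", ".join(sorted(PYTORCH_INDEX_URLS))
        raise ValueError(
            f"Unsupported CUDA version {cuda_version!r}; expected one of: {supported}"
        ) from None
    return f"pip install {' '.join(PYTORCH_PACKAGES)} --index-url {index_url}"


print(pip_install_command("12.8"))
```

Failing loudly on an unlisted CUDA version is a deliberate choice here: it avoids silently installing a wheel that was not covered by the instructions above.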
| Models | Download Link | Supports | Notes |
|---|---|---|---|
| Helios-Base | 🤗 Huggingface 🤖 ModelScope | T2V ✅ I2V ✅ V2V ✅ Interactive ✅ | Best Quality, with v-prediction, standard CFG and custom HeliosScheduler. |
| Helios-Mid | 🤗 Huggingface 🤖 ModelScope | T2V ✅ I2V ✅ V2V ✅ Interactive ✅ | Intermediate Ckpt, with v-prediction, CFG-Zero* and custom HeliosScheduler. |
| Helios-Distilled | 🤗 Huggingface 🤖 ModelScope | T2V ✅ I2V ✅ V2V ✅ Interactive ✅ | Best Efficiency, with x0-prediction and custom HeliosDMDScheduler. |
💡 Note:
* All three models share the same architecture, but Helios-Mid and Helios-Distilled use a more aggressive multi-scale sampling pipeline to achieve better efficiency.
* Helios-Mid is an intermediate checkpoint produced while distilling Helios-Base into Helios-Distilled, and may not meet the expected quality.
* Because training is based on Text-to-Video, Image-to-Video and Video-to-Video results may be slightly inferior to Text-to-Video. If the first few chunks are static, you can enable `is_skip_first_chunk` or increase the values of `image_noise_sigma_min`, `image_noise_sigma_max`, `video_noise_sigma_min`, and `video_noise_sigma_max`.

Download models using huggingface-cli:
pip install "huggingface_hub[cli]"
huggingface-cli download BestWishYSH/Helios-Base --local-dir BestWishYSH/Helios-Base
huggingface-cli download BestWishYSH/Helios-Mid --local-dir BestWishYSH/Helios-Mid
huggingface-cli download BestWishYSH/Helios-Distilled --local-dir BestWishYSH/Helios-Distilled
Download models using modelscope-cli:
pip install modelscope
modelscope download BestWishYSH/Helios-Base --local_dir BestWishYSH/Helios-Base
modelscope download BestWishYSH/Helios-Mid --local_dir BestWishYSH/Helios-Mid
modelscope download BestWishYSH/Helios-Distilled --local_dir BestWishYSH/Helios-Distilled
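The same checkpoints can also be fetched from Python. This is a minimal sketch that assumes the `huggingface_hub` package installed above and uses its `snapshot_download` function; the wrapper `download_helios` and the constant `HELIOS_REPOS` are hypothetical names introduced for illustration, not part of this repository.

```python
# Repo IDs taken from the CLI commands above.
# HELIOS_REPOS and download_helios are hypothetical names for this sketch.
HELIOS_REPOS = (
    "BestWishYSH/Helios-Base",
    "BestWishYSH/Helios-Mid",
    "BestWishYSH/Helios-Distilled",
)


def download_helios(repo_id: str) -> str:
    """Download one checkpoint into a local folder named after the repo ID."""
    if repo_id not in HELIOS_REPOS:
        raise ValueError(f"Unknown Helios repo: {repo_id!r}")
    # Imported lazily so this file loads even before huggingface_hub is installed.
    from huggingface_hub import snapshot_download

    # Mirrors `huggingface-cli download <repo> --local-dir <repo>`.
    return snapshot_download(repo_id=repo_id, local_dir=repo_id)


# Example (a large download, so it is left commented out):
# download_helios("BestWishYSH/Helios-Distilled")
```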
Helios uses an autoregressive approach that generates 33 frames per chunk. For optimal performance, num_frames should be set to a multiple of 33. If a non-multiple value is provided, it will be automatically rounded up to the nearest multiple of 33.
Example frame counts for different video lengths:
| num_frames | Adjusted Frames | 24 FPS | 16 FPS |
|---|---|---|---|
| 1449 | 1452 (33×44) | ~60s (1min) | ~90s (1min 30s) |
| 720 | 726 (33×22) | ~30s | ~45s |
| 240 | 264 (33×8) | ~11s | ~16s |
| 129 | 132 (33×4) | ~5.5s | ~8s |
| 81 | 99 (33×3) | ~4s | ~6s |
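The rounding rule and the durations in the table can be reproduced with a few lines of arithmetic. This is a sketch of the rule as described above, not the repository's own implementation; `CHUNK_FRAMES`, `adjusted_num_frames`, and `duration_seconds` are hypothetical names.

```python
CHUNK_FRAMES = 33  # frames generated per autoregressive chunk, as stated above


def adjusted_num_frames(num_frames: int) -> int:
    """Round num_frames up to the nearest multiple of CHUNK_FRAMES."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    # Integer ceiling division, then scale back to a frame count.
    chunks = (num_frames + CHUNK_FRAMES - 1) // CHUNK_FRAMES
    return chunks * CHUNK_FRAMES


def duration_seconds(num_frames: int, fps: int) -> float:
    """Playback length of the adjusted video at the given frame rate."""
    return adjusted_num_frames(num_frames) / fps


for requested in (1449, 720, 240, 129, 81):
    adjusted = adjusted_num_frames(requested)
    print(
        f"{requested} -> {adjusted} frames "
        f"({adjusted // CHUNK_FRAMES} chunks): "
        f"{duration_seconds(requested, 24):.1f}s at 24 FPS, "
        f"{duration_seconds(requested, 16):.1f}s at 16 FPS"
    )
```

For example, a request for 81 frames spans three chunks, so 99 frames are generated, which plays for about 4 seconds at 24 FPS.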
We provide inference scripts for all models covering text-to-video, image-to-video, and video-to-video in this directory.
cd scripts/inference
# For Helios-Base
bash helios-base_t2v.sh
bash helios-base_i2v.sh
bash helios-base_v2v.sh
# For Helios-Mid
bash helios-mid_t2v.sh
bash helios-mid_i2v.sh
bash helios-mid_v2v.sh
# For Helios-Distilled
bash helios-distilled_t2v.sh
bash helios-distilled_i2v.sh
bash helios-distilled_v2v.sh
# For Interactive
# ⚠️ This feature is still under development — results may not always meet expectations
cd scripts/inference/experiment_interactive
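The script names above follow a `helios-<model>_<task>.sh` pattern, which makes it easy to launch them from Python. The sketch below assumes only that naming pattern as listed; `inference_script` and `inference_command` are hypothetical helpers and not part of this repository.

```python
# Model and task names as they appear in the script list above.
MODELS = ("base", "mid", "distilled")
TASKS = ("t2v", "i2v", "v2v")


def inference_script(model: str, task: str) -> str:
    """Build the script file name for a model/task pair listed above."""
    if model not in MODELS:
        raise ValueError(f"Unknown model {model!r}; expected one of {MODELS}")
    if task not in TASKS:
        raise ValueError(f"Unknown task {task!r}; expected one of {TASKS}")
    return f"helios-{model}_{task}.sh"


def inference_command(model: str, task: str) -> list[str]:
    """Return the argv list equivalent to `bash helios-<model>_<task>.sh`."""
    return ["bash", inference_script(model, task)]


print(inference_command("distilled", "t2v"))
# To actually run it from the repository root (assumes the models are downloaded):
# import subprocess
# subprocess.run(inference_command("distilled", "t2v"), cwd="scripts/inference", check=True)
```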
Before trying your own inputs, we highly recommend running the sanity check to confirm that your hardware and software setup works as expected.
| Task | Helios-Base | Helios-Mid | Helios-Distilled |
|---|---|---|---|