<a href='https://github.com/xumingw' target='_blank'>Mingwang Xu</a><sup>1*</sup> 
<a href='https://github.com/crystallee-ai' target='_blank'>Hui Li</a><sup>1*</sup> 
<a href='https://github.com/subazinga' target='_blank'>Qingkun Su</a><sup>1*</sup> 
<a href='https://github.com/NinoNeumann' target='_blank'>Hanlin Shang</a><sup>1</sup> 
<a href='https://github.com/AricGamma' target='_blank'>Liwei Zhang</a><sup>1</sup> 
<a href='https://github.com/cnexah' target='_blank'>Ce Liu</a><sup>3</sup> 
<a href='https://jingdongwang2017.github.io/' target='_blank'>Jingdong Wang</a><sup>2</sup> 
<a href='https://yoyo000.github.io/' target='_blank'>Yao Yao</a><sup>4</sup> 
<a href='https://sites.google.com/site/zhusiyucs/home' target='_blank'>Siyu Zhu</a><sup>1</sup> 
<sup>1</sup>Fudan University  <sup>2</sup>Baidu Inc  <sup>3</sup>ETH Zurich  <sup>4</sup>Nanjing University
<a href='https://github.com/fudan-generative-vision/hallo'><img src='https://img.shields.io/github/stars/fudan-generative-vision/hallo?style=social'></a>
<a href='https://fudan-generative-vision.github.io/hallo/#/'><img src='https://img.shields.io/badge/Project-HomePage-Green'></a>
<a href='https://arxiv.org/pdf/2406.08801'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
<a href='https://huggingface.co/spaces/fudan-generative-ai/hallo'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow'></a>
<a href='https://huggingface.co/fudan-generative-ai/hallo'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Demo-yellow'></a>
<a href='https://www.modelscope.cn/models/fudan-generative-vision/Hallo/summary'><img src='https://img.shields.io/badge/Modelscope-Model-purple'></a>
<a href='assets/wechat.jpeg'><img src='https://badges.aleen42.com/src/wechat.svg'></a>
https://github.com/fudan-generative-vision/hallo/assets/17402682/9d1a0de4-3470-4d38-9e4f-412f517f834c
| Devil Wears Prada | Green Book | Infernal Affairs |
![]() |
![]() |
![]() |
| Patch Adams | Tough Love | Shawshank Redemption |
![]() |
![]() |
![]() |
Explore more examples.
2024/06/28: 🎉🎉🎉 We are proud to announce the release of our model training code. Try your own training data. Here is tutorial.2024/06/21: 🚀🚀🚀 Cloned a Gradio demo on 🤗Huggingface space.2024/06/20: 🌟🌟🌟 Received numerous contributions from the community, including a Windows version, ComfyUI, WebUI, and Docker template.2024/06/15: ✨✨✨ Released some images and audios for inference testing on 🤗Huggingface.2024/06/15: 🎉🎉🎉 Launched the first version on 🫡GitHub.Explore the resources developed by our community to enhance your experience with Hallo:
Thanks to all of them.
Join our community and explore these amazing resources to make the most out of Hallo. Enjoy and elevate their creative projects!

Create conda environment:
conda create -n hallo python=3.10
conda activate hallo
Install packages with pip
pip install -r requirements.txt
pip install .
Besides, ffmpeg is also needed:
apt-get install ffmpeg
The entry point for inference is scripts/inference.py. Before testing your cases, two preparations need to be completed:
You can easily get all pretrained models required by inference from our HuggingFace repo.
Clone the pretrained models into ${PROJECT_ROOT}/pretrained_models directory by cmd below:
git lfs install
git clone https://huggingface.co/fudan-generative-ai/hallo pretrained_models
Or you can download them separately from their source repo:
pretrained_models/face_analysis/models/. (Thanks to deepinsight)pretrained_models/face_analysis/models.Finally, these pretrained models should be organized as follows:
./pretrained_models/
|-- audio_separator/
| |-- download_checks.json
| |-- mdx_model_data.json
| |-- vr_model_data.json
| `-- Kim_Vocal_2.onnx
|-- face_analysis/
| `-- models/
| |-- face_landmarker_v2_with_blendshapes.task # face landmarker model from mediapipe
| |-- 1k3d68.onnx
| |-- 2d106det.onnx
| |-- genderage.onnx
| |-- glintr100.onnx
| `-- scrfd_10g_bnkps.onnx
|-- motion_module/
| `-- mm_sd_v15_v2.ckpt
|-- sd-vae-ft-mse/
| |-- config.json
| `-- diffusion_pytorch_model.safetensors
|-- stable-diffusion-v1-5/
| `-- unet/
| |-- config.json
| `-- diffusion_pytorch_model.safetensors
`-- wav2vec/
`-- wav2vec2-base-960h/
|-- config.json
|-- feature_extractor_config.json
|-- model.safetensors
|-- preprocessor_config.json
|-- special_tokens_map.json
|-- tokenizer_config.json
`-- vocab.json
Hallo has a few simple requirements for input data:
For the source image:
For the driving audio:
We have provided some samples for your reference.
Simply to run the scripts/inference.py and pass source_image and driving_audio as input:
python scripts/inference.py --source_image examples/reference_images/1.jpg --driving_audio examples/driving_audios/1.wav
Animation results will be saved as ${PROJECT_ROOT}/.cache/output.mp4 by default. You can pass --output to specify the output file name. You can find more examples for inference at examples folder.
For more options:
usage: inference.py [-h] [-c CONFIG] [--source_image SOURCE_IMAGE] [--driving_audio DRIVING_AUDIO] [--output OUTPUT] [--pose_weight POSE_WEIGHT]
[--face_weight FACE_WEIGHT] [--lip_weight LIP_WEIGHT] [--face_expand_ratio FACE_EXPAND_RATIO]
options:
-h, --help show this help message and exit
-c CONFIG, --config CONFIG
--source_image SOURCE_IMAGE
source image
--driving_audio DRIVING_AUDIO
driving audio
--output OUTPUT output video file name
--pose_weight POSE_WEIGHT
weight of pose
--face_weight FACE_WEIGHT
weight of face
--lip_weight LIP_WEIGHT
weight of lip
--face_expand_ratio FACE_EXPAND_RATIO
face region
The training data, which utilizes some talking-face videos similar to the source images used for inference, also needs to meet the following requirements:
Organize your raw videos into the following directory structure:
dataset_name/
|-- videos/
| |-- 0001.mp4
| |-- 0002.mp4
| |-- 0003.mp4
| `-- 0004.mp4
You can use any dataset_name, but ensure the videos directory is named as shown above.
Next, process the videos with the following commands:
python -m scripts.data_preprocess --input_dir dataset_name/videos --step 1
python -m scripts.data_preprocess --input_dir dataset_name/videos --step 2
Note: Execute steps 1 and 2 sequentially as they perform different tasks. Step 1 converts videos into frames, extracts audio from each video, and generates the necessary masks. Step 2 generates face embeddings using InsightFace and audio embeddings using Wav2Vec, and requires a GPU. For parallel processing, use the -p and -r arguments. The -p argument specifies the total number of instances to launch, dividing the data into p parts. The -r argument specifies which part the current process should handle. You need to manually launch multiple instances with different values for -r.
Generate the metadata JSON files with the following commands:
python scripts/extract_meta_info_stage1.py -r path/to/dataset -n dataset_name
python scripts/extract_meta_info_stage2.py -r path/to/dataset -n dataset_name
Replace `path/to/dataset
$ claude mcp add hallo \
-- python -m otcore.mcp_server <graph>