github.com/OpenTalker/video-retalking @v0.0.1 sqlite

1,363 symbols 3,526 edges 167 files 271 documented · 20%
README

VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

<a target='_blank'>Kun Cheng <sup>*,1,2</sup> </a>&emsp;
<a href='https://vinthony.github.io/' target='_blank'>Xiaodong Cun <sup>*,2</sup></a>&emsp;
<a href='https://yzhang2016.github.io/yongnorriszhang.github.io/' target='_blank'>Yong Zhang <sup>2</sup></a>&emsp;
<a href='https://menghanxia.github.io/' target='_blank'>Menghan Xia <sup>2</sup></a>&emsp;
<a href='https://feiiyin.github.io/' target='_blank'>Fei Yin <sup>2,3</sup></a>&emsp;


<a target='_blank'>Mingrui Zhu <sup>1</sup></a>&emsp;
<a href='https://xuanwangvc.github.io/' target='_blank'>Xuan Wang <sup>2</sup></a>&emsp;
<a href='https://juewang725.github.io/' target='_blank'>Jue Wang <sup>2</sup></a>&emsp;
<a href='https://web.xidian.edu.cn/nnwang/en/index.html' target='_blank'>Nannan Wang <sup>1</sup></a>

<sup>1</sup> Xidian University &emsp; <sup>2</sup> Tencent AI Lab &emsp; <sup>3</sup> Tsinghua University

SIGGRAPH Asia 2022 Conference Track

We present VideoReTalking, a new system that edits the faces of a real-world talking-head video according to input audio, producing a high-quality, lip-synced output video even with a different emotion. Our system disentangles this objective into three sequential tasks: (1) face video generation with a canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for improving photo-realism. Given a talking-head video, we first modify the expression of each frame according to the same expression template using the expression editing network, resulting in a video with the canonical expression. This video, together with the given audio, is then fed into the lip-sync network to generate a lip-synced video. Finally, we improve the photo-realism of the synthesized faces through an identity-aware face enhancement network and post-processing. We use learning-based approaches for all three steps, and all our modules run in a sequential pipeline without any user intervention.
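
The three-stage structure described above can be pictured as plain function composition. The following is only a rough sketch: `run_pipeline`, `edit_expression`, `lip_sync`, and `enhance` are hypothetical names chosen for illustration, not functions from this repository, and frames are represented as lists of floats rather than real images.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # stand-in for an image; real frames would be arrays


def run_pipeline(
    frames: Sequence[Frame],
    audio: Sequence[float],
    edit_expression: Callable[[Frame], Frame],
    lip_sync: Callable[[Sequence[Frame], Sequence[float]], List[Frame]],
    enhance: Callable[[Frame], Frame],
) -> List[Frame]:
    """Apply the three stages in order (all stage callables are hypothetical)."""
    # Stage 1: bring every frame to the same canonical expression.
    canonical = [edit_expression(frame) for frame in frames]
    # Stage 2: drive the canonical video with the input audio.
    synced = lip_sync(canonical, audio)
    # Stage 3: improve photo-realism frame by frame.
    return [enhance(frame) for frame in synced]
```

Each stage consumes the previous stage's output, which is why the system needs no user intervention between steps.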

Pipeline

Results in the Wild (contains audio)

https://user-images.githubusercontent.com/4397546/224310754-665eb2dd-aadc-47dc-b1f9-2029a937b20a.mp4

Environment

git clone https://github.com/vinthony/video-retalking.git
cd video-retalking
conda create -n video_retalking python=3.8
conda activate video_retalking
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt

Quick Inference

Pretrained Models

Please download our pre-trained models and put them in ./checkpoints.
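
Before running inference, it can help to confirm that the download landed where the script expects it. The helper below is a minimal sketch: `list_checkpoints` is a hypothetical name, and because the expected file names are not listed here, it only reports what it finds under ./checkpoints.

```python
from pathlib import Path
from typing import List


def list_checkpoints(root: str = "./checkpoints") -> List[str]:
    """Return the sorted names of the files directly under the checkpoint directory."""
    directory = Path(root)
    if not directory.is_dir():
        raise FileNotFoundError(f"{root} does not exist; download the models first")
    return sorted(p.name for p in directory.iterdir() if p.is_file())


if __name__ == "__main__":
    try:
        names = list_checkpoints()
    except FileNotFoundError as err:
        print(err)
    else:
        print(f"{len(names)} file(s) in ./checkpoints:", ", ".join(names))
```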

Inference

python3 inference.py \
  --face examples/face/1.mp4 \
  --audio examples/audio/1.wav \
  --outfile results/1_1.mp4

This script includes the data preprocessing steps, so you can test any talking-face video without manual alignment. Note, however, that DNet cannot handle extreme poses.

You can also control the expression by adding the following parameters:

--exp_img: Pre-defined expression template. The default is "neutral"; you can also pass "smile" or a path to an image.

--up_face: Set to "surprise" or "angry" to modify the upper-face expression with GANimation.
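
To script several runs with different expression settings, the command line can be assembled programmatically. This is a sketch only: `build_command` is a hypothetical helper, the output path `results/1_smile.mp4` is made up for illustration, and the only flags used are the ones documented above.

```python
import shlex
from typing import List, Optional


def build_command(
    face: str,
    audio: str,
    outfile: str,
    exp_img: Optional[str] = None,
    up_face: Optional[str] = None,
) -> List[str]:
    """Assemble an inference.py invocation as an argument list."""
    cmd = ["python3", "inference.py", "--face", face, "--audio", audio, "--outfile", outfile]
    # Optional flags are appended only when a value is given,
    # so the script's own defaults apply otherwise.
    if exp_img is not None:
        cmd += ["--exp_img", exp_img]
    if up_face is not None:
        cmd += ["--up_face", up_face]
    return cmd


cmd = build_command(
    "examples/face/1.mp4",
    "examples/audio/1.wav",
    "results/1_smile.mp4",  # hypothetical output name
    exp_img="smile",
    up_face="surprise",
)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` from the repository root would launch the run; an argument list avoids shell quoting problems with unusual file names.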

Citation

If you find our work useful in your research, please consider citing:

@misc{cheng2022videoretalking,
  title={VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild},
  author={Kun Cheng and Xiaodong Cun and Yong Zhang and Menghan Xia and Fei Yin and Mingrui Zhu and Xuan Wang and Jue Wang and Nannan Wang},
  year={2022},
  eprint={2211.14758},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Acknowledgement

Thanks to Wav2Lip, PIRenderer, GFP-GAN, GPEN, ganimation_replicate, STIT for sharing their code.

Related Work

Disclaimer

This is not an official product of Tencent. This repository can only be used for personal/research/non-commercial purposes.

Core symbols most depended-on inside this repo

to · called by 90 · third_part/face3d/models/bfm.py
get · called by 58 · third_part/face3d/models/arcface_torch/eval_ijbc.py
split · called by 57 · third_part/face3d/models/arcface_torch/eval/verification.py
defineProperties · called by 26 · docs/static/js/bulma-carousel.js
eval · called by 24 · third_part/face3d/models/base_model.py
save · called by 16 · third_part/face3d/util/html.py
write · called by 15 · third_part/GPEN/face_morpher/facemorpher/videoer.py
spectral_norm · called by 14 · models/base_blocks.py

Shape

Method 696
Function 442
Class 225

Languages

Python 90%
TypeScript 10%

Modules by API surface

models/base_blocks.py · 75 symbols
docs/static/js/fontawesome.all.min.js · 70 symbols
third_part/GPEN/face_model/gpen_model.py · 57 symbols
docs/static/js/bulma-carousel.js · 50 symbols
third_part/ganimation_replicate/model/model_utils.py · 43 symbols
third_part/GFPGAN/gfpgan/archs/stylegan2_bilinear_arch.py · 37 symbols
third_part/face3d/models/networks.py · 35 symbols
third_part/GPEN/face_parse/model.py · 33 symbols
third_part/face3d/models/base_model.py · 26 symbols
third_part/face3d/models/arcface_torch/backbones/mobilefacenet.py · 22 symbols
models/transformer.py · 22 symbols
third_part/GFPGAN/gfpgan/archs/stylegan2_clean_arch.py · 21 symbols

Dependencies from manifests, versioned

basicsr 1.4.2 · 1×
dlib 19.24.0 · 1×
einops 0.4.1 · 1×
face-alignment 1.3.5 · 1×
facexlib 0.2.5 · 1×
kornia 0.5.1 · 1×
librosa 0.9.2 · 1×
ninja 1.10.2.3 · 1×
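
To see whether an existing environment matches these pins, the installed versions can be compared against the list above. The snippet is a sketch that assumes the eight entries shown are the ones of interest (the full manifest may contain more); `PINNED`, `installed_version`, and `compare` are names introduced here for illustration.

```python
from importlib import metadata
from typing import Dict, Optional

# Versions copied from the dependency list above.
PINNED: Dict[str, str] = {
    "basicsr": "1.4.2",
    "dlib": "19.24.0",
    "einops": "0.4.1",
    "face-alignment": "1.3.5",
    "facexlib": "0.2.5",
    "kornia": "0.5.1",
    "librosa": "0.9.2",
    "ninja": "1.10.2.3",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def compare(pins: Dict[str, str]) -> Dict[str, str]:
    """Map each pinned package to 'ok', 'missing', or a mismatch message."""
    report = {}
    for name, wanted in pins.items():
        found = installed_version(name)
        if found is None:
            report[name] = "missing"
        elif found == wanted:
            report[name] = "ok"
        else:
            report[name] = f"mismatch (installed {found}, pinned {wanted})"
    return report


if __name__ == "__main__":
    for name, status in compare(PINNED).items():
        print(f"{name}: {status}")
```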

For agents

$ claude mcp add video-retalking \
  -- python -m otcore.mcp_server <graph>