The official PyTorch implementation of the paper "Human Motion Diffusion Model".
Please visit our webpage for more details.

(1) We released a 50-diffusion-step model (instead of 1000 steps) that runs 20X faster with comparable results.
(2) Calling CLIP just once and caching the result makes inference 2X faster for all models. Please pull.
Performance improvement is due to an evaluation bug fix. BLUE marks fixed entries compared to the paper.
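Caching helps because the text prompt stays the same across all denoising steps. Its embedding can therefore be computed once and reused. The sketch below only illustrates that pattern. `make_counting_encoder` is a hypothetical stand-in for CLIP, and the update rule is a placeholder, not MDM's denoiser.

```python
from typing import Callable, Dict, List, Tuple


def make_counting_encoder() -> Tuple[Callable[[str], List[float]], Dict[str, int]]:
    """Return a toy text encoder (hypothetical stand-in for CLIP) plus a call counter."""
    counter = {"calls": 0}

    def encode(prompt: str) -> List[float]:
        counter["calls"] += 1
        # Toy "embedding"; a real encoder returns a high-dimensional vector.
        return [float(len(prompt)), float(prompt.count(" "))]

    return encode, counter


def sample(prompt: str, encode: Callable[[str], List[float]],
           num_steps: int, cache_text: bool = True) -> float:
    """Toy reverse-diffusion loop; only the text-encoding pattern matters here."""
    x = 0.0
    # With caching, the prompt is encoded once, before the loop.
    cached = encode(prompt) if cache_text else None
    for _t in reversed(range(num_steps)):
        # Without caching, the (expensive) encoder runs at every denoising step.
        emb = cached if cache_text else encode(prompt)
        x = 0.9 * x + 0.1 * sum(emb)  # placeholder for the denoiser update
    return x


encode, counter = make_counting_encoder()
uncached = sample("a person walks forward", encode, num_steps=50, cache_text=False)
print(counter["calls"])  # 50
counter["calls"] = 0
cached = sample("a person walks forward", encode, num_steps=50, cache_text=True)
print(counter["calls"])  # 1
```

Both runs return the same value. Only the number of encoder calls changes.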


If you find this code useful in your research, please cite:
MDM:
@inproceedings{
tevet2023human,
title={Human Motion Diffusion Model},
author={Guy Tevet and Sigal Raab and Brian Gordon and Yoni Shafir and Daniel Cohen-or and Amit Haim Bermano},
booktitle={The Eleventh International Conference on Learning Representations },
year={2023},
url={https://openreview.net/forum?id=SJ1kSyO2jwu}
}
DiP and CLoSD:
@article{tevet2024closd,
title={CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control},
author={Tevet, Guy and Raab, Sigal and Cohan, Setareh and Reda, Daniele and Luo, Zhengyi and Peng, Xue Bin and Bermano, Amit H and van de Panne, Michiel},
journal={arXiv preprint arXiv:2410.03441},
year={2024}
}
📢 12/Feb/25 - Added many things:
* The DiP model
* MDM with DistilBERT text encoder (Add --text_encoder_type bert)
* Developed by the legendary Roy Kapon!
* --gen_during_training feature.
* --mask_frames bug fix.
* --use_ema Weight averaging using Exponential Moving Average.
* Dataset caching for faster loading (by default).
* The eval_humanml script can be logged with WandB.
📢 29/Jan/25 - Added WandB support with --train_platform_type WandBPlatform.
📢 15/Apr/24 - Released a 50 diffusion steps model (instead of 1000 steps) which runs 20X faster 🤩🤩🤩 with comparable results.
📢 12/Apr/24 - MDM inference is now 2X faster 🤩🤩🤩 This was made possible by calling CLIP just once and caching the result, and is backward compatible with older models.
📢 25/Jan/24 - Fixed bug in evaluation code (#182) - Please use the fixed results when citing MDM.
📢 1/Jun/23 - Fixed generation issue (#104) - Please pull to improve generation results.
📢 23/Nov/22 - Fixed evaluation issue (#42) - Please pull and run bash prepare/download_t2m_evaluators.sh from the top of the repo to adapt.
📢 4/Nov/22 - Added sampling, training and evaluation of unconstrained tasks.
Note: the environment changed slightly to support the new code. If you already have an installed environment, run bash prepare/download_unconstrained_assets.sh; conda install -y -c anaconda scikit-learn to adapt.
📢 3/Nov/22 - Added in-between and upper-body editing.
📢 31/Oct/22 - Added sampling, training and evaluation of action-to-motion tasks.
📢 9/Oct/22 - Added training and evaluation scripts.
Note: the environment changed slightly to support the new code. If you already have an installed environment, run bash prepare/download_glove.sh; pip install clearml to adapt.
📢 6/Oct/22 - First release - sampling and rendering using pre-trained models.
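The --use_ema option listed in the 12/Feb/25 entry keeps an exponential moving average of the model weights. Below is a minimal sketch of the standard EMA update on plain Python lists. It is not MDM's implementation, and the decay values are illustrative assumptions.

```python
from typing import Dict, List


def ema_update(ema_params: Dict[str, List[float]],
               model_params: Dict[str, List[float]],
               decay: float = 0.999) -> None:
    """Update ema_params in place: ema <- decay * ema + (1 - decay) * current."""
    for name, values in model_params.items():
        ema_params[name] = [decay * e + (1.0 - decay) * v
                            for e, v in zip(ema_params[name], values)]


# Toy run: the averaged weights drift toward the (constant) current weights.
ema = {"w": [0.0, 0.0]}
for _ in range(3):
    ema_update(ema, {"w": [1.0, 2.0]}, decay=0.5)
print(ema)  # {'w': [0.875, 1.75]}
```

A decay close to 1 makes the average change slowly and smooths out step-to-step noise in the weights. How --use_ema applies the averaged weights at sampling time is not covered here.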
🐔 LoRA-MDM - Promptly adapt MDM for stylized text-to-motion.
🦩 AnyTop - Character Animation Diffusion with Any Topology.
🥋 CLoSD - Real-time MDM controls the character in a physical simulation.
🐉 SinMDM - Learns single motion motifs - even for non-humanoid characters.
👯 PriorMDM - Uses MDM as a generative prior, enabling new generation tasks with few examples or even no data at all.
💃 MAS - Generating intricate 3D motions (including non-humanoid) using 2D diffusion models trained on in-the-wild videos.
🐒 MoMo - Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer
🏃 CAMDM - Taming Diffusion Probabilistic Models for Character Control - a real-time version of MDM.
This code was tested on Ubuntu 18.04.5 LTS and requires:
Install ffmpeg (if not already installed):
sudo apt update
sudo apt install ffmpeg
For Windows, use this instead.
Setup conda env:
conda env create -f environment.yml
conda activate mdm
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git
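A small check like the following can confirm that the setup steps above took effect. It is a sketch and not part of the repo. It only tests that ffmpeg is on PATH and that the `clip` and `en_core_web_sm` packages are importable.

```python
import importlib.util
import shutil
from typing import Dict


def check_environment() -> Dict[str, bool]:
    """Report whether the setup steps above appear to have succeeded."""
    return {
        # Installed via apt above (presumably used to write the .mp4 animations).
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # `pip install git+https://github.com/openai/CLIP.git` provides the `clip` package.
        "clip": importlib.util.find_spec("clip") is not None,
        # `python -m spacy download en_core_web_sm` installs an importable package.
        "en_core_web_sm": importlib.util.find_spec("en_core_web_sm") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:15s} {'OK' if ok else 'MISSING'}")
```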
Download dependencies:
Text to Motion
bash prepare/download_smpl_files.sh
bash prepare/download_glove.sh
bash prepare/download_t2m_evaluators.sh
Action to Motion
bash prepare/download_smpl_files.sh
bash prepare/download_recognition_models.sh
Unconstrained
bash prepare/download_smpl_files.sh
bash prepare/download_recognition_models.sh
bash prepare/download_recognition_unconstrained_models.sh
Text to Motion
Or, alternatively, parse the data yourself according to the original instructions:
Original Text to Motion instructions
There are two paths to get the data:
(a) Go the easy way if you only want to generate text-to-motion (this excludes editing, which requires motion capture data).
(b) Get full data to train and evaluate the model.
HumanML3D - Clone HumanML3D, then copy the data dir to our repository:
cd ..
git clone https://github.com/EricGuo5513/HumanML3D.git
unzip ./HumanML3D/HumanML3D/texts.zip -d ./HumanML3D/HumanML3D/
cp -r HumanML3D/HumanML3D motion-diffusion-model/dataset/HumanML3D
cd motion-diffusion-model
HumanML3D - Follow the instructions in HumanML3D, then copy the result dataset to our repository:
cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D
KIT - Download from HumanML3D (no processing needed this time) and place the result in ./dataset/KIT-ML
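The following is a quick sanity check for the copied HumanML3D folder. The list of expected entries is an assumption based on the HumanML3D repository layout, not something this README specifies. Adjust it if your copy differs.

```python
from pathlib import Path
from typing import List

# Assumed layout of ./dataset/HumanML3D (based on the HumanML3D repository).
EXPECTED_ENTRIES = ["new_joint_vecs", "texts", "Mean.npy", "Std.npy",
                    "train.txt", "val.txt", "test.txt"]


def missing_entries(dataset_dir: str, expected: List[str] = EXPECTED_ENTRIES) -> List[str]:
    """Return the expected files/dirs that are absent from dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries("./dataset/HumanML3D")
    print("OK" if not missing else f"Missing: {missing}")
```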
Action to Motion
UESTC, HumanAct12
bash prepare/download_a2m_datasets.sh
Unconstrained
HumanAct12
bash prepare/download_unconstrained_datasets.sh
Download the model(s) you wish to use, then unzip and place them in ./save/.
Text to Motion
You need only the first one.
HumanML3D
[NEW!] humanml_trans_dec_512_bert-50steps - Runs 20X faster with improved precision!
[NEW!] humanml-encoder-512-50steps - Runs 20X faster with comparable performance!
humanml-encoder-512 (best model used in the paper)
KIT
Action to Motion
UESTC
HumanAct12
Unconstrained
HumanAct12
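The sampling commands below point at files named like model000200000.pt, where the digits appear to be the training step. Below is a hypothetical helper for picking the newest checkpoint in a ./save/ subfolder, assuming that naming scheme.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed checkpoint naming, inferred from paths such as model000200000.pt.
CKPT_PATTERN = re.compile(r"^model(\d+)\.pt$")


def checkpoint_step(filename: str) -> Optional[int]:
    """Parse the step number from a checkpoint file name, or return None."""
    match = CKPT_PATTERN.match(filename)
    return int(match.group(1)) if match else None


def latest_checkpoint(save_dir: str) -> Optional[Path]:
    """Return the model*.pt file with the highest step in save_dir, or None."""
    root = Path(save_dir)
    if not root.is_dir():
        return None
    candidates = []
    for path in root.glob("model*.pt"):
        step = checkpoint_step(path.name)
        if step is not None:
            candidates.append((step, path))
    return max(candidates, key=lambda c: c[0])[1] if candidates else None
```

For example, `latest_checkpoint("./save/humanml_trans_enc_512")` would return the path to pass as --model_path.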
Text to Motion
python -m sample.generate --model_path ./save/humanml_trans_enc_512/model000200000.pt --num_samples 10 --num_repetitions 3
python -m sample.generate --model_path ./save/humanml_trans_enc_512/model000200000.pt --input_text ./assets/example_text_prompts.txt
python -m sample.generate --model_path ./save/humanml_trans_enc_512/model000200000.pt --text_prompt "the person walked forward and is picking up his toolbox."
Action to Motion
python -m sample.generate --model_path ./save/humanact12/model000350000.pt --num_samples 10 --num_repetitions 3
python -m sample.generate --model_path ./save/humanact12/model000350000.pt --action_file ./assets/example_action_names_humanact12.txt
python -m sample.generate --model_path ./save/humanact12/model000350000.pt --action_name "drink"
Unconstrained
python -m sample.generate --model_path ./save/unconstrained/model000450000.pt --num_samples 10 --num_repetitions 3
By abuse of notation, (num_samples * num_repetitions) samples are created, and are visually organized in a display of num_samples rows and num_repetitions columns.
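The sketch below shows which generated sample lands in which grid cell. It assumes repetition-major ordering (index = rep * num_samples + sample), which is a guess about how results are stored. The file names follow the sample##_rep## pattern used for the output animations.

```python
from typing import List, Tuple


def grid_positions(num_samples: int, num_repetitions: int) -> List[Tuple[int, int, str]]:
    """Map each flat index to (row, column, file name) in the display grid.

    Rows are samples (prompts) and columns are repetitions. Assumes
    repetition-major ordering: index = rep * num_samples + sample.
    """
    cells = []
    for idx in range(num_samples * num_repetitions):
        rep, sample = divmod(idx, num_samples)
        cells.append((sample, rep, f"sample{sample:02d}_rep{rep:02d}.mp4"))
    return cells


cells = grid_positions(num_samples=10, num_repetitions=3)
print(len(cells))  # 30
print(cells[13])   # (3, 1, 'sample03_rep01.mp4')
```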
You may also define:
* --device id.
* --seed to sample different prompts.
* --motion_length (text-to-motion only) in seconds (maximum is 9.8 sec).
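--motion_length is given in seconds, while motions are stored as frames. Assuming HumanML3D's 20 fps, the 9.8 sec maximum corresponds to 196 frames. Below is a hypothetical conversion helper that clamps to that limit. Clamping rather than raising an error is a design choice made for this sketch.

```python
FPS = 20          # assumed frame rate (HumanML3D motions are sampled at 20 fps)
MAX_FRAMES = 196  # 9.8 sec * 20 fps


def motion_length_to_frames(seconds: float, fps: int = FPS,
                            max_frames: int = MAX_FRAMES) -> int:
    """Convert a --motion_length value in seconds to a frame count, clamped to max_frames."""
    return min(max_frames, int(round(seconds * fps)))


print(motion_length_to_frames(6.0))   # 120
print(motion_length_to_frames(12.0))  # 196 (clamped)
```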
Running the sampling commands above will get you:
* results.npy - a file with the text prompts and xyz positions of the generated animations.
* sample##_rep##.mp4 - a stick-figure animation for each generated motion.
It will look something like this:

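The following sketch reads results.npy in Python. It assumes the file holds a pickled dict with a 'motion' array of shape (N, joints, 3, frames) and a 'text' list. Those key names and the layout are assumptions, so inspect `results.keys()` on a real file first. The demo writes a fake file so the snippet runs on its own.

```python
import os
import tempfile
from typing import Any, Dict

import numpy as np


def load_results(path: str) -> Dict[str, Any]:
    """Load a results.npy file assumed to hold a pickled Python dict."""
    # np.save stores a dict as a 0-d object array, so pickle loading must be
    # allowed and .item() unwraps the dict. Only do this for files you trust.
    return np.load(path, allow_pickle=True).item()


def sample_xyz(results: Dict[str, Any], index: int) -> np.ndarray:
    """Return one motion as (frames, joints, 3), assuming 'motion' is (N, joints, 3, frames)."""
    return np.transpose(results["motion"][index], (2, 0, 1))


# Self-contained demo with a fake file; keys and shapes are assumptions.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.npy")
    np.save(path, {"motion": np.zeros((2, 22, 3, 60)), "text": ["walk", "jump"]})
    results = load_results(path)

print(sample_xyz(results, 0).shape)  # (60, 22, 3)
print(results["text"])               # ['walk', 'jump']
```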
You can stop here, or render the SMPL mesh using the following script.
To create SMPL mesh per frame run:
python -m visualize.render_mesh --input_path /path/to/mp4/stick/figure/file
This script outputs:
* sample##_rep##_smpl_params.npy - SMPL parameters (thetas, root translations, vertices and faces)
* sample##_rep##_obj - Mesh per frame in .obj format.
Notes:
* The .obj can be integrated into Blender/Maya/3DS-MAX and rendered using them.
* This script runs SMPLify and also needs a GPU (which can be specified with the --device flag).
* Important - Do not change the original .mp4 path before running the script.
Notes for 3D makers:
* You have two ways to animate the sequence:
1. Use the SMPL add-on and the theta parameters saved to sample##_rep##_smpl_params.npy (we always use beta=0 and the gender-neutral model).
2. A more straightforward way is to use the mesh data itself. All meshes have the same topology (SMPL), so you just need to keyframe vertex locations.
Since the OBJ files do not preserve vertex order, we also save this data to the sample##_rep##_smpl_params.npy file for your convenience.
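All frames share the SMPL topology (6890 vertices), so vertex i is the same point on the body in every frame. Animation can therefore be expressed as per-frame offsets from a base pose, in the style of shape keys or morph targets. The sketch assumes you have already pulled a (num_frames, 6890, 3) vertex array out of sample##_rep##_smpl_params.npy. The exact key and axis order are assumptions; if frames are on the last axis, `np.moveaxis(arr, -1, 0)` converts it.

```python
import numpy as np

SMPL_NUM_VERTICES = 6890  # every SMPL mesh shares this vertex count and topology


def vertex_offsets(vertices: np.ndarray) -> np.ndarray:
    """Per-frame vertex offsets relative to frame 0, for a (num_frames, 6890, 3) array."""
    if vertices.ndim != 3 or vertices.shape[1:] != (SMPL_NUM_VERTICES, 3):
        raise ValueError(f"expected (num_frames, {SMPL_NUM_VERTICES}, 3), got {vertices.shape}")
    # Broadcasting subtracts the first frame from every frame.
    return vertices - vertices[0:1]


frames = np.zeros((4, SMPL_NUM_VERTICES, 3))
frames[:, :, 0] = np.arange(4)[:, None]  # toy motion: the whole body slides along x
offsets = vertex_offsets(frames)
print(offsets.shape)  # (4, 6890, 3)
print(offsets[3, 0])  # [3. 0. 0.]
```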
Supported editing modes: in_between and upper_body.
python -m sample.edit --model_path ./save/humanml_trans_enc_512/model000200000.pt --edit_mode in_between
You may also define:
* --num_samples (default is 10) / --num_repetitions (default is 3).
* --device id.
* --seed to sample different prompts.
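Conceptually, in_between editing keeps the start and end of the input motion and lets the diffusion model fill in the middle. It does this by overwriting the kept frames with the known motion during sampling. The sketch below shows only that masking step. The 25% prefix/suffix fractions, the (features, frames) layout, and where the overwrite happens in the sampler are assumptions, not MDM's exact code. For upper_body, the same idea applies with a mask over joints instead of frames.

```python
import numpy as np


def in_between_mask(num_frames: int, prefix_frac: float = 0.25,
                    suffix_frac: float = 0.25) -> np.ndarray:
    """Boolean per-frame mask: True where the input motion is kept (prefix and suffix)."""
    mask = np.zeros(num_frames, dtype=bool)
    prefix_end = int(num_frames * prefix_frac)
    suffix_start = num_frames - int(num_frames * suffix_frac)
    mask[:prefix_end] = True
    mask[suffix_start:] = True
    return mask


def impute(x_generated: np.ndarray, x_known: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Overwrite kept frames with the known motion; frames are on the last axis."""
    # mask has shape (frames,) and broadcasts against (..., frames).
    return np.where(mask, x_known, x_generated)


mask = in_between_mask(8)
edited = impute(np.zeros((2, 8)), np.ones((2, 8)), mask)
print(mask.astype(int))  # [1 1 0 0 0 0 1 1]
```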