Unimotion: Unifying 3D Human Motion Synthesis and Understanding
Chuqiao Li, Julian Chibane, Yannan He, Naama Pearl, Andreas Geiger, Gerard Pons-Moll
[Project Page] [Paper]
3DV 2025 (Oral)
- [2024/09/30] Unimotion paper is available on ArXiv.
- [2025/04/13] Code and pre-trained models released.
- Alignment between frame-level text and motion enables temporal semantic awareness in motion generation!
- Separate diffusion processes for aligned motion and text enable multi-directional inference!
- Our model allows Multiple Novel Applications:
- Hierarchical Control: Allowing users to specify motion at different levels of detail
- Motion Text Generation: Obtaining motion text descriptions for existing MoCap data or YouTube videos
- Motion Editing: Allowing for editability, generating motion from text, and editing the motion via text edits
Install ffmpeg (if not already installed):
sudo apt update
sudo apt install ffmpeg
For Windows, use this instead.
Setup conda env:
conda env create -f environment.yml
conda activate unimotion
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git
Download dependencies:
bash prepare/download_smpl_files.sh
bash prepare/download_glove.sh
bash prepare/download_t2m_evaluators.sh
Download the data:
HumanML3D (Sequence-level motion and text) - Follow the instructions in HumanML3D, then run the following command:
cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D
BABEL Frame-Level Text Embeddings
You can download the preprocessed CLIP text embeddings (derived from BABEL annotations) with:
bash prepare/download_clip_embeddings.sh
These processed embeddings are all you need for training, sampling, and evaluation.
If you'd like to inspect the ground-truth frame-level motion-text alignments yourself, please refer to the instructions in this repo to download text labels and unify annotations across different datasets.
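The downloaded files store pre-computed CLIP text embeddings, one per frame-level label. A minimal sketch of what working with them looks like, assuming the embeddings come from CLIP ViT-B/32 (512-dimensional) and are L2-normalized before use; the array below stands in for a loaded `.npy` file, whose exact name and layout should be checked against the repo's data loader:

```python
import numpy as np

# Stand-in for a loaded per-frame CLIP embedding array: one 512-d
# CLIP ViT-B/32 text embedding per motion frame (196 frames here).
emb = np.random.randn(196, 512).astype(np.float32)

# CLIP embeddings are typically L2-normalized before similarity computations.
emb_norm = emb / np.linalg.norm(emb, axis=-1, keepdims=True)
print(emb_norm.shape)  # (196, 512)
```

Cosine similarity between a frame embedding and a query text embedding is then a plain dot product.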
Directory Structure
After running the download scripts, your directory structure should look like this:
Unimotion/
├── dataset/
└── HumanML3D/
├── clip_encoder.py
├── clip_enc_single/
├── examples_editing.txt
├── Mean_seg_pca_51.npy
├── pca/
├── README.md
├── Std_seg_pca_51.npy
├── test_ft.txt
├── test_ft_no_overlap.txt
├── texts/
├── train_ft.txt
├── val_ft.txt
└── val_ft_no_overlap.txt
Download the model checkpoints, then unzip and place them in ./save/.
bash prepare/download_checkpoints.sh
Frame-Level Text to Motion
python -m sample.generate \
--model_path ./save/unimotion_pca_51_humanml_trans_enc_512/model000400000.pt \
--sample_condition t2m \
--input_gt_local_txt ./assets/walk_sit.csv \
--guidance_param 0
python -m sample.generate \
--model_path ./save/unimotion_pca_51_humanml_trans_enc_512/model000400000.pt \
--sample_condition t2m \
--num_samples 10 \
--guidance_param 0
Hierarchical Text to Motion (frame-level + sequence-level)
python -m sample.generate \
--model_path ./save/unimotion_pca_51_humanml_trans_enc_512/model000400000.pt \
--sample_condition t2m \
--input_gt_local_txt ./assets/walk_sit.csv \
--input_text ./assets/wave_hands.txt
python -m sample.generate \
--model_path ./save/unimotion_pca_51_humanml_trans_enc_512/model000400000.pt \
--sample_condition t2m \
--num_samples 10
Sequence-Level Text to Motion
python -m sample.generate \
--model_path ./save/unimotion_pca_51_humanml_trans_enc_512/model000400000.pt \
--sample_condition m+t \
--input_text ./assets/demos.txt
python -m sample.generate \
--model_path ./save/unimotion_pca_51_humanml_trans_enc_512/model000400000.pt \
--sample_condition m+t \
--num_samples 10
python -m sample.generate \
--model_path ./save/unimotion_pca_51_humanml_trans_enc_512/model000400000.pt \
--sample_condition m+t \
--text_prompt "the person paces back and forth."
Motion to Text
demo_youtube.npy is a human pose estimate extracted from a YouTube video; feel free to use any available pose estimation method and be creative with your video selection.
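To run your own video through motion-to-text, the estimator's per-frame output needs to be packed into a single array like `demo_youtube.npy`. A minimal sketch, assuming an SMPL-style axis-angle pose per frame; the exact feature layout `sample.generate` expects is an assumption here and should be checked against the repo's data loader:

```python
import numpy as np

# Pack per-frame pose estimates from any off-the-shelf estimator
# into one array. pose_dim = 72 assumes SMPL axis-angle (24 joints x 3);
# the real expected layout may differ.
num_frames, pose_dim = 90, 72
frames = [np.random.randn(pose_dim).astype(np.float32) for _ in range(num_frames)]
motion = np.stack(frames)          # shape (num_frames, pose_dim)
np.save("demo_motion.npy", motion) # same container format as demo_youtube.npy
print(motion.shape)                # (90, 72)
```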
python -m sample.generate \
--model_path ./save/unimotion_pca_51_humanml_trans_enc_512/model000400000.pt \
--sample_condition m2t \
--input_motion_path ./assets/demo_youtube.npy
python -m sample.generate \
--model_path ./save/unimotion_pca_51_humanml_trans_enc_512/model000400000.pt \
--sample_condition m2t \
--num_samples 10
Motion Editing
This example replaces the "walk forward" segment from frames 83-135 with "jog forward". You can also create the input motion with any of the text-to-motion sampling commands above and then edit it.
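Conceptually, this kind of in-betweening edit keeps the frames outside the edited window fixed and only re-synthesizes the masked middle segment. A toy sketch of that masking logic (not the repo's actual implementation), using the same `--prefix_end 83` / `--suffix_start 135` values:

```python
import numpy as np

num_frames, feat_dim = 196, 51
prefix_end, suffix_start = 83, 135     # matches the flags used above

# Boolean mask: True = keep the original frame, False = re-generate it.
keep = np.ones(num_frames, dtype=bool)
keep[prefix_end:suffix_start] = False  # frames 83..134 are resynthesized

motion = np.random.randn(num_frames, feat_dim).astype(np.float32)     # input
generated = np.random.randn(num_frames, feat_dim).astype(np.float32)  # model output
edited = np.where(keep[:, None], motion, generated)
print(int(keep.sum()))  # 144 frames kept, 52 regenerated
```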
python -m sample.edit \
--model_path ./save/unimotion_pca_51_humanml_trans_enc_512/model000400000.pt \
--edit_mode in_between \
--input_gt_local_txt ./assets/motion_edited.csv \
--input_motion_path ./assets/example_motion.npy \
--sample_condition t2m \
--guidance_param 0 \
--prefix_end 83 \
--suffix_start 135 \
--input_idx 8 \
--show_input
Training
python -m train.train_unimotion \
--save_dir save/new_unimotion_pca_51_humanml_trans_enc_512 \
--eval_during_training \
--save_results
Coming soon
When using the code/figures/data/etc., please cite our work:
@article{li2024unimotion,
author = {Li, Chuqiao and Chibane, Julian and He, Yannan and Pearl, Naama and Geiger, Andreas and Pons-Moll, Gerard},
title = {Unimotion: Unifying 3D Human Motion Synthesis and Understanding},
journal = {arXiv preprint arXiv:2409.15904},
year = {2024},
}
