Unimotion: Unifying 3D Human Motion Synthesis and Understanding
Chuqiao Li, Julian Chibane, Yannan He, Naama Pearl, Andreas Geiger, Gerard Pons-Moll
[Project Page] [Paper]

3DV 2025 (Oral)

News 🚩

  • [2024/09/30] The Unimotion paper is available on arXiv.
  • [2025/04/13] Code and pre-trained models released.

Key Insight

  • Alignment between frame-level text and motion gives motion generation temporal semantic awareness!
  • Separate diffusion processes for aligned motion and text enable multi-directional inference!
  • Our model allows Multiple Novel Applications:
    • Hierarchical Control: Allowing users to specify motion at different levels of detail
    • Motion Text Generation: Obtaining motion text descriptions for existing MoCap data or YouTube videos
    • Motion Editing: Allowing for editability, generating motion from text, and editing the motion via text edits

Install Environment

Install ffmpeg (if not already installed):

sudo apt update
sudo apt install ffmpeg

On Windows, follow ffmpeg's official installation instructions instead.
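To confirm that ffmpeg is reachable before running the rendering steps later on, a quick stdlib check can help (this helper is a small illustrative sketch, not part of the repository):

```python
import shutil

def ffmpeg_available() -> bool:
    """True if an ffmpeg binary is on PATH (needed for video rendering)."""
    return shutil.which("ffmpeg") is not None

print("ffmpeg found:", ffmpeg_available())
```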

Setup conda env:

conda env create -f environment.yml
conda activate unimotion
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git

Download dependencies:

bash prepare/download_smpl_files.sh
bash prepare/download_glove.sh
bash prepare/download_t2m_evaluators.sh

Data Preparation

Download the data:

HumanML3D (Sequence-level motion and text) - Follow the instructions in HumanML3D, then run the following command:

cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D

BABEL Frame-level text Embeddings

You can download the preprocessed CLIP text embeddings (derived from BABEL annotations) with:

bash prepare/download_clip_embeddings.sh

These processed embeddings are all you need for training, sampling, and evaluation.

If you'd like to inspect the ground-truth frame-level motion-text alignments yourself, please refer to the instructions in this repo to download text labels and unify annotations across different datasets.

Directory Structure

After running the download scripts, your directory structure should look like this:

Unimotion/
└── dataset/
    └── HumanML3D/
        ├── clip_encoder.py
        ├── clip_enc_single/
        ├── examples_editing.txt
        ├── Mean_seg_pca_51.npy
        ├── pca/
        ├── README.md
        ├── Std_seg_pca_51.npy
        ├── test_ft.txt
        ├── test_ft_no_overlap.txt
        ├── texts/
        ├── train_ft.txt
        ├── val_ft.txt
        └── val_ft_no_overlap.txt

Download Pretrained Models

Download the models, then unzip and place them in ./save/:

bash prepare/download_checkpoints.sh

Sampling

Frame-Level Text to Motion

Generate from your frame-level text file

python -m sample.generate \
--model_path ./save/unimotion_pca_51_humanml_trans_enc_512/model000400000.pt \
--sample_condition t2m \
--input_gt_local_txt ./assets/walk_sit.csv \
--guidance_param 0

Generate from test set frame-level prompts

python -m sample.generate \
--model_path ./save/unimotion_pca_51_humanml_trans_enc_512/model000400000.pt \
--sample_condition t2m \
--num_samples 10 \
--guidance_param 0
Hierarchical Text to Motion (frame-level + sequence-level)

Generate from your text files (frame-level + sequence-level)

python -m sample.generate \
--model_path ./save/unimotion_pca_51_humanml_trans_enc_512/model000400000.pt \
--sample_condition t2m \
--input_gt_local_txt ./assets/walk_sit.csv \
--input_text ./assets/wave_hands.txt

Generate from test set prompts (frame-level + sequence-level)

python -m sample.generate \
--model_path ./save/unimotion_pca_51_humanml_trans_enc_512/model000400000.pt \
--sample_condition t2m \
--num_samples 10 
Sequence-Level Text to Motion

Generate from your sequence-level text file

python -m sample.generate \
--model_path ./save/unimotion_pca_51_humanml_trans_enc_512/model000400000.pt \
--sample_condition m+t \
--input_text ./assets/demos.txt 

Generate from test set sequence-level prompts

python -m sample.generate \
--model_path ./save/unimotion_pca_51_humanml_trans_enc_512/model000400000.pt \
--sample_condition m+t \
--num_samples 10 

Generate a single sequence-level prompt

python -m sample.generate \
--model_path ./save/unimotion_pca_51_humanml_trans_enc_512/model000400000.pt \
--sample_condition m+t \
--text_prompt "the person paces back and forth."
Motion to Text

Generate from your motion file

demo_youtube.npy contains human pose estimates extracted from a YouTube video; feel free to use any available pose-estimation method and be creative with your video selection.

python -m sample.generate \
--model_path ./save/unimotion_pca_51_humanml_trans_enc_512/model000400000.pt \
--sample_condition m2t \
--input_motion_path ./assets/demo_youtube.npy

Generate from test set motions

python -m sample.generate \
--model_path ./save/unimotion_pca_51_humanml_trans_enc_512/model000400000.pt \
--sample_condition m2t \
--num_samples 10 
Motion Editing

Edit from your motion file

This example replaces the "walk forward" segment from frames 83-135 with "jog forward". You can also create the input motion with any of the text-to-motion sampling commands above and then edit it.

python -m sample.edit \
--model_path ./save/unimotion_pca_51_humanml_trans_enc_512/model000400000.pt \
--edit_mode in_between \
--input_gt_local_txt ./assets/motion_edited.csv \
--input_motion_path ./assets/example_motion.npy \
--sample_condition t2m \
--guidance_param 0 \
--prefix_end 83 \
--suffix_start 135 \
--input_idx 8 \
--show_input
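To translate the --prefix_end / --suffix_start frame indices into wall-clock time, note that HumanML3D motion data is sampled at 20 fps; the helper below is a small illustrative sketch under that assumption.

```python
FPS = 20  # HumanML3D motion data is sampled at 20 frames per second

def frame_range_to_seconds(start_frame, end_frame, fps=FPS):
    """Convert an edited frame range into (start, end) times in seconds."""
    return start_frame / fps, end_frame / fps

# The edit above replaces frames 83-135, i.e. roughly 4.15 s to 6.75 s.
```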

Training

python -m train.train_unimotion \
--save_dir save/new_unimotion_pca_51_humanml_trans_enc_512 \
--eval_during_training \
--save_results

Evaluation

Coming soon.

Citation

If you use the code, figures, data, etc., please cite our work:

@article{li2024unimotion,
  author    = {Li, Chuqiao and Chibane, Julian and He, Yannan and Pearl, Naama and Geiger, Andreas and Pons-Moll, Gerard},
  title     = {Unimotion: Unifying 3D Human Motion Synthesis and Understanding},
  journal   = {arXiv preprint arXiv:2409.15904},
  year      = {2024},
}
