A set of tools that help train and run motion transfer models. Inspired by Everybody Dance Now (2019), these tools facilitate the labelling of input video, tuning of training parameters, training, and synthesis of video.
This is an example of running a transfer from one subject to the other, trained for 52 epochs on an NVIDIA RTX 2080 Ti without normalization or refinement of the faces.
run.py provides a high-level interface to the multiple steps required for
training and using a model. run.py takes a YAML configuration file,
constructs a series of commands for the task-specific scripts, and executes them.
The task scripts are idempotent, allowing the user to easily manage multiple
projects without tracking their current state.
The tasks for a basic motion transfer project are:
- label data
- normalize labels (optional)
- train low-resolution model
- train high-resolution model
- generate outputs
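To make the division of labour concrete, here is a minimal, hypothetical sketch of the pattern run.py follows. It is not the actual implementation, and the real command-line arguments of the task scripts are not reproduced here (see their --help output, listed at the end of this document):

#!/usr/bin/env python
# Hypothetical sketch of a config-driven task runner; not the real run.py.
import subprocess
import sys

import yaml  # PyYAML

# Task order for a basic project, mapped to the scripts in this repo.
TASKS = [
    ("data", "./build_dataset.py"),
    ("normalize", "./normalize.py"),
    ("train_global", "./train.py"),     # low-resolution model
    ("train_local", "./train.py"),      # high-resolution model
    ("generate", "./generate_video.py"),
]

def build_args(task, project):
    # Placeholder: the real driver turns the project's width, height,
    # labels, data sources, and per-task options into arguments here.
    return []

def run_project(name, config_path="config.yml"):
    with open(config_path) as f:
        project = yaml.safe_load(f)[name]
    for task, script in TASKS:
        if task == "normalize" and not project.get("normalize", False):
            continue  # normalization is optional
        # The task scripts are idempotent, so rerunning an interrupted
        # project skips work that is already done.
        subprocess.run([script] + build_args(task, project), check=True)

if __name__ == "__main__":
    run_project(sys.argv[1])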
Clone the repo:
git clone --recurse-submodules https://github.com/heisters/motion-transfer.git
cd motion-transfer

Set up the environment with a fresh virtualenv:
mkvirtualenv motionxfer
workon motionxfer
python -m pip install -r requirements.txt
python -m pip install -r optional-requirements.txt # installs apex for fp16 training
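Optionally, you can check that the environment sees your GPU before going further. This assumes PyTorch is installed by requirements.txt (apex fp16 training implies a PyTorch + CUDA setup):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"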
Put something like the following in a file called config.yml:

SubjectA_to_SubjectB:
  width: 1024
  height: 576
  labels: 34
  data:
    - source: ~/media/SubjectA.mov
    - source: ~/media/SubjectB.mov
      options:
        - directory-prefix test
  normalize: false
  train:
    global:
      epochs: 12
      options:
        - no_flip
    local:
      epochs: 40
      options:
        - no_flip
  generate: true

This will label the Subject A video with OpenPose (poses) and dlib (faces), train a model on Subject A and its labels, then generate a video using the labels from the Subject B video without normalization. To execute it, simply run:
./run.py SubjectA_to_SubjectB

config.yml supports multiple projects, each under a key that provides its
name. This name will be used to name data, models, and generated video. Every
project requires three configuration parameters:
- width and height must be divisible by 32. These set the default resolution of the data, the high-resolution model, and the generated video. The low-resolution model defaults to running at half this resolution. These values have a significant effect on memory usage, so they are limited by the memory available on your GPU(s). For help finding a resolution that is divisible by 32 but preserves a given aspect ratio, you can run resize_divisible_by.py; a rough sketch of these calculations follows this list.
- labels is the number of labels in your training data, and depends on whether you are labelling faces and/or including multiple distinct people/label spaces in your data. If you are building an unlabelled model (straight RGB values), this should be set to 0. Otherwise, the values are:
  - OpenPose: 26 (25 labels + 1 for "nothing")
  - Faces: add 8
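As an illustration only, here is a hypothetical sketch of both calculations. It is not the actual behaviour of resize_divisible_by.py or of the labelling scripts; the numbers simply follow the rules above.

# Hypothetical helpers; not the actual project code.

def divisible_by_32(width, height):
    # Round each side down to the nearest multiple of 32, approximately
    # preserving the original aspect ratio.
    return (width // 32) * 32, (height // 32) * 32

def label_count(label_faces=True, label_spaces=1):
    # 26 OpenPose labels (25 body parts + 1 for "nothing"), plus 8 face
    # labels when faces are labelled, once per distinct label space.
    # An unlabelled (straight RGB) model would use 0 instead.
    per_space = 26 + (8 if label_faces else 0)
    return per_space * label_spaces

print(divisible_by_32(1920, 1080))    # (1920, 1056)
print(label_count())                  # 34: one subject, poses and faces
print(label_count(label_spaces=2))    # 68: the two-face Hybrid example below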
./run.py and the underlying scripts do their best to avoid unnecessary work,
and pick up where they left off quickly if you had to abort a run for some
reason. If you have changed your config, and you want to rebuild some part of
the project, you will need to remove underlying data manually. For instance,
to rebuild a project from scratch:
rm -r data/SubjectA checkpoints/SubjectA_{local,global} results/SubjectA
./run.py SubjectA

If you want to rebuild only the test data and rerun video generation:
rm -r data/SubjectA/test_* results/SubjectA
./run.py SubjectA --only data,generate

Each part of the configuration accepts a number of options that allow you to fine-tune your work. Following are some examples:
An unlabelled model that uses RGB values directly:
SubjectA:
  width: 1024
  height: 576
  labels: 0
  data:
    - source: ~/media/SubjectA_training.mov
      options:
        - train-a
    - source: ~/media/SubjectB.mov
      options:
        - train-b
    # this video will be used as the input to the model during video generation
    - source: ~/media/SubjectA_generate.mov
      options:
        - test-a

Subsample the data to use only every other frame, and do not label faces:
SubjectA:
  width: 1024
  height: 576
  labels: 26
  data:
    - source: ~/media/SubjectA.mov
      options:
        - subsample 2
        - no-label-face

Use the first 5 minutes of the video for training, the rest for generation:
SubjectA:
  width: 1024
  height: 576
  labels: 26
  data:
    - source: ~/media/SubjectA.mov
      options:
        - trim 0:300
        - no-label-face
    - source: ~/media/SubjectA.mov
      options:
        - trim 300:-1
        - no-label-face
        - directory-prefix test

Train a model on two faces, labelling each in a different label-space, and cropping centered on the face:
Hybrid:
  width: 1024
  height: 1024
  labels: 68
  data:
    - source: ~/media/SubjectA.mov
      # resizes before cropping
      resize: 1344x1344
      # crops to the model size of 1024x1024
      crop: true
      options:
        - label-with openpose
        - label-face
        # the labels for this subject will be 0 - 33
        - label-offset 0
        # tell cropping to center on the face
        - crop-center face
        - trim 0:360
    - source: ~/media/SubjectB.mov
      resize: 1344x1344
      crop: true
      options:
        # this should be set to the length of the previous video in
        # seconds, multiplied by its FPS (eg. 360 * 24)
        - frame-offset 8640
        - label-with openpose
        - label-face
        # the labels for this subject will be 34 - 67
        - label-offset 34
        - crop-center face
        - trim 0:360
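The frame-offset above is an estimate (360 seconds at 24 fps). If you prefer an exact value, one option is to read the FPS of the first video and multiply it by the trimmed length; OpenCV is used here purely as an illustration and is not necessarily among this project's requirements:

# Compute an exact frame-offset for the second source: the trimmed length
# of the first source (in seconds) multiplied by its frame rate.
import os

import cv2  # OpenCV; illustrative only

cap = cv2.VideoCapture(os.path.expanduser("~/media/SubjectA.mov"))
fps = cap.get(cv2.CAP_PROP_FPS)
cap.release()

trim_seconds = 360               # matches "trim 0:360" above
print(int(trim_seconds * fps))   # e.g. 8640 at 24 fps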
You can also train a model, and then use it to generate multiple videos:

# Train model
SubjectA:
  width: 1024
  height: 576
  labels: 26
  data:
    - source: ~/media/SubjectA.mov
      options:
        - no-label-face
  normalize: false
  train:
    global:
      epochs: 12
      options:
        - no_flip
    local:
      epochs: 40
      options:
        - no_flip

# Generate by inputting Subject B labels into the Subject A model
Transfer_SubjectB:
  width: 1024
  height: 576
  labels: 26
  data:
    - source: ~/media/SubjectB.mov
      options:
        - no-label-face
        - directory-prefix test
  normalize: false
  generate:
    model: SubjectA

# Generate by inputting Subject C labels into the Subject A model
Transfer_SubjectC:
  width: 1024
  height: 576
  labels: 26
  data:
    - source: ~/media/SubjectC.mov
      options:
        - no-label-face
        - directory-prefix test
  normalize: false
  generate:
    model: SubjectA

To see all the available options, you can run each of the task scripts directly:
./build_dataset.py --help
./normalize.py --help
./train.py --help
./generate_video.py --help