Please follow the installation instructions in INSTALL. Dataset preparation is described in DATASET, and all the models and scripts are listed in MODEL_ZOO.
We use CLIP pretrained models as the unmasked teachers by default:
- Follow extract.ipynb to extract the visual encoder from CLIP.
- Change `MODEL_PATH` in clip.py.
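The extraction step above amounts to keeping only the visual-tower weights from a full CLIP checkpoint. Below is a minimal sketch of that filtering with a toy state dict (the `visual.` key prefix follows OpenAI CLIP's layout; in practice you would load the real checkpoint with `torch.load` and save the result with `torch.save`):

```python
# Toy stand-in for a full CLIP state dict; a real checkpoint holds tensors.
full_state = {
    "visual.conv1.weight": "tensor_0",
    "visual.ln_post.bias": "tensor_1",
    "transformer.resblocks.0.attn.in_proj_weight": "tensor_2",  # text tower
    "logit_scale": "tensor_3",
}

# Keep only the visual encoder and strip the "visual." prefix so the keys
# match a standalone vision transformer.
prefix = "visual."
visual_state = {
    k[len(prefix):]: v for k, v in full_state.items() if k.startswith(prefix)
}

print(sorted(visual_state))  # ['conv1.weight', 'ln_post.bias']
```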
For pre-training, you can simply run the pre-training scripts in exp/pretraining as follows:

```shell
bash ./exp/pretraining/b16_ptk710_e200_f8_res224.sh
```

- Change `DATA_PATH` to your data path before running the scripts.
- `--sampling_rate` is set to 1 for sparse sampling.
- The latest checkpoint is saved automatically during training, so we use a large `--save_ckpt_freq`.
- For UMT-B/16, we use CLIP-B/16 as the teacher, while for UMT-L/16, we use CLIP-L/14 as the teacher and set the input resolution to 196.
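For intuition on the sparse sampling mentioned above: the video is split into as many equal temporal segments as frames needed (8 here, per the `f8` in the script name), and one frame is taken from each segment. A hypothetical sketch of such a sampler (illustrative only, not the repo's actual data loader):

```python
import random

def sparse_sample(num_frames, num_segments=8, train=True):
    """Pick one frame index per equal-length temporal segment."""
    seg_len = num_frames / num_segments
    if train:
        # Training: random offset inside each segment gives temporal jitter.
        return [int(i * seg_len + random.random() * seg_len)
                for i in range(num_segments)]
    # Testing: deterministic center frame of each segment.
    return [int((i + 0.5) * seg_len) for i in range(num_segments)]

print(sparse_sample(64, 8, train=False))  # [4, 12, 20, 28, 36, 44, 52, 60]
```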
For fine-tuning, you can simply run the fine-tuning scripts in exp/finetuning as follows:

```shell
bash ./exp/finetuning/k400/b16_ptk710_ftk710_ftk400_f8_res224.sh
```

- Change `DATA_PATH` and `PREFIX` to your data path before running the scripts.
- Change `MODEL_PATH` to your model path.
- Set `--use_checkpoint` and `--checkpoint_num` to save GPU memory.
- The best checkpoint will be automatically evaluated with `--test_best`.
- Set `--test_num_segment` and `--test_num_crop` for different evaluation strategies.
- To run evaluation only, just set `--eval`.
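The two test flags control how many temporal segments and spatial crops are scored per video; the per-view predictions are then averaged into one prediction. A toy sketch of that multi-view aggregation (the function and variable names are ours, not the repo's):

```python
def aggregate_views(view_logits, num_segment, num_crop):
    """Average class logits over num_segment x num_crop views of one video."""
    assert len(view_logits) == num_segment * num_crop, "one logit vector per view"
    num_views = len(view_logits)
    num_classes = len(view_logits[0])
    return [sum(v[c] for v in view_logits) / num_views
            for c in range(num_classes)]

# e.g. --test_num_segment 2 --test_num_crop 1 with 3 classes:
views = [[0.2, 0.6, 0.2],
         [0.4, 0.4, 0.2]]
avg = aggregate_views(views, num_segment=2, num_crop=1)
print(avg)
```

More segments and crops cost proportionally more compute at test time but usually improve accuracy slightly.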