Please follow the installation instructions in INSTALL. Dataset preparation is described in DATASET, and all the models and scripts are listed in MODEL_ZOO.
We use CLIP pretrained models as the unmasked teachers by default:
- Follow extract.ipynb to extract the visual encoder from CLIP.
- Change `MODEL_PATH` in clip.py.
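The extraction step above amounts to keeping only the visual-tower weights from a full CLIP checkpoint. Below is a minimal sketch of that filtering with a toy state dict (the `visual.` key prefix follows OpenAI CLIP's layout; in practice you would load the real checkpoint with `torch.load` and save the result with `torch.save`):

```python
# Toy stand-in for a full CLIP state dict; a real checkpoint holds tensors.
full_state = {
    "visual.conv1.weight": "tensor_0",
    "visual.ln_post.bias": "tensor_1",
    "transformer.resblocks.0.attn.in_proj_weight": "tensor_2",  # text tower
    "logit_scale": "tensor_3",
}

# Keep only the visual encoder and strip the "visual." prefix so the keys
# match a standalone vision transformer.
prefix = "visual."
visual_state = {
    k[len(prefix):]: v for k, v in full_state.items() if k.startswith(prefix)
}

print(sorted(visual_state))  # ['conv1.weight', 'ln_post.bias']
```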
For pre-training, you can simply run the pre-training scripts in exp/pretraining as follows:

```shell
bash ./exp/pretraining/b16_ptk710_e200_f8_res224.sh
```

- Change `DATA_PATH` to your data path before running the scripts.
- `--sampling_rate` is set to 1 for sparse sampling.
- The latest checkpoint is saved automatically during training, so we use a large `--save_ckpt_freq`.
- For UMT-B/16, we use CLIP-B/16 as the teacher, while for UMT-L/16, we use CLIP-L/14 as the teacher and set the input resolution to 196.
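For intuition on the sparse sampling mentioned above: the video is split into as many equal temporal segments as frames needed (8 here, per the `f8` in the script name), and one frame is taken from each segment. A hypothetical sketch of such a sampler (illustrative only, not the repo's actual data loader):

```python
import random

def sparse_sample(num_frames, num_segments=8, train=True):
    """Pick one frame index per equal-length temporal segment."""
    seg_len = num_frames / num_segments
    if train:
        # Training: random offset inside each segment gives temporal jitter.
        return [int(i * seg_len + random.random() * seg_len)
                for i in range(num_segments)]
    # Testing: deterministic center frame of each segment.
    return [int((i + 0.5) * seg_len) for i in range(num_segments)]

print(sparse_sample(64, 8, train=False))  # [4, 12, 20, 28, 36, 44, 52, 60]
```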
For fine-tuning, you can simply run the fine-tuning scripts in exp/finetuning as follows:

```shell
bash ./exp/finetuning/k400/b16_ptk710_ftk710_ftk400_f8_res224.sh
```

- Change `DATA_PATH` and `PREFIX` to your data path before running the scripts.
- Change `MODEL_PATH` to your model path.
- Set `--use_checkpoint` and `--checkpoint_num` to save GPU memory.
- The best checkpoint will be automatically evaluated with `--test_best`.
- Set `--test_num_segment` and `--test_num_crop` for different evaluation strategies.
- To run evaluation only, just set `--eval`.
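The two test flags control how many temporal segments and spatial crops are scored per video; the per-view predictions are then averaged into one prediction. A toy sketch of that multi-view aggregation (the function and variable names are ours, not the repo's):

```python
def aggregate_views(view_logits, num_segment, num_crop):
    """Average class logits over num_segment x num_crop views of one video."""
    assert len(view_logits) == num_segment * num_crop, "one logit vector per view"
    num_views = len(view_logits)
    num_classes = len(view_logits[0])
    return [sum(v[c] for v in view_logits) / num_views
            for c in range(num_classes)]

# e.g. --test_num_segment 2 --test_num_crop 1 with 3 classes:
views = [[0.2, 0.6, 0.2],
         [0.4, 0.4, 0.2]]
avg = aggregate_views(views, num_segment=2, num_crop=1)
print(avg)
```

More segments and crops cost proportionally more compute at test time but usually improve accuracy slightly.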