DanceSport: A Multi-Modal Video Dataset and Hierarchical Language-Guided Audio-Visual Learning for Long-Term Sports Assessment
This is the repository for our DanceSport dataset and the code for our method, "Hierarchical Multidimensional Language-guided Audio-Visual Learning" (H-MLAVL).
- GPU: RTX 3090
- CUDA: 12.4
- Python: 3.8.19
- PyTorch: 2.4.1+cu124
The features and label files of our DanceSport dataset can be downloaded from here.
The features and label files of the Rhythmic Gymnastics and Fis-V datasets can be downloaded from the GDLT repository.
The features and label files of the FS1000 dataset can be downloaded from the Skating-Mixer repository.
The features and label files of the LOGO dataset can be downloaded from the UIL-AQA repository.
If you wish to extract your own action text labels, please download the ViFi-CLIP pretrained model and place it in:
`weights/k400_clip_complete_finetuned_30_epochs.pth`
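A small sketch for checking that the checkpoint is in the expected place before training; the `expected_ckpt`/`ckpt_ready` helper names are illustrative, not part of the repo's API:

```python
from pathlib import Path

def expected_ckpt(root="."):
    # Expected location of the ViFi-CLIP checkpoint, per the README.
    return Path(root) / "weights" / "k400_clip_complete_finetuned_30_epochs.pth"

def ckpt_ready(root="."):
    # True only if the checkpoint file actually exists at that path.
    return expected_ckpt(root).is_file()
```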
Please fill in or select the arguments enclosed in {} before running the commands below.
- Training
```shell
CUDA_VISIBLE_DEVICES={device ID} python main.py \
  --video-path {path of video features} \
  --audio-path {path of audio features} \
  --train-label-path {path of label file of training set} \
  --test-label-path {path of label file of test set} \
  --model-name {the name used to save model and log} \
  --action-type {Ball/Clubs/Hoop/Ribbon} \
  --lr 1e-2 --epoch {250/400/500/150} \
  --n_decoder 2 --n_query 4 --alpha 1.0 --margin 1.0 \
  --lr-decay cos --decay-rate 0.01 --dropout 0.3
```
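The `--lr-decay cos --decay-rate 0.01` flags suggest a cosine-annealed learning rate. A minimal sketch of one common form of that schedule, assuming the final learning rate is `base_lr * decay_rate`; the repo's exact implementation may differ:

```python
import math

def cosine_lr(epoch, total_epochs, base_lr=1e-2, decay_rate=0.01):
    # Anneal from base_lr down to base_lr * decay_rate along a half cosine.
    final_lr = base_lr * decay_rate
    cos_factor = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return final_lr + (base_lr - final_lr) * cos_factor
```

With `--lr 1e-2` and `--decay-rate 0.01`, this starts at 1e-2 and ends at 1e-4.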
- Testing
```shell
CUDA_VISIBLE_DEVICES={device ID} python main.py \
  --video-path {path of video features} \
  --audio-path {path of audio features} \
  --train-label-path {path of label file of training set} \
  --test-label-path {path of label file of test set} \
  --action-type {Ball/Clubs/Hoop/Ribbon} \
  --n_decoder 2 --n_query 4 --dropout 0.3 \
  --test --ckpt {the name of the used checkpoint}
```
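To evaluate all four apparatus types in one pass, a dry-run loop like the following can help; it only prints the commands (substitute the `{...}` placeholders and drop the `echo` to actually run them):

```shell
# Dry run: print one test command per action type.
for ACTION in Ball Clubs Hoop Ribbon; do
  echo "CUDA_VISIBLE_DEVICES=0 python main.py --action-type ${ACTION} --test --ckpt {checkpoint}"
done
```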
This repository builds upon MLAVL (CVPR 2025).
We thank the authors for their contributions to the research community.