XuHuangbiao/DanceSport

DanceSport: A Multi-Modal Video Dataset and Hierarchical Language-Guided Audio-Visual Learning for Long-Term Sports Assessment

This repository hosts the DanceSport dataset and the code for our method, "Hierarchical Multidimensional Language-guided Audio-Visual Learning (H-MLAVL)".

Environments

  • GPU: RTX 3090
  • CUDA: 12.4
  • Python: 3.8.19
  • PyTorch: 2.4.1+cu124
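
To compare a local setup against the versions listed above, a minimal sketch like the following may help. The function name `report_environment` is hypothetical and not part of this repository; it relies only on the standard library and on standard PyTorch attributes, and it reports `None` for the PyTorch-related fields when PyTorch is not installed.

```python
import platform


def report_environment():
    """Collect locally installed versions for comparison with the list above."""
    info = {"python": platform.python_version(), "torch": None, "cuda": None, "gpu": None}
    try:
        import torch  # optional: the report still works without PyTorch
    except ImportError:
        return info
    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None on CPU-only builds).
    info["cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for name, value in report_environment().items():
        print(f"{name}: {value}")
```

Other versions may well work; the list above only records the configuration the authors report.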

Features

The features and label files of our DanceSport dataset can be downloaded from here.

The features and label files of the Rhythmic Gymnastics and Fis-V datasets can be downloaded from the GDLT repository.

The features and label files of the FS1000 dataset can be downloaded from the Skating-Mixer repository.

The features and label files of the LOGO dataset can be downloaded from the UIL-AQA repository.

Pretrained Model

If you wish to extract your own action text labels, please download the ViFi-CLIP pretrained model and place it in:

weights/k400_clip_complete_finetuned_30_epochs.pth
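
Before extracting labels, it can be worth confirming that the checkpoint sits where the path above expects it. The sketch below is illustrative only: `find_vificlip_weights` is a hypothetical helper, not a function from this repository, and it assumes the path is relative to the repository root.

```python
from pathlib import Path

# Relative path quoted in this README.
WEIGHTS_PATH = Path("weights") / "k400_clip_complete_finetuned_30_epochs.pth"


def find_vificlip_weights(repo_root="."):
    """Return the checkpoint path if it exists under repo_root, else None."""
    candidate = Path(repo_root) / WEIGHTS_PATH
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = find_vificlip_weights()
    print(found if found else f"Missing checkpoint: expected {WEIGHTS_PATH}")
```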

Running

The following commands are examples only; more details are coming soon.

Before running them, fill in or select a value for each argument enclosed in {}.

  • Training
CUDA_VISIBLE_DEVICES={device ID} python main.py --video-path {path of video features} --audio-path {path of audio features} --train-label-path {path of label file of training set} --test-label-path {path of label file of test set} --model-name {the name used to save model and log} --action-type {Ball/Clubs/Hoop/Ribbon} --lr 1e-2 --epoch {250/400/500/150} --n_decoder 2 --n_query 4 --alpha 1.0 --margin 1.0 --lr-decay cos --decay-rate 0.01 --dropout 0.3
  • Testing
CUDA_VISIBLE_DEVICES={device ID} python main.py --video-path {path of video features} --audio-path {path of audio features} --train-label-path {path of label file of training set} --test-label-path {path of label file of test set} --action-type {Ball/Clubs/Hoop/Ribbon} --n_decoder 2 --n_query 4 --dropout 0.3 --test --ckpt {the name of the used checkpoint}
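
When launching several runs, the training command above can be assembled programmatically. The following is a sketch under stated assumptions: `build_train_command` and `run` are hypothetical helpers, not part of this repository, and they simply reproduce the flags and fixed values shown in the training example. The epoch count is left as an explicit argument, since the mapping between action types and epoch counts is not spelled out here.

```python
import os
import subprocess


def build_train_command(video_path, audio_path, train_labels, test_labels,
                        model_name, action_type, epoch):
    """Assemble the training command from this README as an argument list."""
    return [
        "python", "main.py",
        "--video-path", video_path,
        "--audio-path", audio_path,
        "--train-label-path", train_labels,
        "--test-label-path", test_labels,
        "--model-name", model_name,
        "--action-type", action_type,
        "--lr", "1e-2",
        "--epoch", str(epoch),
        "--n_decoder", "2",
        "--n_query", "4",
        "--alpha", "1.0",
        "--margin", "1.0",
        "--lr-decay", "cos",
        "--decay-rate", "0.01",
        "--dropout", "0.3",
    ]


def run(command, device_id="0"):
    """Run the command with CUDA_VISIBLE_DEVICES set, as in the shell examples."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(device_id))
    return subprocess.run(command, env=env, check=True)
```

Passing the result of `build_train_command(...)` to `run(...)` from the repository root should be equivalent to typing the shell command by hand; all paths and names supplied to it are placeholders to be filled in by the user.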

Acknowledgements

This repository builds upon MLAVL (CVPR 2025).

We thank the authors for their contributions to the research community.
