
AGAV-Rater: Adapting Large Multimodal Model for AI-Generated Audio-Visual Quality Assessment

🎉 Accepted by ICML 2025

Paper | Project Website | Hugging Face Model | Hugging Face Dataset

Teaser Image

Official implementation of AGAV-Rater: Adapting Large Multimodal Model for AI-Generated Audio-Visual Quality Assessment, ICML 2025

Yuqin Cao1, Xiongkuo Min1#, Yixuan Gao1, Wei Sun2, Guangtao Zhai1#

1Shanghai Jiao Tong University, 2East China Normal University, #Corresponding authors.

Installation

git clone https://github.com/charlotte9524/AGAV-Rater.git
cd AGAV-Rater

# create conda environment
conda create -n agav-rater python=3.9
conda activate agav-rater

# install requirements
pip install -r requirements.txt
apt-get update && apt-get install ffmpeg libsm6 libxext6  -y
conda install mpi4py
pip install pytorchvideo
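
After installation, a quick import check can confirm that the extra dependencies are available (a minimal sketch; it only covers the packages installed explicitly above):

# Sanity check: the packages installed explicitly above should import cleanly
import mpi4py
import pytorchvideo

print("mpi4py:", mpi4py.__version__)
print("pytorchvideo:", pytorchvideo.__version__)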

TODO

  • Release the training script
  • ✅ Release the checkpoint on AGAVQA-MOS subset
  • ✅ Release the AGAVQA-3k dataset
  • ✅ Release the inference script

Get Datasets

You can download the AGAVQA-MOS and AGAVQA-Pair subsets of AGAVQA-3k as follows:

import os, glob
from huggingface_hub import snapshot_download

# Download the AGAVQA-3k dataset from Hugging Face
snapshot_download("caoyuqin/AGAVQA-3k", repo_type="dataset", local_dir="./dataset", local_dir_use_symlinks=False)

# Extract every downloaded archive into ./dataset/
zip_files = glob.glob("dataset/*.zip")
for zip_file in zip_files:
    print(zip_file)
    os.system("unzip {} -d ./dataset/".format(zip_file))
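
If the `unzip` command is unavailable, the archives can also be extracted with Python's standard `zipfile` module instead (a minimal sketch, assuming the dataset was downloaded to `./dataset` as above):

import glob
import zipfile

# Extract every downloaded archive into ./dataset/ without relying on the unzip CLI
for zip_path in glob.glob("dataset/*.zip"):
    print(zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall("./dataset/")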

Get Checkpoints

You can download our checkpoint trained on the AGAVQA-MOS subset as follows:

from huggingface_hub import snapshot_download

snapshot_download("caoyuqin/AGAV-Rater", repo_type="model", local_dir="./checkpoints", local_dir_use_symlinks=False)

Inference

    python inference.py --model-path=./checkpoints --video-path=<path>

Example

    python inference.py --model-path=./checkpoints --video-path=./assets/33_sora_elevenlabs_2_rank1.mp4
    #Audio quality: 50.5938, audio-visual consistency: 63.5000, overall audio-visual quality: 52.8438
    python inference.py --model-path=./checkpoints --video-path=./assets/33_sora_elevenlabs_0_rank2.mp4
    #Audio quality: 41.4375, audio-visual consistency: 50.5938, overall audio-visual quality: 41.4375
    python inference.py --model-path=./checkpoints --video-path=./assets/33_sora_elevenlabs_1_rank3.mp4
    #Audio quality: 25.7812, audio-visual consistency: 56.4375, overall audio-visual quality: 37.8438
    python inference.py --model-path=./checkpoints --video-path=./assets/33_sora_elevenlabs_4_rank4.mp4
    #Audio quality: 21.0000, audio-visual consistency: 52.3438, overall audio-visual quality: 33.8125
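
To score a whole directory of AI-generated audio-visual clips, inference.py can be called in a loop; below is a minimal sketch using subprocess (the ./assets directory and checkpoint path are the ones used in the examples above):

import subprocess
from pathlib import Path

MODEL_PATH = "./checkpoints"   # checkpoint downloaded above
VIDEO_DIR = Path("./assets")   # directory of AI-generated audio-visual clips

for video in sorted(VIDEO_DIR.glob("*.mp4")):
    # Run the official inference script and capture its printed scores
    result = subprocess.run(
        ["python", "inference.py", f"--model-path={MODEL_PATH}", f"--video-path={video}"],
        capture_output=True, text=True, check=True,
    )
    print(video.name)
    print(result.stdout.strip())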

Contact

Please contact the first author of this paper for queries.

Citation

@article{cao2025agav,
  title={AGAV-Rater: Adapting Large Multimodal Model for AI-Generated Audio-Visual Quality Assessment},
  author={Cao, Yuqin and Min, Xiongkuo and Gao, Yixuan and Sun, Wei and Zhai, Guangtao},
  journal={arXiv preprint arXiv:2501.18314},
  year={2025}
}
