This repository contains the code for the CoVR-VidLLM workshop at CVPR 2025.
Please visit our Workshop Page for more details.
The repository is structured as follows:
📦 covr
┣ 📂 configs # hydra config files
┣ 📂 src # PyTorch datamodules
┣ 📂 tools # scripts and notebooks
┣ 📜 .gitignore
┣ 📜 README.md
┣ 📜 test.py # test script
┣ 📜 validation_set.csv # textual part of the validation set for the challenge (should be used during the Validation phase)
┣ 📜 test_set.csv # textual part of the test set for the challenge (will be published during the Test phase)
Create and activate a conda environment:

```bash
conda create --name covr-env
conda activate covr-env
```

To install the necessary packages, use the `requirements.txt` file:

```bash
python -m pip install -r requirements.txt
```

The code was tested with Python 3.10 and PyTorch 2.4.
To download the checkpoints, run:
```bash
bash tools/scripts/download_pretrained_models.sh
```

You can download the exact video-text triplets used in our validation and test sets from the HuggingFace page.
Optionally, if you already have the original WebVid-CoVR test dataset split, you can simply provide its path, since the video-text triplets were sampled from it.
Before evaluating, you will need to compute the BLIP embeddings for the videos. To do so, run:
```bash
# This will compute the BLIP embeddings for the WebVid-CoVR videos.
# Note that you can use multiple GPUs with --num_shards and --shard_id

# For the validation set:
python tools/embs/save_blip_embs_vids.py --video_dir datasets/WebVid/8M/train --todo_ids validation_set.csv
# Then rename the generated folder to "blip-vid-embs-large-all_ours_val"

# For the test set (once it is public):
python tools/embs/save_blip_embs_vids.py --video_dir datasets/WebVid/8M/train --todo_ids test_set.csv
# Then rename the generated folder to "blip-vid-embs-large-all_ours_test"
```

If you are interested in using the whole WebVid-CoVR dataset (not required for the challenge):
```bash
# This will compute the BLIP embeddings for the WebVid-CoVR-Train videos.
python tools/embs/save_blip_embs_vids.py --video_dir datasets/WebVid/2M/train --todo_ids annotation/webvid-covr/webvid2m-covr_train.csv

# This will compute the BLIP embeddings for the WebVid-CoVR-Test videos.
python tools/embs/save_blip_embs_vids.py --video_dir datasets/WebVid/8M/train --todo_ids annotation/webvid-covr/webvid8m-covr_test.csv

# This will compute the BLIP embeddings for the WebVid-CoVR modification texts.
# Only needed if using the caption retrieval loss (model/loss_terms=si_ti+si_tc).
python tools/embs/save_blip_embs_txts.py annotation/webvid-covr/webvid2m-covr_train.csv datasets/WebVid/2M/blip-vid-embs-large-all
```

To compute the query features for Image/Video + description, run:
```bash
# On the challenge splits:
python test.py test=webvid-covr_our_val
python test.py test=webvid-covr_our_test
```

Extra options (not required for the challenge):

```bash
# On the original WebVid-CoVR test set:
python test.py test=webvid-covr
# On description only:
python test.py test=webvid-covr_text
```

The results will be saved as torch tensor files `query_feat.pt` and `query_feat_txt_only.pt` in the output folder, for Image/Video + Description and Description-only queries respectively.
Next, make sure to fuse/average the embeddings for each video in the resulting `.pt` file, which is originally a (15*1000)x256 tensor, where 15 is the number of key frames, 1000 the number of samples, and 256 the feature dimension.
Finally, save the fused embeddings as a numpy `.npy` file containing a 1000x256 numpy array (1000 samples, 256 feature dimensions).
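The fuse-and-save step above can be sketched as follows. This is a minimal illustration, not the official post-processing code; it assumes the 15 key-frame rows of each sample occupy contiguous rows of the tensor, which you should verify against your file's actual layout (here a random tensor stands in for the loaded `query_feat.pt`):

```python
import numpy as np
import torch

NUM_FRAMES, NUM_SAMPLES, FEAT_DIM = 15, 1000, 256

# Stand-in for torch.load("query_feat.pt"), which yields a
# (15*1000)x256 tensor of per-frame query features.
feats = torch.rand(NUM_FRAMES * NUM_SAMPLES, FEAT_DIM)

# Group the 15 key-frame embeddings of each sample and average them.
# ASSUMPTION: the frames of each sample are contiguous rows; check the
# row ordering in your tensor before relying on this reshape.
fused = feats.reshape(NUM_SAMPLES, NUM_FRAMES, FEAT_DIM).mean(dim=1)

# Save the 1000x256 array expected by the evaluation server.
np.save("query_feat_fused.npy", fused.numpy())
```

If your tensor is frame-major instead (all first frames, then all second frames, and so on), reshape to `(NUM_FRAMES, NUM_SAMPLES, FEAT_DIM)` and average over `dim=0` instead.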
Simply submit this `.npy` file to the evaluation server on the EvalAI challenge page.
This option can be used once the labels are published; until then, use our evaluation server on the EvalAI challenge page.
To calculate the recalls for the query feature results for Image/Video + description, execute:

```bash
python evaluate_scores.py evaluate=webvid-covr
```

To calculate the recalls for the query feature results for description only, execute:

```bash
python evaluate_scores.py evaluate=webvid-covr_text
```

The recalls will be saved in the output folder as `recalls.json` (Image/Video + Description) and `recalls_txt_only.pt` (Description only).
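For intuition, a recall@K metric like those reported here can be sketched as below. This is an illustrative toy computation, not the logic of `evaluate_scores.py`; the function name and the toy data are made up:

```python
import numpy as np

def recall_at_k(sims: np.ndarray, targets: np.ndarray, k: int) -> float:
    """Percentage of queries whose target candidate appears in the
    top-k most similar candidates. sims is (num_queries, num_candidates);
    targets[i] is the index of the correct candidate for query i."""
    # Rank candidates for each query by descending similarity.
    topk = np.argsort(-sims, axis=1)[:, :k]
    hits = (topk == targets[:, None]).any(axis=1)
    return 100.0 * hits.mean()

# Toy example: 4 queries, 6 candidates.
rng = np.random.default_rng(0)
sims = rng.random((4, 6))
targets = np.arange(4)
sims[np.arange(4), targets] += 1.0  # make each correct candidate rank first
print(recall_at_k(sims, targets, 1))  # → 100.0
```

In the real evaluation, `sims` would be the similarity matrix between your fused query features and the candidate video embeddings.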
The format of `recalls.json` is as follows:

```json
{
  "R1": 5.26,
  "R5": 15.79,
  "R10": 47.37,
  "R50": 100.0,
  "meanR3": 22.81,
  "meanR4": 42.11,
  "annotation": "webvid8m-covr_test_new.csv"
}
```

Based on CoVR, BLIP and lightning-hydra-template.