Commit c044445 — "BIG BANG" (initial commit, 0 parents)

99 files changed: +6869 −0 lines

.gitattributes (2 additions, 0 deletions)

```
*.ipynb
tools/notebooks/
```

.gitignore (9 additions, 0 deletions)

```
outputs/
datasets
launching
annotation/
.vscode/
bert-base-uncased/

delete*
__pycache__/
```

LICENSE (21 additions, 0 deletions)

```
MIT License

Copyright (c) 2023 Lucas Ventura

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```

README.md (212 additions, 0 deletions)
<div align="center">

# CoVR: Composed Video Retrieval
## Learning Composed Video Retrieval from Web Video Captions

![CoVR teaser gif](tools/examples/teaser.gif)

</div>

<div align="justify">

> Composed Image Retrieval (CoIR) has recently gained popularity as a task that considers _both_ text and image queries together, to search for relevant images in a database. Most CoIR approaches require manually annotated datasets, comprising image-text-image triplets, where the text describes a modification from the query image to the target image. However, manual curation of CoIR _triplets_ is expensive and prevents scalability. In this work, we instead propose a scalable automatic dataset creation methodology that generates triplets given video-caption _pairs_, while also expanding the scope of the task to include composed _video_ retrieval (CoVR). To this end, we mine paired videos with a similar caption from a large database, and leverage a large language model to generate the corresponding modification text. Applying this methodology to the extensive WebVid2M collection, we automatically construct our WebVid-CoVR dataset, resulting in 1.6 million triplets. Moreover, we introduce a new benchmark for CoVR with a manually annotated evaluation set, along with baseline results. Our experiments further demonstrate that training a CoVR model on our dataset effectively transfers to CoIR, leading to improved state-of-the-art performance in the zero-shot setup on both the CIRR and FashionIQ benchmarks. Our code, datasets, and models are publicly available.

</div>
## Description
This repository contains the code for the paper ["CoVR: Learning Composed Video Retrieval from Web Video Captions"](https://arxiv.org/abs/2308.TODO).

Please visit our [webpage](http://imagine.enpc.fr/~ventural/covr) for more details.

This repository contains:

```markdown
📦 covr
 ┣ 📂 configs # hydra config files
 ┣ 📂 src # PyTorch datamodules
 ┣ 📂 tools # scripts and notebooks
 ┣ 📜 .gitignore
 ┣ 📜 LICENSE
 ┣ 📜 README.md
 ┣ 📜 test.py
 ┗ 📜 train.py
```
## Installation :construction_worker:

<details><summary>Create environment</summary>
&emsp;

```bash
conda create --name covr
conda activate covr
```

Install the following packages inside the conda environment:

```bash
python -m pip install pytorch_lightning --upgrade
python -m pip install hydra-core --upgrade
python -m pip install lightning
python -m pip install einops
python -m pip install pandas
python -m pip install opencv-python
python -m pip install timm
python -m pip install fairscale
python -m pip install tabulate
python -m pip install transformers
```

The code was tested on Python 3.8 and PyTorch 2.0.

</details>
<details><summary>Download the datasets</summary>

### WebVid-CoVR
To use the WebVid-CoVR dataset, you will have to download the WebVid videos and the WebVid-CoVR annotations.

To download the annotations, run:
```bash
bash tools/scripts/download_annotations.sh covr
```

To download the videos, install [`mpi4py`](https://mpi4py.readthedocs.io/en/latest/install.html#) and run:
```bash
python tools/scripts/download_covr.py <split>
```

### CIRR
To use the CIRR dataset, you will have to download the CIRR images and the CIRR annotations.

To download the annotations, run:
```bash
bash tools/scripts/download_annotations.sh cirr
```

To download the images, follow the instructions in the [CIRR repository](https://github.com/lil-lab/nlvr/tree/master/nlvr2#direct-image-download). The default folder structure is the following:

```markdown
📦 covr
 ┣ 📂 datasets
 ┃ ┣ 📂 CIRR
 ┃ ┃ ┣ 📂 images
 ┃ ┃ ┃ ┣ 📂 train
 ┃ ┃ ┃ ┣ 📂 dev
 ┃ ┃ ┃ ┗ 📂 test1
```

### FashionIQ
To use the FashionIQ dataset, you will have to download the FashionIQ images and the FashionIQ annotations.

To download the annotations, run:
```bash
bash tools/scripts/download_annotations.sh fiq
```

To download the images, the URLs are listed in the [FashionIQ repository](https://github.com/hongwang600/fashion-iq-metadata/tree/master/image_url). You can use [this script](https://github.com/yanbeic/VAL/blob/master/download_fashion_iq.py) to download the images. Some missing images can also be found [here](https://github.com/XiaoxiaoGuo/fashion-iq/issues/18). All the images should be placed in the same folder (``datasets/fashion-iq/images``).

</details>
<details><summary>(Optional) Download pre-trained models</summary>

To download the checkpoints, run:
```bash
bash tools/scripts/download_pretrained_models.sh
```

</details>
## Usage :computer:
<details><summary>Computing BLIP embeddings</summary>
&emsp;

Before training, you will need to compute the BLIP embeddings for the videos/images. To do so, run:
```bash
python tools/embs/save_blip_embs_vids.py # This will compute the embeddings for the WebVid-CoVR videos.
python tools/embs/save_blip_embs_imgs.py # This will compute the embeddings for the CIRR or FashionIQ images.
```

&emsp;
</details>
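Once the embeddings are saved, retrieval reduces to nearest-neighbour search in embedding space. Below is a toy, stdlib-only sketch of that step; the function names and data are illustrative only, not the repository's API:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rank_gallery(query_emb, gallery):
    """Return gallery ids sorted by decreasing similarity to the query."""
    scored = [(vid, cosine_similarity(query_emb, emb)) for vid, emb in gallery.items()]
    return [vid for vid, _ in sorted(scored, key=lambda t: t[1], reverse=True)]

# Toy precomputed embeddings (in practice these come from the scripts above)
gallery = {
    "vid_a": [1.0, 0.0, 0.0],
    "vid_b": [0.9, 0.1, 0.0],
    "vid_c": [0.0, 1.0, 0.0],
}
print(rank_gallery([1.0, 0.05, 0.0], gallery))  # ['vid_a', 'vid_b', 'vid_c']
```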
<details><summary>Training</summary>
&emsp;

The command to launch a training experiment is the following:
```bash
python train.py [OPTIONS]
```
The parsing is done using the powerful [Hydra](https://github.com/facebookresearch/hydra) library. You can override anything in the configuration by passing arguments like ``foo=value`` or ``foo.bar=value``.
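Each dotted ``key=value`` argument edits that path in the composed config. A minimal stdlib sketch of this dotted-override idea (illustrative only, not Hydra's actual implementation):

```python
def apply_overrides(config, overrides):
    """Apply Hydra-style "a.b=value" overrides to a nested dict (sketch only)."""
    for ov in overrides:
        key, _, value = ov.partition("=")  # split on the first '='
        node = config
        parts = key.split(".")
        for part in parts[:-1]:            # walk/create intermediate mappings
            node = node.setdefault(part, {})
        node[parts[-1]] = value            # set the leaf value
    return config

cfg = {"trainer": {"devices": "1"}, "data": "cirr"}
apply_overrides(cfg, ["trainer.devices=4", "data=webvid-covr"])
print(cfg)  # {'trainer': {'devices': '4'}, 'data': 'webvid-covr'}
```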
&emsp;
</details>
<details><summary>Evaluating</summary>
&emsp;

The command to evaluate is the following:
```bash
python test.py test=<test> [OPTIONS]
```
&emsp;
</details>
<details><summary>Options parameters</summary>

#### Datasets:
- ``data=webvid-covr``: WebVid-CoVR dataset.
- ``data=cirr``: CIRR dataset.
- ``data=fashioniq-<split>``: FashionIQ dataset; set ``<split>`` to ``dress``, ``shirt`` or ``toptee``.

#### Tests:
- ``test=all``: Test on WebVid-CoVR, CIRR and all three Fashion-IQ test sets.
- ``test=webvid-covr``: Test on WebVid-CoVR.
- ``test=cirr``: Test on CIRR.
- ``test=fashioniq``: Test on all three Fashion-IQ test sets (``dress``, ``shirt`` and ``toptee``).

#### Checkpoints:
- ``model/ckpt=blip-l-coco``: Default checkpoint for BLIP-L finetuned on COCO.
- ``model/ckpt=webvid-covr``: Default checkpoint for CoVR finetuned on WebVid-CoVR.

#### Training
- ``trainer=gpu``: training with CUDA; change ``devices`` to the number of GPUs you want to use.
- ``trainer=ddp``: training with Distributed Data Parallel (DDP); change ``devices`` and ``num_nodes`` to the number of GPUs and nodes you want to use.
- ``trainer=cpu``: training on the CPU (not recommended).

#### Logging
- ``trainer/logger=csv``: log the results in a CSV file. Very basic functionality.
- ``trainer/logger=wandb``: log the results in [wandb](https://wandb.ai/). This requires installing ``wandb`` and setting up your wandb account. This is what we used to log our experiments.
- ``trainer/logger=<other>``: other loggers (not tested).

#### Machine
- ``machine=server``: You can change the default path to the dataset folder and the batch size. You can create your own machine configuration by adding a new file in ``configs/machine``.

#### Experiment
There are many pre-defined experiments from the paper in ``configs/experiments``. Simply add ``experiment=<experiment>`` to the command line to use them.

&emsp;

</details>
## Citation
If you use this dataset and/or this code in your work, please cite our [paper](http://TODO):

```bibtex
@inproceedings{ventura23covr,
    title = {{CoVR}: Learning Composed Video Retrieval from Web Video Captions},
    author = {Lucas Ventura and Antoine Yang and Cordelia Schmid and G{\"u}l Varol},
    booktitle = {arXiv},
    year = {2023}
}
```

## Acknowledgements
Based on [BLIP](https://github.com/salesforce/BLIP/) and [lightning-hydra-template](https://github.com/ashleve/lightning-hydra-template/tree/main).

configs/data/cirr.yaml (22 additions, 0 deletions)

```yaml
dataname: cirr
_target_: src.data.cirr.CIRRDataModule

# Paths
dataset_dir: ${paths.datasets_dir}/CIRR

batch_size: ${machine.batch_size}
num_workers: ${machine.num_workers}

annotation:
  train: ${paths.work_dir}/annotation/cirr/cap.rc2.train.json
  val: ${paths.work_dir}/annotation/cirr/cap.rc2.val.json

img_dirs:
  train: ${data.dataset_dir}/images/train
  val: ${data.dataset_dir}/images/dev

emb_dirs:
  train: ${data.dataset_dir}/blip-embs-large/train
  val: ${data.dataset_dir}/blip-embs-large/dev

image_size: 384
```
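``_target_`` is the dotted path Hydra instantiates for this datamodule. A rough stdlib sketch of the mechanism behind ``hydra.utils.instantiate`` (simplified; the example uses a standard-library class, since ``src.data.cirr`` is only importable inside the repo):

```python
import importlib

def instantiate(target, **kwargs):
    """Import "pkg.module.Class" and call it with kwargs (rough sketch of what
    hydra.utils.instantiate does with a config's _target_ key)."""
    module_path, _, class_name = target.rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**kwargs)

# Stand-in target for illustration:
frac = instantiate("fractions.Fraction", numerator=1, denominator=3)
print(frac)  # 1/3
```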

configs/data/fashioniq-base.yaml (28 additions, 0 deletions)

```yaml
dataname: fashioniq-${data.category}
_target_: src.data.fashioniq.FashionIQDataModule

# Paths
dataset_dir: ${paths.datasets_dir}/fashion-iq

batch_size: ${machine.batch_size}
num_workers: ${machine.num_workers}

annotation:
  train: ${paths.work_dir}/annotation/fashion-iq/cap.${data.category}.train.json
  val: ${paths.work_dir}/annotation/fashion-iq/cap.${data.category}.val.json

targets:
  train: ${paths.work_dir}/annotation/fashion-iq/split.${data.category}.train.json
  val: ${paths.work_dir}/annotation/fashion-iq/split.${data.category}.val.json

img_dirs:
  train: ${data.dataset_dir}/images/
  val: ${data.dataset_dir}/images/

emb_dirs:
  train: ${data.dataset_dir}/blip-embs-large/
  val: ${data.dataset_dir}/blip-embs-large/

image_size: 384

category: ???
```
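The ``${...}`` references in these files are resolved against the composed config at load time by Hydra/OmegaConf. A minimal stdlib-only sketch of how such interpolation behaves (illustrative; the real resolution, including the required ``???`` placeholder, is handled by OmegaConf):

```python
import re

def resolve(value, root):
    """Replace ${a.b} references in a string by looking them up in a nested
    dict (sketch of OmegaConf-style interpolation, not the real thing)."""
    def lookup(match):
        node = root
        for part in match.group(1).split("."):
            node = node[part]
        return str(node)
    return re.sub(r"\$\{([^}]+)\}", lookup, value)

root = {
    "paths": {"datasets_dir": "datasets"},
    "data": {"category": "dress"},
}
print(resolve("${paths.datasets_dir}/fashion-iq", root))  # datasets/fashion-iq
print(resolve("cap.${data.category}.train.json", root))   # cap.dress.train.json
```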

configs/data/fashioniq-dress.yaml (4 additions, 0 deletions)

```yaml
defaults:
  - fashioniq-base.yaml

category: dress
```

configs/data/fashioniq-shirt.yaml (4 additions, 0 deletions)

```yaml
defaults:
  - fashioniq-base.yaml

category: shirt
```

configs/data/fashioniq-toptee.yaml (4 additions, 0 deletions)

```yaml
defaults:
  - fashioniq-base.yaml

category: toptee
```

configs/data/webvid-covr.yaml (26 additions, 0 deletions)

```yaml
dataname: webvid-covr
_target_: src.data.webvid_covr.WebVidCoVRDataModule

image_size: 384
iterate: "pth2"
vid_query_method: middle
vid_frames: 1
emb_pool: query

# Paths
dataset_dir: ${paths.datasets_dir}/WebVid

batch_size: ${machine.batch_size}
num_workers: ${machine.num_workers}

annotation:
  train: ${paths.work_dir}/annotation/webvid-covr/webvid2m-covr_train.csv
  val: ${paths.work_dir}/annotation/webvid-covr/webvid8m-covr_val.csv

vid_dirs:
  train: ${data.dataset_dir}/2M/train
  val: ${data.dataset_dir}/8M/train

emb_dirs:
  train: ${data.dataset_dir}/2M/blip-vid-embs-${model.model.vit}-all
  val: ${data.dataset_dir}/8M/blip-vid-embs-${model.model.vit}-all
```
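``vid_query_method: middle`` with ``vid_frames: 1`` suggests the query video is represented by the frame(s) around its midpoint. A hypothetical sketch of such frame-index selection (the function name and the ``uniform`` variant are illustrative, not the repository's code):

```python
def sample_frame_indices(n_total, n_frames, method="middle"):
    """Pick frame indices from a video of n_total frames (illustrative sketch)."""
    if method == "middle":
        # n_frames indices centered on the video midpoint
        center = n_total // 2
        start = max(0, center - n_frames // 2)
        return [min(start + i, n_total - 1) for i in range(n_frames)]
    if method == "uniform":
        # n_frames indices spread evenly across the video
        step = n_total / n_frames
        return [int(step * (i + 0.5)) for i in range(n_frames)]
    raise ValueError(f"unknown method: {method}")

print(sample_frame_indices(100, 1))             # [50]
print(sample_frame_indices(100, 4, "uniform"))  # [12, 37, 62, 87]
```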
