# Textual Query-Driven Mask Transformer for Domain Generalized Segmentation (tqdm)

Byeonghyun Pak*, Byeongju Woo*, Sunghwan Kim*, Dae-hwan Kim, Hoseong Kim†
Agency for Defense Development
ECCV 2024
[Project Page] [Paper]
## Environment
The requirements can be installed with:
```bash
conda create -n tqdm python=3.9 numpy=1.26.4
conda activate tqdm
conda install pytorch==2.0.1 torchvision==0.15.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
pip install xformers==0.0.20
pip install mmcv-full==1.5.3
```
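After installation, a quick sanity check can confirm the environment is usable (a minimal sketch; it only assumes the packages installed above):

```bash
# Print installed versions and confirm PyTorch can see the GPU.
python -c "import torch, mmcv; print('torch', torch.__version__, '| cuda available:', torch.cuda.is_available(), '| mmcv', mmcv.__version__)"
```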
## Pre-trained CLIP Models
Please download the pre-trained CLIP and EVA02-CLIP weights and save them in the `./pretrained` folder.

| Model | Type | Link |
| --- | --- | --- |
| CLIP | `ViT-B-16.pt` | official repo |
| EVA02-CLIP | `EVA02_CLIP_L_336_psz14_s6B` | official repo |
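After downloading, `./pretrained` should contain the two weight files. A sketch of the expected layout (the exact EVA02-CLIP filename is an assumption based on the weight name above):

```bash
mkdir -p pretrained
# Expected contents after downloading both weights:
# pretrained/
# ├── ViT-B-16.pt                     # CLIP
# └── EVA02_CLIP_L_336_psz14_s6B.pt   # EVA02-CLIP (filename assumed)
```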
## Model Checkpoints
You can download tqdm model checkpoints:
| Model | Pretrained | Trained on | Config | Link |
| --- | --- | --- | --- | --- |
| `tqdm-clip-vit-b-gta` | CLIP | GTA5 | config | download link |
| `tqdm-eva02-clip-vit-l-gta` | EVA02-CLIP | GTA5 | config | download link |
| `tqdm-eva02-clip-vit-l-city` | EVA02-CLIP | Cityscapes | config | download link |
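If you only want to run evaluation, place a downloaded checkpoint where the test script will look for it. A minimal sketch, assuming the downloaded file is named `tqdm-eva02-clip-vit-l-gta.pth` (the actual filename may differ) and using the work-dir layout from the evaluation example further below:

```bash
# Put the downloaded checkpoint under work_dirs/ (path follows the evaluation example).
mkdir -p work_dirs/tqdm_eve_vit-l_1e-5_20k-g2c-512
mv tqdm-eva02-clip-vit-l-gta.pth work_dirs/tqdm_eve_vit-l_1e-5_20k-g2c-512/epoch_last.pth  # assumed filename
```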
## Datasets
To set up datasets, please follow the official TLDR repo.
## Training
After downloading the datasets, edit `data_root` in the dataset config files to match your environment:
```python
src_dataset_dict = dict(..., data_root='[YOUR_DATA_FOLDER_ROOT]', ...)
tgt_dataset_dict = dict(..., data_root='[YOUR_DATA_FOLDER_ROOT]', ...)
```
Then, start training with:

```bash
bash dist_train.sh configs/[TRAIN_CONFIG] [NUM_GPUs]
```
- `[TRAIN_CONFIG]`: train configuration file (e.g., `tqdm/tqdm_eve_vit-l_1e-5_20k-g2c-512.py`); a concrete invocation is sketched below
- `[NUM_GPUs]`: number of GPUs used for training
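For example, training the EVA02-CLIP model on GTA5 might look like this (a sketch; the GPU count of 2 is an assumption, and the config name comes from the example above):

```bash
# Train tqdm with the EVA02-CLIP ViT-L config (GTA5 source) on 2 GPUs.
bash dist_train.sh configs/tqdm/tqdm_eve_vit-l_1e-5_20k-g2c-512.py 2
```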
## Evaluation

To enable multi-scale flip augmentation during testing, use the `--aug-test` option.

Note: the experimental results in our main paper were obtained without multi-scale flip augmentation.
Run evaluation with:

```bash
bash dist_test.sh configs/[TEST_CONFIG] work_dirs/[MODEL] [NUM_GPUs] --eval mIoU
```
- `[TEST_CONFIG]`: test configuration file (e.g., `tqdm/tqdm_eve_vit-l_1e-5_20k-g2b-512.py`); a concrete invocation is sketched below
- `[MODEL]`: model checkpoint (e.g., `tqdm_eve_vit-l_1e-5_20k-g2c-512/epoch_last.pth`)
- `[NUM_GPUs]`: number of GPUs used for testing
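For example, evaluating the GTA5-trained checkpoint from the training example with multi-scale flip augmentation enabled might look like this (a sketch; the GPU count of 2 is an assumption, and the paths follow the examples above):

```bash
# Evaluate with mIoU; --aug-test enables multi-scale flip augmentation.
bash dist_test.sh configs/tqdm/tqdm_eve_vit-l_1e-5_20k-g2c-512.py \
    work_dirs/tqdm_eve_vit-l_1e-5_20k-g2c-512/epoch_last.pth 2 --eval mIoU --aug-test
```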
## Code Structure

- `configs/tqdm/*` - config files for the final tqdm
- `models/segmentors/*` - overall tqdm framework
- `mmseg/models/utils/assigner.py` - implementation of fixed matching
- `mmseg/models/decode_heads/tqdm_head.py` - our textual object query-based segmentation head
- `mmseg/models/plugins/tqdm_msdeformattn_pixel_decoder.py` - our pixel decoder with text-to-pixel attention
## Citation

If you find our code helpful, please cite our paper:
```bibtex
@inproceedings{pak2024textual,
title={Textual Query-Driven Mask Transformer for Domain Generalized Segmentation},
author={Pak, Byeonghyun and Woo, Byeongju and Kim, Sunghwan and Kim, Dae-hwan and Kim, Hoseong},
booktitle={European Conference on Computer Vision},
pages={37--54},
year={2024},
organization={Springer}
}
```

## Acknowledgements

This project is based on the following open-source projects. We thank the authors for sharing their code.