[ECCV 2024] Textual Query-Driven Mask Transformer for Domain Generalized Segmentation

Byeonghyun Pak*, Byeongju Woo*, Sunghwan Kim*, Dae-hwan Kim, Hoseong Kim
Agency for Defense Development
ECCV 2024

Environment

Requirements

  • The requirements can be installed with:

    conda create -n tqdm python=3.9 numpy=1.26.4
    conda activate tqdm
    conda install pytorch==2.0.1 torchvision==0.15.2 pytorch-cuda=11.8 -c pytorch -c nvidia
    pip install -r requirements.txt
    pip install xformers==0.0.20
    pip install mmcv-full==1.5.3 
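
  • A quick sanity check like the one below (a suggested check, not part of the official setup) can confirm that the pinned PyTorch build sees your GPU and that mmcv imports correctly:

    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
    python -c "import mmcv; print(mmcv.__version__)"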

Pre-trained VLM Models

  • Please download the pre-trained CLIP and EVA02-CLIP weights and save them in the ./pretrained folder (a sketch of the expected layout follows the table below).

    Model        Type                          Link
    CLIP         ViT-B-16.pt                   official repo
    EVA02-CLIP   EVA02_CLIP_L_336_psz14_s6B    official repo
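
  • The sketch below assumes the downloaded files keep the names from the table; the exact filename of the EVA02-CLIP checkpoint (here assumed to end in .pt) should be checked against the official release.

    mkdir -p pretrained
    # after downloading, the folder is expected to contain:
    #   pretrained/ViT-B-16.pt
    #   pretrained/EVA02_CLIP_L_336_psz14_s6B.pt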

Checkpoints

Datasets

  • To set up datasets, please follow the official TLDR repo.

  • After downloading the datasets, edit the data folder root in the dataset config files to match your environment (a quick way to locate and verify these fields is sketched after the snippet below).

    src_dataset_dict = dict(..., data_root='[YOUR_DATA_FOLDER_ROOT]', ...)
    tgt_dataset_dict = dict(..., data_root='[YOUR_DATA_FOLDER_ROOT]', ...)
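
  • A minimal check, assuming the dataset configs live under configs/ and the dataset folder names follow the TLDR repo (both are assumptions here, not confirmed paths):

    grep -rn "data_root" configs/        # locate every data_root field to update
    ls [YOUR_DATA_FOLDER_ROOT]           # expect dataset folders such as cityscapes/, gta/, ...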

Train

bash dist_train.sh configs/[TRAIN_CONFIG] [NUM_GPUs]
  • [TRAIN_CONFIG]: Train configuration file (e.g., tqdm/tqdm_eve_vit-l_1e-5_20k-g2c-512.py)
  • [NUM_GPUs]: Number of GPUs used for training
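
For example, launching the example configuration listed above on 4 GPUs (the GPU count is only illustrative):

    bash dist_train.sh configs/tqdm/tqdm_eve_vit-l_1e-5_20k-g2c-512.py 4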

Test

To enable multi-scale flip augmentation during testing, use the --aug-test option.

Note: The experimental results in our main paper were obtained without multi-scale flip augmentation.

bash dist_test.sh configs/[TEST_CONFIG] work_dirs/[MODEL] [NUM_GPUs] --eval mIoU
  • [TEST_CONFIG]: Test configuration file (e.g., tqdm/tqdm_eve_vit-l_1e-5_20k-g2b-512.py)
  • [MODEL]: Model checkpoint (e.g., tqdm_eve_vit-l_1e-5_20k-g2c-512/epoch_last.pth)
  • [NUM_GPUs]: Number of GPUs used for testing
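
For example, evaluating the checkpoint listed above on 4 GPUs with multi-scale flip augmentation enabled (the GPU count is illustrative, and the config/checkpoint pair should match your own training run):

    bash dist_test.sh configs/tqdm/tqdm_eve_vit-l_1e-5_20k-g2c-512.py \
        work_dirs/tqdm_eve_vit-l_1e-5_20k-g2c-512/epoch_last.pth 4 \
        --eval mIoU --aug-test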

The Most Relevant Files

Citation

If you find our code helpful, please cite our paper:

@inproceedings{pak2024textual,
  title={Textual Query-Driven Mask Transformer for Domain Generalized Segmentation},
  author={Pak, Byeonghyun and Woo, Byeongju and Kim, Sunghwan and Kim, Dae-hwan and Kim, Hoseong},
  booktitle={European Conference on Computer Vision},
  pages={37--54},
  year={2024},
  organization={Springer}
}

Acknowledgements

This project is based on the following open-source projects. We thank the authors for sharing their code.
