# Textual Query-Driven Mask Transformer for Domain Generalized Segmentation (tqdm)

Byeonghyun Pak*, Byeongju Woo*, Sunghwan Kim*, Dae-hwan Kim, Hoseong Kim†
Agency for Defense Development
ECCV 2024
[Project Page] [Paper]
## Environment
The requirements can be installed with:
```bash
conda create -n tqdm python=3.9 numpy=1.26.4
conda activate tqdm
conda install pytorch==2.0.1 torchvision==0.15.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
pip install xformers==0.0.20
pip install mmcv-full==1.5.3
```
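After installation, a quick sanity check can confirm the environment is usable (a minimal sketch; it only assumes the packages installed above):

```bash
# Print installed versions and confirm PyTorch can see the GPU.
python -c "import torch, mmcv; print('torch', torch.__version__, '| cuda available:', torch.cuda.is_available(), '| mmcv', mmcv.__version__)"
```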
## Pre-trained CLIP Models
Please download the pre-trained CLIP and EVA02-CLIP weights and save them in the `./pretrained` folder.

| Model | Type | Link |
| --- | --- | --- |
| CLIP | `ViT-B-16.pt` | official repo |
| EVA02-CLIP | `EVA02_CLIP_L_336_psz14_s6B` | official repo |
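After downloading, `./pretrained` should contain the two weight files. A sketch of the expected layout (the exact EVA02-CLIP filename is an assumption based on the weight name above):

```bash
mkdir -p pretrained
# Expected contents after downloading both weights:
# pretrained/
# ├── ViT-B-16.pt                     # CLIP
# └── EVA02_CLIP_L_336_psz14_s6B.pt   # EVA02-CLIP (filename assumed)
```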
## Model Checkpoints
You can download tqdm model checkpoints:
| Model | Pretrained | Trained on | Config | Link |
| --- | --- | --- | --- | --- |
| `tqdm-clip-vit-b-gta` | CLIP | GTA5 | config | download link |
| `tqdm-eva02-clip-vit-l-gta` | EVA02-CLIP | GTA5 | config | download link |
| `tqdm-eva02-clip-vit-l-city` | EVA02-CLIP | Cityscapes | config | download link |
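If you only want to run evaluation, place a downloaded checkpoint where the test script will look for it. A minimal sketch, assuming the downloaded file is named `tqdm-eva02-clip-vit-l-gta.pth` (the actual filename may differ) and using the work-dir layout from the evaluation example further below:

```bash
# Put the downloaded checkpoint under work_dirs/ (path follows the evaluation example).
mkdir -p work_dirs/tqdm_eve_vit-l_1e-5_20k-g2c-512
mv tqdm-eva02-clip-vit-l-gta.pth work_dirs/tqdm_eve_vit-l_1e-5_20k-g2c-512/epoch_last.pth  # assumed filename
```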
## Datasets
To set up datasets, please follow the official TLDR repo.
## Training
After downloading the datasets, edit `data_root` in the dataset config files to match your environment:
```python
src_dataset_dict = dict(..., data_root='[YOUR_DATA_FOLDER_ROOT]', ...)
tgt_dataset_dict = dict(..., data_root='[YOUR_DATA_FOLDER_ROOT]', ...)
```
Then, start training with:

```bash
bash dist_train.sh configs/[TRAIN_CONFIG] [NUM_GPUs]
```
- `[TRAIN_CONFIG]`: train configuration file (e.g., `tqdm/tqdm_eve_vit-l_1e-5_20k-g2c-512.py`); a concrete invocation is sketched below
- `[NUM_GPUs]`: number of GPUs used for training
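For example, training the EVA02-CLIP model on GTA5 might look like this (a sketch; the GPU count of 2 is an assumption, and the config name comes from the example above):

```bash
# Train tqdm with the EVA02-CLIP ViT-L config (GTA5 source) on 2 GPUs.
bash dist_train.sh configs/tqdm/tqdm_eve_vit-l_1e-5_20k-g2c-512.py 2
```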
## Evaluation

To enable multi-scale flip augmentation during testing, use the `--aug-test` option.

Note: the experimental results in our main paper were obtained without multi-scale flip augmentation.
Run evaluation with:

```bash
bash dist_test.sh configs/[TEST_CONFIG] work_dirs/[MODEL] [NUM_GPUs] --eval mIoU
```
- `[TEST_CONFIG]`: test configuration file (e.g., `tqdm/tqdm_eve_vit-l_1e-5_20k-g2b-512.py`); a concrete invocation is sketched below
- `[MODEL]`: model checkpoint (e.g., `tqdm_eve_vit-l_1e-5_20k-g2c-512/epoch_last.pth`)
- `[NUM_GPUs]`: number of GPUs used for testing
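For example, evaluating the GTA5-trained checkpoint from the training example with multi-scale flip augmentation enabled might look like this (a sketch; the GPU count of 2 is an assumption, and the paths follow the examples above):

```bash
# Evaluate with mIoU; --aug-test enables multi-scale flip augmentation.
bash dist_test.sh configs/tqdm/tqdm_eve_vit-l_1e-5_20k-g2c-512.py \
    work_dirs/tqdm_eve_vit-l_1e-5_20k-g2c-512/epoch_last.pth 2 --eval mIoU --aug-test
```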
## Code Structure

- `configs/tqdm/*` - config files for the final tqdm
- `models/segmentors/*` - overall tqdm framework
- `mmseg/models/utils/assigner.py` - implementation of fixed matching
- `mmseg/models/decode_heads/tqdm_head.py` - our textual object query-based segmentation head
- `mmseg/models/plugins/tqdm_msdeformattn_pixel_decoder.py` - our pixel decoder with text-to-pixel attention
## Citation

If you find our code helpful, please cite our paper:
```bibtex
@inproceedings{pak2024textual,
title={Textual Query-Driven Mask Transformer for Domain Generalized Segmentation},
author={Pak, Byeonghyun and Woo, Byeongju and Kim, Sunghwan and Kim, Dae-hwan and Kim, Hoseong},
booktitle={European Conference on Computer Vision},
pages={37--54},
year={2024},
organization={Springer}
}
```

## Acknowledgements

This project is based on the following open-source projects. We thank the authors for sharing their code.