
jinxxo-j/DAR-TR-PEFT

 
 


[CVPR 2025] Rethinking Token Reduction with Parameter-Efficient Fine-Tuning in ViT for Pixel-Level Tasks


Cheng Lei, Ao Li, Hu Yao, Ce Zhu, Le Zhang
University of Electronic Science and Technology of China.  

Paper  

Abstract: Parameter-efficient fine-tuning (PEFT) adapts pre-trained models to new tasks by updating only a small subset of parameters, achieving efficiency but still facing significant inference costs driven by input token length. This challenge is even more pronounced in pixel-level tasks, which require longer input sequences compared to image-level tasks. Although token reduction (TR) techniques can help reduce computational demands, they often lead to homogeneous attention patterns that compromise performance in pixel-level scenarios. This study underscores the importance of maintaining attention diversity for these tasks and proposes to enhance attention diversity while ensuring the completeness of token sequences. Our approach effectively reduces the number of tokens processed within transformer blocks, improving computational efficiency without sacrificing performance on several pixel-level tasks. We also demonstrate the superior generalization capability of our proposed method compared to challenging baseline models. The source code will be made available at https://github.com/AVC2-UESTC/DAR-TR-PEFT.
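The token-length argument in the abstract can be made concrete with some arithmetic. A ViT with patch size p turns an H×W image into (H/p)·(W/p) tokens, and each block's attention core costs roughly 2·N²·d multiply-accumulates (Q·Kᵀ plus attention·V). The sketch below assumes the DINOv2 ViT-B/14 backbone used in this repository (patch size 14, embedding dim 768). It uses two illustrative resolutions: 224 for an image-level setting and 518 for a pixel-level setting. These resolutions and the simplified cost model are assumptions, not figures from the paper.

```python
def num_tokens(height: int, width: int, patch_size: int = 14) -> int:
    """Patch tokens produced by a ViT (class/register tokens excluded)."""
    return (height // patch_size) * (width // patch_size)


def attention_macs(n_tokens: int, dim: int = 768) -> int:
    """Rough multiply-accumulates for one block's attention core:
    Q @ K^T costs n*n*dim and attn @ V costs n*n*dim.
    QKV/output projections and the MLP are deliberately ignored."""
    return 2 * n_tokens * n_tokens * dim


if __name__ == "__main__":
    # Illustrative image-level vs pixel-level input sizes (assumptions).
    for side in (224, 518):
        n = num_tokens(side, side)
        print(f"{side}x{side}: {n} tokens, ~{attention_macs(n) / 1e9:.2f} GMACs per block")
```

Because this term grows with N², halving the number of tokens processed inside a block cuts it by about 4x. Token reduction targets exactly this cost.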

Install

For a fast setup, refer to the Quick Start guide. Alternatively, follow the detailed instructions below for a step-by-step configuration.

PyTorch

The code requires Python >= 3.9 and PyTorch >= 2.0.0. Please follow the instructions here to install both PyTorch and TorchVision. Installing both with CUDA support is strongly recommended.
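A standard-library check like the one below can confirm these minimum versions before you go further. It is an illustrative sketch and is not part of the repository. The hand-rolled `parse_version` ignores pre-release tags, so `2.0.0rc1` counts as `2.0.0`. If the `packaging` package is available, `packaging.version.Version` is the more robust choice.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Turn '2.4.1+cu118' into (2, 4, 1); the local tag after '+' is dropped
    and any non-numeric suffix inside a component (e.g. 'rc1') is ignored."""
    core = text.split("+", 1)[0]
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare dotted versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    print("Python >= 3.9:", sys.version_info >= (3, 9))
    try:
        print("torch >= 2.0.0:", meets_minimum(version("torch"), "2.0.0"))
    except PackageNotFoundError:
        print("torch is not installed")
```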

MMCV

Please install MMCV following the instructions here.

xFormers

Please install xFormers following the instructions here.

Other Dependencies

Please install the following dependencies:

pip install -r requirements.txt

Model Weights

Pretrained Weights

You can download the pretrained weights dinov2_vitb14_pretrain.pth from DINOv2 or here.

Run the following command to convert the PyTorch weights to the format used in this repository.

python convert_pt_weights.py 

For training, put the converted weights in the model_weights folder.

Fine-tuned Weights

| Method | Dataset      | Weights                          | Configs |
|--------|--------------|----------------------------------|---------|
| DAR    | DUTS         | dinov2_b_dar_duts.pth            | config  |
| DAR*   | DUTS         | dinov2_b_dar_distill_duts.pth    |         |
| DAR    | CUHK         | dinov2_b_dar_defocus.pth         | config  |
| DAR*   | CUHK         | dinov2_b_dar_distill_defocus.pth |         |
| DAR    | COD10K, CAMO | dinov2_b_dar_cod.pth             | config  |
| DAR*   | COD10K, CAMO | dinov2_b_dar_distill_cod.pth     |         |
| DAR    | Kvasir       | dinov2_b_dar_polyp.pth           | config  |
| DAR*   | Kvasir       | dinov2_b_dar_distill_polyp.pth   |         |
| DAR    | ISIC2017     | dinov2_b_dar_skin.pth            | config  |
| DAR*   | ISIC2017     | dinov2_b_dar_distill_skin.pth    |         |

For testing, put the pretrained weights and fine-tuned weights in the model_weights folder.
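Before running `test.py`, a quick pre-flight check can confirm that the checkpoints are in `model_weights`. This is a hypothetical helper and does not ship with the repository. The fine-tuned file names come from the table above. The name of the converted pretrained checkpoint is not documented here, so you need to add it yourself.

```python
from pathlib import Path

# Fine-tuned checkpoints from the weights table: (DAR, DAR*) per dataset.
FINE_TUNED = {
    "DUTS": ("dinov2_b_dar_duts.pth", "dinov2_b_dar_distill_duts.pth"),
    "CUHK": ("dinov2_b_dar_defocus.pth", "dinov2_b_dar_distill_defocus.pth"),
    "COD10K/CAMO": ("dinov2_b_dar_cod.pth", "dinov2_b_dar_distill_cod.pth"),
    "Kvasir": ("dinov2_b_dar_polyp.pth", "dinov2_b_dar_distill_polyp.pth"),
    "ISIC2017": ("dinov2_b_dar_skin.pth", "dinov2_b_dar_distill_skin.pth"),
}


def missing_weights(folder, filenames):
    """Return the file names that are not present as regular files in `folder`."""
    folder = Path(folder)
    return [name for name in filenames if not (folder / name).is_file()]


if __name__ == "__main__":
    wanted = list(FINE_TUNED["DUTS"])  # e.g. both DUTS checkpoints
    # Also append the file name written by convert_pt_weights.py (not documented here).
    missing = missing_weights("model_weights", wanted)
    print("missing:", missing or "none")
```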

For DAR* (the distilled models), use config_dinov2_b_dar_distill_fgseg_train.py and config_dinov2_b_dar_distill_fgseg_test.py.


Dataset

The following datasets are used in this paper:


Quick Start

Environment Setup

Make sure CUDA 11.8 is installed in your virtual environment. Linux is recommended.

Install PyTorch

pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu118

Install xFormers

pip install xformers==0.0.28 --index-url https://download.pytorch.org/whl/cu118

# test installation (optional)
python -m xformers.info

Install MMCV

pip install mmcv==2.2.0 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.4/index.html
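The `-f` URL above encodes the CUDA and PyTorch versions (`cu118`, `torch2.4`). If you use different versions from the pinned ones, the sketch below derives the matching URL from the local PyTorch install. It assumes OpenMMLab's `https://download.openmmlab.com/mmcv/dist/{cuda_tag}/torch{major}.{minor}/index.html` naming scheme, as described in the MMCV installation guide. Prebuilt wheels exist only for some combinations, so a generated URL is not guaranteed to resolve.

```python
from typing import Optional


def mmcv_find_links(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build the OpenMMLab wheel index URL for a PyTorch/CUDA pair.

    torch_version: e.g. "2.4.1" or "2.4.1+cu118"; only major.minor is used.
    cuda_version:  e.g. "11.8", or None for a CPU-only PyTorch build.
    """
    major, minor = torch_version.split("+", 1)[0].split(".")[:2]
    cuda_tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return f"https://download.openmmlab.com/mmcv/dist/{cuda_tag}/torch{major}.{minor}/index.html"


if __name__ == "__main__":
    import torch  # torch.version.cuda is None on CPU-only builds

    print(mmcv_find_links(str(torch.__version__), torch.version.cuda))
```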

Other dependencies

pip install -r requirements.txt
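After the Quick Start commands finish, the sketch below compares the installed packages against the versions pinned above. It uses only the standard library (`importlib.metadata`). It is a hypothetical helper and is not part of the repository. It strips local tags such as `+cu118` before comparing. A suffix such as `.post1` will still be reported as a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the Quick Start pip commands.
PINNED = {
    "torch": "2.4.1",
    "torchvision": "0.19.1",
    "torchaudio": "2.4.1",
    "xformers": "0.0.28",
    "mmcv": "2.2.0",
}


def compare_pins(pins, lookup=version):
    """Map each package to (wanted, found, matches); found is None if not installed."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = lookup(name).split("+", 1)[0]  # drop local tags like "+cu118"
        except PackageNotFoundError:
            found = None
        report[name] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in compare_pins(PINNED).items():
        print(f"{name:12s} wanted {wanted:8s} found {found or 'missing':12s} {'OK' if ok else 'CHECK'}")
```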

Prepare Dataset

We follow the ADE20K dataset format. Organize your dataset files as follows:

./datasets/dataset_name/

├── images/
│   ├── training/       # Put training images here
│   └── validation/     # Put validation images here
└── annotations/
    ├── training/       # Put training segmentation maps here 
    └── validation/     # Put validation segmentation maps here 

Test

Put the model weights into the model_weights folder, and run the following command to test the model.

python test.py --config config/path
# or
sh test.sh # for linux
# or
test.bat # for windows
# remember to modify the path in test.sh or test.bat
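A small Python driver can loop over several configs on any platform, as an alternative to editing `test.sh` or `test.bat`. This is a hypothetical helper and is not shipped with the repository. It relies only on the `--config` flag shown above, and the config paths in it are placeholders.

```python
import subprocess
import sys


def build_command(config_path, script="test.py"):
    """Command line for one run, using the --config flag from the README."""
    return [sys.executable, script, "--config", config_path]


def run_all(config_paths, script="test.py"):
    """Run each config in sequence; check=True stops at the first non-zero exit."""
    for cfg in config_paths:
        print("Running", script, "with", cfg)
        subprocess.run(build_command(cfg, script), check=True)


if __name__ == "__main__":
    # Placeholder paths: replace with real config files from the repository.
    run_all(["config/path/first_config.py", "config/path/second_config.py"])
```

The same driver covers training if you pass `script="train.py"`.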

Train

Put the pre-trained weights into the model_weights folder, and run the following command to train the model.

python train.py --config config/path
# or
sh train.sh # for linux
# or
train.bat # for windows
# remember to modify the path in train.sh or train.bat

Debug

If you want to debug the code, check train_debug.py and test_debug.py.


Citation

If you find the code helpful in your research or work, please cite the following paper:

@InProceedings{Lei_2025_CVPR,
    author    = {Lei, Cheng and Li, Ao and Yao, Hu and Zhu, Ce and Zhang, Le},
    title     = {Rethinking Token Reduction with Parameter-Efficient Fine-Tuning in ViT for Pixel-Level Tasks},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {14954-14964}
}

Acknowledgement

This project is based on MMCV, timm, DINOv2, MAM, and DyT. We thank the authors for their valuable contributions.
