[CVPR 2025] Rethinking Token Reduction with Parameter-Efficient Fine-Tuning in ViT for Pixel-Level Tasks
Cheng Lei, Ao Li, Hu Yao, Ce Zhu, Le Zhang
University of Electronic Science and Technology of China.
Abstract: Parameter-efficient fine-tuning (PEFT) adapts pre-trained models to new tasks by updating only a small subset of parameters, achieving efficiency but still facing significant inference costs driven by input token length. This challenge is even more pronounced in pixel-level tasks, which require longer input sequences compared to image-level tasks. Although token reduction (TR) techniques can help reduce computational demands, they often lead to homogeneous attention patterns that compromise performance in pixel-level scenarios. This study underscores the importance of maintaining attention diversity for these tasks and proposes to enhance attention diversity while ensuring the completeness of token sequences. Our approach effectively reduces the number of tokens processed within transformer blocks, improving computational efficiency without sacrificing performance on several pixel-level tasks. We also demonstrate the superior generalization capability of our proposed method compared to challenging baseline models. The source code will be made available at https://github.com/AVC2-UESTC/DAR-TR-PEFT.
For a fast setup, refer to the Quick Start guide. For a step-by-step configuration, follow the detailed instructions below.
The code requires python>=3.9, as well as pytorch>=2.0.0. Please follow the instructions here to install both PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.
Please install MMCV following the instructions here.
Please install xFormers following the instructions here.
Please install the following dependencies:
pip install -r requirements.txt
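After installation, it can help to confirm that the environment meets the version requirements stated above. The following is a minimal sketch, not part of the repository: the helper names `installed_version`, `check_environment`, and `python_is_supported` are hypothetical, and the script only reads package metadata, so it does not verify that CUDA support is actually working.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(packages=("torch", "torchvision", "mmcv", "xformers")):
    """Collect the Python version and the versions of the listed packages."""
    report = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in packages:
        report[name] = installed_version(name)
    return report


def python_is_supported(minimum=(3, 9)):
    """True when the running interpreter meets the stated python>=3.9 requirement."""
    return sys.version_info[:2] >= tuple(minimum)


if __name__ == "__main__":
    for name, version in check_environment().items():
        print(f"{name:12s} {version or 'not installed'}")
    if not python_is_supported():
        print("Warning: Python >= 3.9 is required.")
```

Any package reported as "not installed" points to the corresponding installation step above.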
You can download the pretrained weights dinov2_vitb14_pretrain.pth from DINOv2 or here.
Run the following command to convert the PyTorch weights to the format used in this repository.
python convert_pt_weights.py
For training, put the converted weights in the model_weights folder.
| Method | Dataset | Weights | Configs |
|---|---|---|---|
| DAR | DUTS | dinov2_b_dar_duts.pth | config |
| DAR* | DUTS | dinov2_b_dar_distill_duts.pth | |
| DAR | CUHK | dinov2_b_dar_defocus.pth | config |
| DAR* | CUHK | dinov2_b_dar_distill_defocus.pth | |
| DAR | COD10K, CAMO | dinov2_b_dar_cod.pth | config |
| DAR* | COD10K, CAMO | dinov2_b_dar_distill_cod.pth | |
| DAR | Kvasir | dinov2_b_dar_polyp.pth | config |
| DAR* | Kvasir | dinov2_b_dar_distill_polyp.pth | |
| DAR | ISIC2017 | dinov2_b_dar_skin.pth | config |
| DAR* | ISIC2017 | dinov2_b_dar_distill_skin.pth | |
For testing, put the pretrained weights and fine-tuned weights in the model_weights folder.
For DAR*, check config_dinov2_b_dar_distill_fgseg_train.py and config_dinov2_b_dar_distill_fgseg_test.py.
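Before launching a test run, a quick check that the expected files are present in model_weights can save a failed start. This is a small sketch under stated assumptions: `missing_weights` is a hypothetical helper, the example file name is taken from the table above, and the name of the converted backbone file depends on convert_pt_weights.py, so it is not assumed here.

```python
from pathlib import Path


def missing_weights(filenames, folder="model_weights"):
    """Return the names in `filenames` that are not regular files inside `folder`."""
    folder = Path(folder)
    return [name for name in filenames if not (folder / name).is_file()]


if __name__ == "__main__":
    # Example: fine-tuned DAR weights for DUTS, as named in the table above.
    wanted = ["dinov2_b_dar_duts.pth"]
    for name in missing_weights(wanted):
        print(f"missing: model_weights/{name}")
```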
The following datasets are used in this paper:
Make sure CUDA 11.8 is installed in your virtual environment. Linux is recommended.
Install pytorch
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu118

Install xformers
pip install xformers==0.0.28 --index-url https://download.pytorch.org/whl/cu118
# test installation (optional)
python -m xformers.info

Install mmcv
pip install mmcv==2.2.0 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.4/index.html

Other dependencies
pip install -r requirements.txt

We follow the ADE20K dataset format. Organize your dataset files as follows:
./datasets/dataset_name/
├── images/
│   ├── training/      # Put training images here
│   └── validation/    # Put validation images here
└── annotations/
    ├── training/      # Put training segmentation maps here
    └── validation/    # Put validation segmentation maps here
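The folder layout above can be created and checked with a short script. This is an illustrative sketch rather than repository code: the helper names are hypothetical, and `unpaired_images` assumes that each image and its segmentation map share the same file stem (as in ADE20K), which may not hold for every dataset.

```python
from pathlib import Path

# Sub-folders of the ADE20K-style layout described above.
EXPECTED_SUBDIRS = [
    Path("images") / "training",
    Path("images") / "validation",
    Path("annotations") / "training",
    Path("annotations") / "validation",
]


def create_layout(root):
    """Create the empty folder skeleton under `root` (existing folders are kept)."""
    root = Path(root)
    for sub in EXPECTED_SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


def missing_dirs(root):
    """Return the expected sub-folders (POSIX-style) that do not exist under `root`."""
    root = Path(root)
    return [sub.as_posix() for sub in EXPECTED_SUBDIRS if not (root / sub).is_dir()]


def unpaired_images(root, split="training"):
    """Return image stems in `split` that have no annotation with the same stem.

    Assumption: an image and its segmentation map share a file stem.
    """
    root = Path(root)
    image_stems = {p.stem for p in (root / "images" / split).iterdir() if p.is_file()}
    mask_stems = {p.stem for p in (root / "annotations" / split).iterdir() if p.is_file()}
    return sorted(image_stems - mask_stems)


if __name__ == "__main__":
    dataset_root = create_layout("./datasets/dataset_name")
    print("missing folders:", missing_dirs(dataset_root))
    print("images without a mask:", unpaired_images(dataset_root, "training"))
```

Running `missing_dirs` on an existing dataset folder before training is a cheap way to catch a misplaced split.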
Put the model weights into the model_weights folder, and run the following command to test the model.
python test.py --config config/path
# or
sh test.sh # for linux
# or
test.bat # for windows
# remember to modify the path in test.sh or test.bat
Put the pre-trained weights into the model_weights folder, and run the following command to train the model.
python train.py --config config/path
# or
sh train.sh # for linux
# or
train.bat # for windows
# remember to modify the path in train.sh or train.bat
If you want to debug the code, check train_debug.py and test_debug.py.
If you find the code helpful in your research or work, please cite the following paper:
@InProceedings{Lei_2025_CVPR,
author = {Lei, Cheng and Li, Ao and Yao, Hu and Zhu, Ce and Zhang, Le},
title = {Rethinking Token Reduction with Parameter-Efficient Fine-Tuning in ViT for Pixel-Level Tasks},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
month = {June},
year = {2025},
pages = {14954-14964}
}
This project is based on MMCV, timm, DINOv2, MAM, and DyT. We thank the authors for their valuable contributions.