[CVPR 2025] Rethinking Token Reduction with Parameter-Efficient Fine-Tuning in ViT for Pixel-Level Tasks

Cheng Lei, Ao Li, Hu Yao, Ce Zhu, Le Zhang,
University of Electronic Science and Technology of China.

Paper

Abstract: Parameter-efficient fine-tuning (PEFT) adapts pre-trained models to new tasks by updating only a small subset of parameters, achieving efficiency but still facing significant inference costs driven by input token length. This challenge is even more pronounced in pixel-level tasks, which require longer input sequences compared to image-level tasks. Although token reduction (TR) techniques can help reduce computational demands, they often lead to homogeneous attention patterns that compromise performance in pixel-level scenarios. This study underscores the importance of maintaining attention diversity for these tasks and proposes to enhance attention diversity while ensuring the completeness of token sequences. Our approach effectively reduces the number of tokens processed within transformer blocks, improving computational efficiency without sacrificing performance on several pixel-level tasks. We also demonstrate the superior generalization capability of our proposed method compared to challenging baseline models. The source code will be made available at https://github.com/AVC2-UESTC/DAR-TR-PEFT.
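The abstract ties inference cost to input token length. The sketch below makes that concrete for a ViT with a 14-pixel patch, matching the DINOv2 ViT-B/14 backbone whose weights are listed below. It counts patch tokens at two input sizes and compares the quadratic self-attention term. The resolutions 224 and 518 are illustrative assumptions, not the settings used in the paper. The sketch also ignores the class token and all linear-cost layers, so it is a back-of-the-envelope estimate, not a FLOP count of this repository's models.

```python
def num_patch_tokens(height: int, width: int, patch_size: int = 14) -> int:
    """Number of non-overlapping patch tokens (class token excluded)."""
    return (height // patch_size) * (width // patch_size)


def attention_cost_ratio(n_tokens: int, n_reference: int) -> float:
    """Relative cost of the O(N^2) attention term versus a reference length."""
    return (n_tokens / n_reference) ** 2


if __name__ == "__main__":
    image_level = num_patch_tokens(224, 224)  # 16 x 16 = 256 tokens
    pixel_level = num_patch_tokens(518, 518)  # 37 x 37 = 1369 tokens
    print(image_level, pixel_level)
    print(f"{attention_cost_ratio(pixel_level, image_level):.1f}x attention cost")

    # Keeping about half of the tokens inside a block shrinks its
    # attention term to roughly a quarter of the full-length cost.
    kept = pixel_level // 2
    print(f"{attention_cost_ratio(kept, pixel_level):.2f} of the full attention term")
```

Because the attention term grows with the square of the sequence length, the longer sequences of pixel-level inputs make token reduction more attractive than at image-level resolutions.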

Install

For a fast setup, refer to the Quick Start guide. Otherwise, follow the detailed instructions below for a step-by-step configuration.

Pytorch

The code requires Python>=3.9 and PyTorch>=2.0.0. Please follow the instructions here to install both PyTorch and TorchVision. Installing both with CUDA support is strongly recommended.

MMCV

Please install MMCV following the instructions here.

xFormers

Please install xFormers following the instructions here.

Other Dependencies

Please install the following dependencies:

pip install -r requirements.txt

Model Weights

Pretrained Weights

You can download the pretrained weights dinov2_vitb14_pretrain.pth from DINOv2 or here.

Run the following command to convert the PyTorch weights to the format used in this repository.

python convert_pt_weights.py

For training, put the converted weights in the model_weights folder.

Fine-tuned Weights

Method	Dataset	Weights	Configs
DAR	DUTS	dinov2_b_dar_duts.pth	config
DAR*	DUTS	dinov2_b_dar_distill_duts.pth
DAR	CUHK	dinov2_b_dar_defocus.pth	config
DAR*	CUHK	dinov2_b_dar_distill_defocus.pth
DAR	COD10K, CAMO	dinov2_b_dar_cod.pth	config
DAR*	COD10K, CAMO	dinov2_b_dar_distill_cod.pth
DAR	Kvasir	dinov2_b_dar_polyp.pth	config
DAR*	Kvasir	dinov2_b_dar_distill_polyp.pth
DAR	ISIC2017	dinov2_b_dar_skin.pth	config
DAR*	ISIC2017	dinov2_b_dar_distill_skin.pth

For testing, put the pretrained weights and fine-tuned weights in the model_weights folder.

For DAR*, check config_dinov2_b_dar_distill_fgseg_train.py and config_dinov2_b_dar_distill_fgseg_test.py.
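Before testing, it can help to confirm that the expected checkpoints are actually in `model_weights`. The sketch below encodes the fine-tuned weights table as a dictionary, with DAR* corresponding to the `distill` files. The name of the converted pretrained checkpoint is not stated here, so it is a required argument rather than a guess. The helper `missing_weights` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Fine-tuned checkpoints from the table above: (DAR, DAR*) per dataset.
FINETUNED = {
    "DUTS": ("dinov2_b_dar_duts.pth", "dinov2_b_dar_distill_duts.pth"),
    "CUHK": ("dinov2_b_dar_defocus.pth", "dinov2_b_dar_distill_defocus.pth"),
    "COD10K, CAMO": ("dinov2_b_dar_cod.pth", "dinov2_b_dar_distill_cod.pth"),
    "Kvasir": ("dinov2_b_dar_polyp.pth", "dinov2_b_dar_distill_polyp.pth"),
    "ISIC2017": ("dinov2_b_dar_skin.pth", "dinov2_b_dar_distill_skin.pth"),
}


def missing_weights(dataset: str, pretrained_name: str,
                    distilled: bool = False,
                    weights_dir: str = "model_weights") -> list:
    """Return the expected checkpoint files that are not present in weights_dir.

    pretrained_name is whatever file convert_pt_weights.py produced (assumption:
    the converted file is stored directly in weights_dir).
    """
    folder = Path(weights_dir)
    finetuned = FINETUNED[dataset][1 if distilled else 0]
    expected = [pretrained_name, finetuned]
    return [name for name in expected if not (folder / name).is_file()]
```

For example, `missing_weights("Kvasir", "<converted file>.pth", distilled=True)` returns an empty list once both files are in place.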

Dataset

The following datasets are used in this paper:

Quick Start

Environment Setup

Make sure CUDA 11.8 is installed in your virtual environment. Linux is recommended.

Install PyTorch

pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu118

Install xFormers

pip install xformers==0.0.28 --index-url https://download.pytorch.org/whl/cu118

# test installation (optional)
python -m xformers.info

Install MMCV

pip install mmcv==2.2.0 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.4/index.html
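Because the Quick Start pins exact versions, a quick way to catch a mismatched wheel is to compare installed package metadata against those pins. The sketch below uses the standard-library `importlib.metadata`, so it does not import the heavy packages themselves. It is an illustrative helper, not part of the repository, and it ignores local tags such as `+cu118` when comparing.

```python
from importlib import metadata

# Versions pinned by the Quick Start commands above.
PINS = {
    "torch": "2.4.1",
    "torchvision": "0.19.1",
    "torchaudio": "2.4.1",
    "xformers": "0.0.28",
    "mmcv": "2.2.0",
}


def pin_mismatches(pins=PINS, get_version=metadata.version) -> dict:
    """Map package -> installed version (None if absent) for every unmet pin."""
    bad = {}
    for package, wanted in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed is None or installed.split("+", 1)[0] != wanted:
            bad[package] = installed
    return bad


if __name__ == "__main__":
    mismatches = pin_mismatches()
    for package, installed in mismatches.items():
        print(f"{package}: expected {PINS[package]}, found {installed}")
    if not mismatches:
        print("all pinned versions installed")
```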

Other dependencies

pip install -r requirements.txt

Prepare Dataset

We follow the ADE20K dataset format. Organize your dataset files as follows:

./datasets/dataset_name/

├── images/
│   ├── training/       # Put training images here
│   └── validation/     # Put validation images here
└── annotations/
    ├── training/       # Put training segmentation maps here 
    └── validation/     # Put validation segmentation maps here
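A mismatch between images and segmentation maps is easy to miss in a large dataset. The sketch below checks the layout shown above and pairs files by name. It assumes, following the usual ADE20K convention, that an image and its segmentation map share a file stem (for example `images/training/0001.jpg` and `annotations/training/0001.png`). That pairing rule is an assumption, not something the data loader is documented to require.

```python
from pathlib import Path

SPLITS = ("training", "validation")


def check_layout(root) -> list:
    """Report layout problems for an ADE20K-style dataset folder."""
    root = Path(root)
    problems = []
    for split in SPLITS:
        img_dir = root / "images" / split
        ann_dir = root / "annotations" / split
        for folder in (img_dir, ann_dir):
            if not folder.is_dir():
                problems.append(f"missing folder: {folder}")
        if not (img_dir.is_dir() and ann_dir.is_dir()):
            continue
        # Pair images and maps by file stem (assumed naming convention).
        images = {p.stem for p in img_dir.iterdir() if p.is_file()}
        masks = {p.stem for p in ann_dir.iterdir() if p.is_file()}
        for stem in sorted(images - masks):
            problems.append(f"{split}: no annotation for image '{stem}'")
        for stem in sorted(masks - images):
            problems.append(f"{split}: no image for annotation '{stem}'")
    return problems
```

Running `check_layout("./datasets/dataset_name")` on a correctly organized dataset returns an empty list.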

Test

Put the model weights into the model_weights folder, and run the following command to test the model.

python test.py --config config/path
# or
sh test.sh # for linux
# or
test.bat # for windows
# remember to modify the path in test.sh or test.bat
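If you want to evaluate several configs in one go without editing `test.sh` or `test.bat` each time, a small wrapper around the documented `--config` flag is one option. This is a hypothetical helper, not part of the repository. The config paths you pass in are placeholders. With `dry_run=True` it only returns the commands it would run.

```python
import subprocess
import sys


def eval_command(config) -> list:
    """Command line for one evaluation run, using the documented --config flag."""
    return [sys.executable, "test.py", "--config", str(config)]


def run_evaluations(configs, dry_run: bool = True) -> list:
    """Evaluate configs in order; stop at the first run that exits with an error."""
    commands = [eval_command(c) for c in configs]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```

For example, `run_evaluations(["configs/<first_test_config>.py", "configs/<second_test_config>.py"], dry_run=False)` runs both evaluations from the repository root.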

Train

Put the pre-trained weights into the model_weights folder, and run the following command to train the model.

python train.py --config config/path
# or
sh train.sh # for linux
# or
train.bat # for windows
# remember to modify the path in train.sh or train.bat
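To chain training and evaluation, you could sequence `train.py` and `test.py` and skip testing if training fails. The sketch below accepts separate configs because DAR* uses distinct training and testing configs (`config_dinov2_b_dar_distill_fgseg_train.py` and `config_dinov2_b_dar_distill_fgseg_test.py`). It assumes the test config already points at the checkpoint written by training. The helper itself is hypothetical and not part of the repository.

```python
import subprocess
import sys


def train_then_test(train_config, test_config, dry_run: bool = True) -> list:
    """Run train.py, then test.py; with dry_run, only return the two commands."""
    steps = [
        [sys.executable, "train.py", "--config", str(train_config)],
        [sys.executable, "test.py", "--config", str(test_config)],
    ]
    if not dry_run:
        for cmd in steps:
            subprocess.run(cmd, check=True)  # testing is skipped if training fails
    return steps
```

Adjust the config paths to where the files live under `configs/` before calling it with `dry_run=False`.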

Debug

If you want to debug the code, check train_debug.py and test_debug.py.

Citation

If you find the code helpful in your research or work, please cite the following paper:

@InProceedings{Lei_2025_CVPR,
    author    = {Lei, Cheng and Li, Ao and Yao, Hu and Zhu, Ce and Zhang, Le},
    title     = {Rethinking Token Reduction with Parameter-Efficient Fine-Tuning in ViT for Pixel-Level Tasks},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {14954-14964}
}

Acknowledgement

This project is based on MMCV, timm, DINOv2, MAM, and DyT. We thank the authors for their valuable contributions.
