Vision Transformer Plasticity

Official implementation of the paper Vision Transformer Finetuning Benefits from Non-Smooth Components.
Goal: Investigate the plasticity of the vision transformer components by analyzing their average rates of change.
Findings: Finetuning non-smooth components (with high plasticity) yields better and more stable performance.
Illustration: Non-smooth components allow larger gradient norms and faster descent towards (local) minima.

Abstract

The smoothness of the transformer architecture has been extensively studied in the context of generalization, training stability, and adversarial robustness. However, its role in transfer learning remains poorly understood. In this paper, we analyze the ability of vision transformer components to adapt their outputs to changes in inputs, or, in other words, their plasticity. Defined as an average rate of change, it captures the sensitivity to input perturbation; in particular, a high plasticity implies low smoothness. We demonstrate through theoretical analysis and comprehensive experiments that this perspective provides principled guidance in choosing the components to prioritize during adaptation. A key takeaway for practitioners is that the high plasticity of the attention modules and feedforward layers consistently leads to better finetuning performance. Our findings depart from the prevailing assumption that smoothness is desirable, offering a novel perspective on the functional properties of transformers.

Illustration: The high plasticity of non-smooth components leads to greater finetuning benefits (relative gain).

Overview

Our codebase is tailored to the study of transformer finetuning; we encourage you to use it as a template and modify it to suit your experiments. We tried to make the code as modular as possible, so feel free to branch or fork it and experiment. The codebase is structured as follows:

🛠️ vit-plasticity
┣ 📂apps 
┃ ┣ 📂vit # ViT finetuning and plasticity 
┃ ┃ ┣ 📂configs
┃ ┃ ┣ 📂scripts
┃ ┃ ┣ 📄analysis.py
┃ ┃ ┣ 📄eval.py
┃ ┃ ┣ 📄linear_probing.py
┃ ┃ ┣ 📄train.py
┃ ┃ ┗ 📄utils.py
┃ ┣ 📂plots # Figures
┗ 📂src 
  ┗ 📂vitef # Core library
    ┣ 📂data
    ┣ 📂model
    ┣ 📂monitor
    ┣ 📄__init__.py
    ┣ 📄config.py
    ┣ 📄distributed.py
    ┣ 📄optim.py
    ┗ 📄utils.py

The vitef folder contains essential and generic components related to vision transformers, which can be put together in the apps folder. In particular, apps/vit can be used to reproduce the experiments of our paper.

Getting started

The code requires Python 3.10+. To set it up, first install Miniforge by following the online instructions; most likely you will run the following commands:

curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh"
bash Miniforge3-Linux-x86_64.sh
source ~/.bashrc

Install Python in a new conda environment (be mindful to pick a Python version compatible with PyTorch):

conda create -n myenv python=3.10
conda activate myenv

Install the repository (the "pretrained" dependencies are optional but allow for a faster download of weights):

git clone <repo url>
cd <repo path>
pip install -e ".[pretrained]"

To install the development and visualization dependencies, you can swap the previous command for the following one:

pip install -e ".[pretrained,dev,visu]"

Accelerate-specific instructions

The accelerate package can be used to download and distribute models from the HuggingFace Transformers library. After installing it, you need to configure it. Follow the online instructions for configuring accelerate; most likely you will run the following command and answer the questions it prompts:

accelerate config

Launching jobs

We provide below the commands useful to conduct experiments. They must be run from the root of the repository.

Configuration

Most experiments need a configuration file that can be combined with command-line arguments. Configuration objects are represented as dataclass objects. For example, the file your_config.yaml looks like:

log_dir: your_launch
model_name: base
patch_size: 16
dataset_name: cifar10
batch_size: 512
device: cuda:0
seed: 42

It can be used to initialize a dataclass that looks like:

@dataclass
class YourConfig:
  log_dir: str = "your_launch"
  model_name: str = "base"
  patch_size: int = 16
  dataset_name: str = "cifar10"
  batch_size: int = 512
  device: str = "cuda:0"
  seed: int = 42

In most scripts (train.py, eval.py, linear_probing.py), we use OmegaConf to build the configuration in three steps:

1. YourConfig is instantiated with its default values.
2. Those default values are overridden with the ones in your_config.yaml.
3. The result is overridden with any additional arguments provided on the command line.
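
The precedence described above can be illustrated without any dependency. The following is a minimal sketch and not the repository's code: it uses plain dataclasses instead of OmegaConf, and `apply_overrides`, `parse_cli`, and the override values (`cifar100`, `256`, `128`, `0`) are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, fields, replace


@dataclass
class YourConfig:
    log_dir: str = "your_launch"
    model_name: str = "base"
    patch_size: int = 16
    dataset_name: str = "cifar10"
    batch_size: int = 512
    device: str = "cuda:0"
    seed: int = 42


def apply_overrides(config: YourConfig, overrides: dict) -> YourConfig:
    """Return a copy of `config` with `overrides` applied, cast to each field's type."""
    types = {f.name: f.type for f in fields(config)}
    unknown = set(overrides) - set(types)
    if unknown:
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    casted = {key: types[key](value) for key, value in overrides.items()}
    return replace(config, **casted)


def parse_cli(args: list[str]) -> dict:
    """Turn ["key=value", ...] into {"key": "value", ...}."""
    return dict(arg.split("=", 1) for arg in args)


# Hypothetical values standing in for a YAML file and for command-line arguments.
file_values = {"dataset_name": "cifar100", "batch_size": 256}
cli_values = parse_cli(["batch_size=128", "seed=0"])

config = YourConfig()                          # step 1: defaults
config = apply_overrides(config, file_values)  # step 2: file overrides defaults
config = apply_overrides(config, cli_values)   # step 3: command line overrides file
print(config)
```

Here `batch_size` ends up at 128 because the command line is applied last, `dataset_name` comes from the file, and `patch_size` keeps its default. With OmegaConf, the same ordering is typically obtained by merging a structured config, a loaded YAML file, and the command-line arguments in that order.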

Vision transformer plasticity

To compute the plasticity of ViT components on CIFAR-10, run:

python -m apps.vit.analysis run --dataset_name cifar10
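
For intuition about what is being measured, here is a minimal, dependency-free sketch of an average rate of change. It is not the code in apps/vit/analysis.py, and it assumes one possible form of the quantity: the mean of ||f(x) - f(y)|| / ||x - y|| over pairs of inputs (see the paper for the exact definition). The function names and the toy "components" (simple scalings) are hypothetical.

```python
import math
import random


def l2_norm(vector):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(v * v for v in vector))


def average_rate_of_change(fn, inputs):
    """Mean of ||fn(x) - fn(y)|| / ||x - y|| over all pairs of distinct inputs."""
    ratios = []
    for i, x in enumerate(inputs):
        for y in inputs[i + 1:]:
            input_gap = l2_norm([a - b for a, b in zip(x, y)])
            if input_gap == 0.0:
                continue  # identical inputs carry no information
            output_gap = l2_norm([a - b for a, b in zip(fn(x), fn(y))])
            ratios.append(output_gap / input_gap)
    return sum(ratios) / len(ratios)


def scale_by(factor):
    """Toy component: multiplies every coordinate by `factor`."""
    return lambda x: [factor * v for v in x]


random.seed(0)
points = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]

smooth = average_rate_of_change(scale_by(0.5), points)
plastic = average_rate_of_change(scale_by(3.0), points)
print(f"smooth: {smooth:.2f}, plastic: {plastic:.2f}")
```

For a pure scaling, every pairwise ratio equals the scaling factor, so the two estimates come out close to 0.5 and 3.0: the second toy component reacts more strongly to input changes, which is what a higher plasticity (lower smoothness) expresses.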

Finetuning

To launch a finetuning job on CIFAR-10, run:

python -m apps.vit.train config=apps/vit/configs/cifar10.yaml
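
Since the paper is about which components to prioritize during adaptation, it can help to see how finetuning might be restricted to one component type. The sketch below is an assumption about how such a selection could be expressed, not the repository's implementation: `select_trainable` and the parameter names are hypothetical.

```python
def select_trainable(param_names, component):
    """Flag each parameter name: True if `component` is one of its dotted parts."""
    return {name: component in name.split(".") for name in param_names}


# Hypothetical parameter names, for illustration only.
param_names = [
    "blocks.0.norm1.weight",
    "blocks.0.attn.qkv.weight",
    "blocks.0.attn.proj.weight",
    "blocks.0.mlp.fc1.weight",
    "head.weight",
]

flags = select_trainable(param_names, component="attn")
for name, trainable in flags.items():
    print(f"{name}: {'train' if trainable else 'frozen'}")

# With a PyTorch model, the same flags could be applied with:
#     for name, param in model.named_parameters():
#         param.requires_grad = flags[name]
```

Matching on dotted name parts rather than on substrings avoids accidentally selecting parameters whose names merely contain the component string.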

Evaluation

To launch an evaluation job according to eval.yaml, run:

python -m apps.vit.eval config=apps/vit/configs/eval.yaml

Linear probing

To launch a linear probing job according to linear_probing.yaml, run:

python -m apps.vit.linear_probing config=apps/vit/configs/linear_probing.yaml

Reproducibility

The experiments of our paper can be reproduced using the scripts in apps/vit/scripts. Launching them automatically creates a dedicated tmux session for each group of experiments. Launch the finetuning experiments before the linear probing experiments, since the latter depend on files produced by the finetuning runs, such as the configuration files of the finetuned models. After those runs complete, the linear probing and finetuning performance can be collected in a results/ folder by running the following command from the root of the repository:

python -m apps.plots.finetuning csv

The figures of our paper can then be reproduced using the files in apps/plots.

Acknowledgements

Our codebase is designed to study the finetuning dynamics and generalization properties of transformers. It draws inspiration from libraries such as itl, lingua, and pal.

Contact

If you have any questions, feel free to reach out at ambroiseodonnattechnologie@gmail.com.

Citation

If you find our work useful, please consider giving a star ⭐, and citing us as:

@misc{odonnat2026vitplasticity,
      title={Vision Transformer Finetuning Benefits from Non-Smooth Components}, 
      author={Ambroise Odonnat and Laetitia Chapel and Romain Tavenard and Ievgen Redko},
      year={2026},
      eprint={2602.06883},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.06883}, 
}
