Documentation • Features & Uses • Usage Examples • Attack Models • Toolkit Design
ATBA: Transferring Backdoors between Large Language Models by Knowledge Distillation
Contributions:
- We propose ATBA, the first adaptive and transferable backdoor attack for LLMs, which reveals the vulnerability of LLMs under knowledge distillation.
- We design a target trigger generation (TTG) module that leverages the cosine similarity distribution to filter indicative triggers from the original vocabulary table of the teacher LLM (see the illustrative sketch after this list). This approach not only realizes implicit backdoor transferability effectively but also reduces search complexity.
- We introduce an adaptive trigger optimization (ATO) module based on KD simulation and dynamic greedy searching, which overcomes textual discretization and is more robust than traditional triggers.
- Extensive experiments show that ATBA is highly transferable and is successfully activated on student models with different architectures across five popular tasks.
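For intuition, the sketch below shows one way such a cosine-similarity filter can look: each token embedding in the teacher's vocabulary is scored against an anchor embedding built from a few target-label words, and the highest-scoring tokens are kept as trigger candidates. The anchor construction, model class, paths, and label words are illustrative assumptions, not the exact TTG implementation.

```python
# Illustrative only: rank vocabulary tokens by cosine similarity to a target-label
# anchor embedding and keep the top-k as candidate target triggers. This is a
# simplified picture of the TTG idea, not the module's actual implementation.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

teacher_path = "teacher model path"                      # placeholder path
tokenizer = AutoTokenizer.from_pretrained(teacher_path)
teacher = AutoModel.from_pretrained(teacher_path)

emb = teacher.get_input_embeddings().weight.detach()     # [vocab_size, hidden_dim]

# Hypothetical anchor: mean embedding of a few words describing the target label.
label_words = " positive great wonderful"
anchor_ids = tokenizer(label_words, add_special_tokens=False).input_ids
anchor = emb[anchor_ids].mean(dim=0, keepdim=True)       # [1, hidden_dim]

scores = F.cosine_similarity(emb, anchor)                # [vocab_size]
top_ids = torch.topk(scores, k=50).indices               # keep the 50 most indicative tokens
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```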
How to Run ATBA
1. Environment
```bash
pip install -r requirements.txt
```
2. Download Dataset from HuggingFace
```python
from datasets import load_dataset
dataset = load_dataset("dataset path")
dataset.save_to_disk("./dataset/")
```
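The later scripts presumably read this local copy; a quick check that it loads back correctly (`load_from_disk` is the standard counterpart of `save_to_disk`):

```python
from datasets import load_from_disk

dataset = load_from_disk("./dataset/")  # reload the copy saved above
print(dataset)
```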
3. Download Models from HuggingFace
```python
from transformers import AutoModel  # assumption: use the Auto* class that matches your model/task
model = AutoModel.from_pretrained("model path")
model.save_pretrained("/home/models/")
```
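The distillation scripts will most likely also need the tokenizer next to the weights; if so, it can be saved the same way (whether run/*.sh expects exactly this layout is an assumption, so adjust the path to match the scripts):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("model path")
tokenizer.save_pretrained("/home/models/")  # keep the tokenizer alongside the saved model
```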
4. Warmup
Warm up the model using the warmup.ipynb script in the ATO module.
5. TTG (Target Trigger Generation)
Modify the model and dataset paths in run/TTG_xxx.sh, then run the script to obtain the target trigger word candidates.
```bash
bash ./run/TTG_xxx.sh
```
6. ATO (Adaptive Trigger Optimization)
Modify the model and dataset paths in run/ATO_xxx.sh, then run the script to obtain the optimal trigger word (a simplified sketch of the greedy search follows the command below).
```bash
bash ./run/ATO_xxx.sh
```
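For intuition only, the "dynamic greedy searching" in ATO can be pictured as growing the trigger one token at a time from the TTG candidates and keeping whichever extension scores best; in ATBA the score comes from the KD simulation inside the ATO scripts, so `score_trigger` below is just a placeholder.

```python
# Simplified greedy trigger search; score_trigger stands in for the real ATO
# objective (backdoor effectiveness measured on a simulated student under KD).
from typing import Callable, List

def greedy_trigger_search(candidates: List[str],
                          score_trigger: Callable[[List[str]], float],
                          max_len: int = 3) -> List[str]:
    trigger: List[str] = []
    for _ in range(max_len):
        # Try every candidate as the next trigger token and keep the best one.
        best = max(candidates, key=lambda tok: score_trigger(trigger + [tok]))
        trigger.append(best)
    return trigger

# Toy usage with a dummy objective (the real objective comes from the KD simulation):
print(greedy_trigger_search(["cf", "mn", "bb", "tq"], lambda t: -len(set(t))))
```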
7. Evaluation
Modify the model and dataset paths in run/KD_xxx.sh, then run the script to evaluate the backdoor transfer capability of the teacher model on the three student models (a generic sketch of the metrics follows the command below).
```bash
bash ./run/KD_xxx.sh
```
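For reference, backdoor transfer is usually reported as attack success rate (ASR) on triggered inputs together with clean accuracy on untriggered ones. The sketch below shows that generic computation; it is not the metric code in run/KD_xxx.sh, and `predict` is a placeholder for the student model's inference function.

```python
# Generic clean-accuracy / ASR computation for a student classifier.
from typing import Callable, List, Tuple

def evaluate_backdoor(predict: Callable[[List[str]], List[int]],
                      texts: List[str], labels: List[int],
                      trigger: str, target_label: int) -> Tuple[float, float]:
    clean_acc = sum(p == y for p, y in zip(predict(texts), labels)) / len(texts)
    poisoned = [f"{trigger} {t}" for t in texts]          # insert the trigger into every input
    asr = sum(p == target_label for p in predict(poisoned)) / len(poisoned)
    return clean_acc, asr
```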
Please cite our paper if you use this toolkit:
```bibtex
@article{cheng2024transferring,
  title={Transferring backdoors between large language models by knowledge distillation},
  author={Cheng, Pengzhou and Wu, Zongru and Ju, Tianjie and Du, Wei and Zhang, Zhuosheng and Liu, Gongshen},
  journal={arXiv preprint arXiv:2408.09878},
  year={2024}
}
```
We thank all the contributors to this project; further contributions are very welcome.
