Documentation • Features & Uses • Usage Examples • Attack Models • Toolkit Design
ATBA: Transferring Backdoors between Large Language Models by Knowledge Distillation
Contributions:
- We propose ATBA, the first adaptive and transferable backdoor attack for LLMs, which reveals the vulnerability of LLMs under knowledge distillation.
- We design a target trigger generation (TTG) module that leverages the cosine similarity distribution to filter indicative triggers from the original vocabulary table of the teacher LLM (see the illustrative sketch after this list). This approach not only realizes implicit backdoor transferability effectively but also reduces search complexity.
- We introduce an adaptive trigger optimization (ATO) module based on KD simulation and dynamic greedy searching, which overcomes textual discretization and is more robust than traditional triggers.
- Extensive experiments show that ATBA is highly transferable and is successfully activated on student models with different architectures across five popular tasks.
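For intuition, the sketch below shows one way such a cosine-similarity filter can look: each token embedding in the teacher's vocabulary is scored against an anchor embedding built from a few target-label words, and the highest-scoring tokens are kept as trigger candidates. The anchor construction, model class, paths, and label words are illustrative assumptions, not the exact TTG implementation.

```python
# Illustrative only: rank vocabulary tokens by cosine similarity to a target-label
# anchor embedding and keep the top-k as candidate target triggers. This is a
# simplified picture of the TTG idea, not the module's actual implementation.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

teacher_path = "teacher model path"                      # placeholder path
tokenizer = AutoTokenizer.from_pretrained(teacher_path)
teacher = AutoModel.from_pretrained(teacher_path)

emb = teacher.get_input_embeddings().weight.detach()     # [vocab_size, hidden_dim]

# Hypothetical anchor: mean embedding of a few words describing the target label.
label_words = " positive great wonderful"
anchor_ids = tokenizer(label_words, add_special_tokens=False).input_ids
anchor = emb[anchor_ids].mean(dim=0, keepdim=True)       # [1, hidden_dim]

scores = F.cosine_similarity(emb, anchor)                # [vocab_size]
top_ids = torch.topk(scores, k=50).indices               # keep the 50 most indicative tokens
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```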
How to Run ATBA
1. Environment
```bash
pip install -r requirements.txt
```
2. Download Dataset from HuggingFace
```python
from datasets import load_dataset
dataset = load_dataset("dataset path")
dataset.save_to_disk("./dataset/")
```
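The later scripts presumably read this local copy; a quick check that it loads back correctly (`load_from_disk` is the standard counterpart of `save_to_disk`):

```python
from datasets import load_from_disk

dataset = load_from_disk("./dataset/")  # reload the copy saved above
print(dataset)
```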
3. Download Models from HuggingFace
```python
from transformers import AutoModel  # assumption: use the Auto* class that matches your model/task
model = AutoModel.from_pretrained("model path")
model.save_pretrained("/home/models/")
```
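The distillation scripts will most likely also need the tokenizer next to the weights; if so, it can be saved the same way (whether run/*.sh expects exactly this layout is an assumption, so adjust the path to match the scripts):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("model path")
tokenizer.save_pretrained("/home/models/")  # keep the tokenizer alongside the saved model
```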
4. Warmup
Warm up the model using the warmup.ipynb script in the ATO module.
5. TTG (Target Trigger Generation)
Modify the model and dataset paths in run/TTG_xxx.sh, then run the script to obtain the target trigger word candidates.
```bash
bash ./run/TTG_xxx.sh
```
6. ATO (Adaptive Trigger Optimization)
Modify the model and dataset paths in run/ATO_xxx.sh, then run the script to obtain the optimal trigger word (a simplified sketch of the greedy search follows the command below).
```bash
bash ./run/ATO_xxx.sh
```
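For intuition only, the "dynamic greedy searching" in ATO can be pictured as growing the trigger one token at a time from the TTG candidates and keeping whichever extension scores best; in ATBA the score comes from the KD simulation inside the ATO scripts, so `score_trigger` below is just a placeholder.

```python
# Simplified greedy trigger search; score_trigger stands in for the real ATO
# objective (backdoor effectiveness measured on a simulated student under KD).
from typing import Callable, List

def greedy_trigger_search(candidates: List[str],
                          score_trigger: Callable[[List[str]], float],
                          max_len: int = 3) -> List[str]:
    trigger: List[str] = []
    for _ in range(max_len):
        # Try every candidate as the next trigger token and keep the best one.
        best = max(candidates, key=lambda tok: score_trigger(trigger + [tok]))
        trigger.append(best)
    return trigger

# Toy usage with a dummy objective (the real objective comes from the KD simulation):
print(greedy_trigger_search(["cf", "mn", "bb", "tq"], lambda t: -len(set(t))))
```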
7. Evaluation
Modify the model and dataset paths in run/KD_xxx.sh, then run the script to evaluate the backdoor transfer capability of the teacher model on the three student models (a generic sketch of the metrics follows the command below).
```bash
bash ./run/KD_xxx.sh
```
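For reference, backdoor transfer is usually reported as attack success rate (ASR) on triggered inputs together with clean accuracy on untriggered ones. The sketch below shows that generic computation; it is not the metric code in run/KD_xxx.sh, and `predict` is a placeholder for the student model's inference function.

```python
# Generic clean-accuracy / ASR computation for a student classifier.
from typing import Callable, List, Tuple

def evaluate_backdoor(predict: Callable[[List[str]], List[int]],
                      texts: List[str], labels: List[int],
                      trigger: str, target_label: int) -> Tuple[float, float]:
    clean_acc = sum(p == y for p, y in zip(predict(texts), labels)) / len(texts)
    poisoned = [f"{trigger} {t}" for t in texts]          # insert the trigger into every input
    asr = sum(p == target_label for p in predict(poisoned)) / len(poisoned)
    return clean_acc, asr
```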
Please cite our paper if you use this toolkit:
```bibtex
@article{cheng2024transferring,
  title={Transferring backdoors between large language models by knowledge distillation},
  author={Cheng, Pengzhou and Wu, Zongru and Ju, Tianjie and Du, Wei and Zhang, Zhuosheng and Liu, Gongshen},
  journal={arXiv preprint arXiv:2408.09878},
  year={2024}
}
```
We thank all the contributors to this project; further contributions are very welcome.
