This repository collects resources on the trustworthiness of large models (LMs) across multiple dimensions (e.g., safety, security, and privacy), with a special focus on multi-modal LMs (e.g., vision-language models and diffusion models).
This repo is in progress 🌱 (manually collected).
🌻 We welcome recommendations of new resources via pull requests or issues, using the following format:
| Title | Link | Code | Venue | Classification | Model | Comment |
|---|---|---|---|---|---|---|
| aa | arxiv | github | bb'23 | A1. Jailbreak | LLM | Agent |
- [2025.02.17] We collected 12 related papers from NDSS'25!
- [2024.08.17] We collected 34 related papers from ACL'24!
- [2024.05.13] We collected 7 related papers from S&P'24!
- [2024.04.27] We adjusted the categories.
- [2024.01.20] We collected 3 related papers from NDSS'24!
- [2024.01.17] We collected 108 related papers from ICLR'24!
- [2024.01.09] 🚀 LM-SSP is released!
- Book (3)
- Competition (5)
- Leaderboard (3)
- Toolkit (12)
- Survey (36)
- Paper (1615)
- A. Safety (866)
- A0. General (23)
- A1. Jailbreak (381)
- A2. Alignment (91)
- A3. Deepfake (68)
- A4. Ethics (5)
- A5. Fairness (54)
- A6. Hallucination (110)
- A7. Prompt Injection (60)
- A8. Toxicity (74)
- B. Security (268)
- B0. General (13)
- B1. Adversarial Examples (92)
- B2. Agent (26)
- B3. Poison & Backdoor (121)
- B4. System (16)
- C. Privacy (481)
- C0. General (33)
- C1. Contamination (13)
- C2. Copyright (178)
- C3. Data Reconstruction (53)
- C4. Membership Inference Attacks (42)
- C5. Model Extraction (12)
- C6. Privacy-Preserving Computation (93)
- C7. Property Inference Attacks (3)
- C8. Side-Channel (6)
- C9. Unlearning (48)
Organizers: Tianshuo Cong (丛天硕), Xinlei He (何新磊), Zhengyu Zhao (赵正宇), Yugeng Liu (刘禹更), Delong Ran (冉德龙)
This project is inspired by LLM Security, Awesome LLM Security, LLM Security & Privacy, UR2-LLMs, PLMpapers, and EvaluationPapers4ChatGPT.

