Shaohan Yu1,2,3,*, Lijun Li1,*,†, Chenyang Si2, Lu Sheng3, Jing Shao1,†
1Shanghai Artificial Intelligence Laboratory, 2PRLab, Nanjing University, 3Beihang University
*Equal Contribution, †Corresponding Author
The rapid evolution of generative models has led to a continuous emergence of multimodal safety risks, exposing the limitations of existing defense methods. To address these challenges, we propose ProGuard, a vision-language proactive guard that identifies and describes out-of-distribution (OOD) safety risks without the model adjustments required by traditional reactive approaches. We first construct a modality-balanced dataset of 87K samples, each annotated with both a binary safety label and a risk category under a hierarchical multimodal safety taxonomy, effectively mitigating modality bias and ensuring consistent moderation across text, image, and text-image inputs. Based on this dataset, we train our vision-language base model purely through reinforcement learning (RL) to achieve efficient and concise reasoning. To approximate proactive safety scenarios in a controlled setting, we further introduce an OOD safety category inference task and augment the RL objective with a synonym-bank-based similarity reward that encourages the model to generate concise descriptions for unseen unsafe categories. Experimental results show that ProGuard achieves performance comparable to closed-source large models on binary safety classification and substantially outperforms existing open-source guard models on unsafe-content categorization. Most notably, ProGuard delivers strong proactive moderation, improving OOD risk detection by 52.6% and OOD risk description by 64.8%.
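The synonym-bank-based similarity reward can be sketched roughly as follows. Everything here is illustrative: the category names and synonym bank contents are made up, and a simple string-overlap similarity stands in for the embedding-based similarity (e.g. via BGE-M3) that an actual implementation would use.

```python
from difflib import SequenceMatcher

# Hypothetical synonym bank: each unseen risk category maps to a set of
# accepted phrasings for its description (contents are placeholders).
SYNONYM_BANK = {
    "deepfake_impersonation": [
        "deepfake impersonation",
        "synthetic identity fraud",
        "ai-generated impersonation",
    ],
}

def similarity_reward(description: str, category: str, bank=SYNONYM_BANK) -> float:
    """Reward the generated description by its best match against any
    synonym of the ground-truth category. SequenceMatcher.ratio() is a
    stand-in for embedding cosine similarity; it returns a score in [0, 1].
    """
    desc = description.lower().strip()
    return max(SequenceMatcher(None, desc, syn).ratio() for syn in bank[category])
```

This reward term would be added to the RL objective alongside the classification reward, so that descriptions of unseen categories are pushed toward concise phrasings close to the synonym bank rather than being scored only on exact matches.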
- [2025.12.29] We release our code repo & project page 🎉.
- [2026.02.06] Thanks to DeepSafe for using our model. Please check out their awesome work 🙌.
- Training: First, download the training dataset from Hugging Face, preprocess it into a format compatible with verl, modify the paths in the scripts under the `train` directory to your local paths, and run `proguard-train.sh` to start training.
- Deployment: Refer to `deploy/README.md` for model usage instructions.
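The training steps above might look like the following in practice. This is only a workflow sketch: the dataset id, local directories, and preprocessing entry point are placeholders, so check the repository itself for the actual names.

```shell
# 1. Download the training dataset (dataset id is a placeholder)
huggingface-cli download <org>/<proguard-dataset> \
    --repo-type dataset --local-dir data/raw

# 2. Preprocess into a verl-compatible format
#    (see the repo's preprocessing instructions; script name not shown here)

# 3. Edit the paths in the scripts under train/ to point at your local data,
#    then launch training:
bash train/proguard-train.sh
```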
Our method is partly built on verl, the Qwen-VL series, and BGE-M3. Thanks for their awesome work.
@article{yu2025proguard,
title={ProGuard: Towards Proactive Multimodal Safeguard},
author={Yu, Shaohan and Li, Lijun and Si, Chenyang and Sheng, Lu and Shao, Jing},
journal={arXiv preprint arXiv:2512.23573},
year={2025},
url={https://yushaohan.github.io/ProGuard/}
}
