Shaohan Yu1,2,3,*, Lijun Li1,*,†, Chenyang Si2, Lu Sheng3, Jing Shao1,†
1Shanghai Artificial Intelligence Laboratory, 2PRLab, Nanjing University, 3Beihang University
*Equal Contribution, †Corresponding Author
The rapid evolution of generative models has led to a continuous emergence of multimodal safety risks, exposing the limitations of existing defense methods. To address these challenges, we propose ProGuard, a vision-language proactive guard that identifies and describes out-of-distribution (OOD) safety risks without the model adjustments required by traditional reactive approaches. We first construct a modality-balanced dataset of 87K samples, each annotated with both a binary safety label and a risk category under a hierarchical multimodal safety taxonomy, effectively mitigating modality bias and ensuring consistent moderation across text, image, and text-image inputs. Based on this dataset, we train our vision-language base model purely through reinforcement learning (RL) to achieve efficient and concise reasoning. To approximate proactive safety scenarios in a controlled setting, we further introduce an OOD safety category inference task and augment the RL objective with a synonym-bank-based similarity reward that encourages the model to generate concise descriptions for unseen unsafe categories. Experimental results show that ProGuard achieves performance comparable to closed-source large models on binary safety classification and substantially outperforms existing open-source guard models on unsafe-content categorization. Most notably, ProGuard delivers strong proactive moderation, improving OOD risk detection by 52.6% and OOD risk description by 64.8%.
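The synonym-bank-based similarity reward can be sketched roughly as follows. Everything here is illustrative: the category names and synonym bank contents are made up, and a simple string-overlap similarity stands in for the embedding-based similarity (e.g. via BGE-M3) that an actual implementation would use.

```python
from difflib import SequenceMatcher

# Hypothetical synonym bank: each unseen risk category maps to a set of
# accepted phrasings for its description (contents are placeholders).
SYNONYM_BANK = {
    "deepfake_impersonation": [
        "deepfake impersonation",
        "synthetic identity fraud",
        "ai-generated impersonation",
    ],
}

def similarity_reward(description: str, category: str, bank=SYNONYM_BANK) -> float:
    """Reward the generated description by its best match against any
    synonym of the ground-truth category. SequenceMatcher.ratio() is a
    stand-in for embedding cosine similarity; it returns a score in [0, 1].
    """
    desc = description.lower().strip()
    return max(SequenceMatcher(None, desc, syn).ratio() for syn in bank[category])
```

This reward term would be added to the RL objective alongside the classification reward, so that descriptions of unseen categories are pushed toward concise phrasings close to the synonym bank rather than being scored only on exact matches.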
- [2025.12.29] We release our code repo & project page 🎉.
- [2026.02.06] Thanks to DeepSafe for using our model. Please check out their awesome work 🙌.
- Training: First, download the training dataset from Hugging Face, preprocess it into a format compatible with verl, modify the paths in the scripts under the `train` directory to your local paths, and run `proguard-train.sh` to start training.
- Deployment: Refer to `deploy/README.md` for model usage instructions.
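The training steps above might look like the following in practice. This is only a workflow sketch: the dataset id, local directories, and preprocessing entry point are placeholders, so check the repository itself for the actual names.

```shell
# 1. Download the training dataset (dataset id is a placeholder)
huggingface-cli download <org>/<proguard-dataset> \
    --repo-type dataset --local-dir data/raw

# 2. Preprocess into a verl-compatible format
#    (see the repo's preprocessing instructions; script name not shown here)

# 3. Edit the paths in the scripts under train/ to point at your local data,
#    then launch training:
bash train/proguard-train.sh
```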
Our method is partly built on verl, the Qwen-VL series, and BGE-M3. Thanks for their awesome work.
@article{yu2025proguard,
title={ProGuard: Towards Proactive Multimodal Safeguard},
author={Yu, Shaohan and Li, Lijun and Si, Chenyang and Sheng, Lu and Shao, Jing},
journal={arXiv preprint arXiv:2512.23573},
year={2025},
url={https://yushaohan.github.io/ProGuard/}
}
