Jincheng Wei
2026
USB: A COMPREHENSIVE AND UNIFIED SAFETY EVALUATION BENCHMARK FOR MULTIMODAL LARGE LANGUAGE MODELS
Baolin Zheng | Guanlin Chen | Qingyang Teng | Hongqiong Zhong | Yingshui Tan | Zhendong Liu | Weixun Wang | Jiaheng Liu | Jian Yang | Huiyun Jing | Jincheng Wei | Wenbo Su | Xiaoyong Zhu | Bo Zheng | Kaifu Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Despite their rapid advancement, Multimodal Large Language Models (MLLMs) remain vulnerable to diverse safety risks. Current benchmarks fail to provide reliable assessments due to limited risk coverage, insufficient scale, and neglect of complex modality combinations (e.g., cross-modal risks). To address this, we introduce the Unified Safety Benchmark (USB), a comprehensive framework covering 61 risk categories across four distinct modality interactions. We first demonstrate that existing benchmarks, even when aggregated, leave significant coverage gaps. To bridge these gaps, we design a data synthesis pipeline that generates complementary data, ensuring balanced coverage across all risk dimensions. Furthermore, beyond evaluating vulnerability to harmful queries, USB also assesses model over-refusal on benign inputs, forming an integrated diagnostic suite. Experimental results from evaluating 22 MLLMs across 244 risk-modality intersections demonstrate that existing MLLMs still struggle with the trade-off between avoiding vulnerabilities and avoiding over-refusal. Models are particularly vulnerable to image-only or cross-modal risky inputs, highlighting the persistent need for refined safety mechanisms. Warning: This paper contains unfiltered and potentially harmful content that may be offensive.
2025
Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models
Yingshui Tan | Boren Zheng | Baihui Zheng | Kerui Cao | Huiyun Jing | Jincheng Wei | Jiaheng Liu | Yancheng He | Wenbo Su | Xiaoyong Zhu | Bo Zheng | Kaifu Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
With the rapid advancement of Large Language Models (LLMs), significant safety concerns have emerged. Fundamentally, the safety of large language models is closely linked to the accuracy, comprehensiveness, and clarity of their understanding of safety knowledge, particularly in domains such as law, policy, and ethics. This factuality ability is crucial in determining whether these models can be deployed and applied safely and compliantly within specific regions. To address these challenges and better evaluate the factuality ability of LLMs in answering short questions, we introduce the Chinese SafetyQA benchmark. Chinese SafetyQA has several properties (i.e., Chinese, Diverse, High-quality, Static, Easy-to-evaluate, Safety-related, Harmless). Based on Chinese SafetyQA, we perform a comprehensive evaluation of the factuality abilities of existing LLMs and analyze how these capabilities relate to other LLM abilities, e.g., RAG ability and robustness against attacks.