Xiaoyong Zhu
2026
USB: A COMPREHENSIVE AND UNIFIED SAFETY EVALUATION BENCHMARK FOR MULTIMODAL LARGE LANGUAGE MODELS
Baolin Zheng | Guanlin Chen | Qingyang Teng | Hongqiong Zhong | Yingshui Tan | Zhendong Liu | Weixun Wang | Jiaheng Liu | Jian Yang | Huiyun Jing | Jincheng Wei | Wenbo Su | Xiaoyong Zhu | Bo Zheng | Kaifu Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Despite their rapid advancement, Multimodal Large Language Models (MLLMs) remain vulnerable to diverse safety risks. Current benchmarks fail to provide reliable assessments due to limited risk coverage, insufficient scale, and the oversight of complex modality combinations (e.g., cross-modal risks). To address this, we introduce the Unified Safety Benchmark (USB), a comprehensive framework covering 61 risk categories across four distinct modality interactions. We first demonstrate that existing benchmarks—even when aggregated—leave significant coverage gaps. To bridge this, we design a sophisticated data synthesis pipeline that generates complementary data, ensuring balanced coverage across all risk dimensions. Furthermore, beyond evaluating vulnerability to harmful queries, USB incorporates the simultaneous assessment of model over-refusal on benign inputs as an integrated diagnostic suite. Experimental results, evaluating 22 MLLMs across 244 risk-modality intersections, demonstrate that existing MLLMs still struggle with the trade-off between avoiding vulnerabilities and over-refusal. Models are particularly vulnerable to image-only or cross-modal risky inputs, highlighting the persistent need for refined safety mechanisms. Warning: This paper contains unfiltered and potentially harmful content that may be offensive.
Enabling Agents to Communicate Entirely in Latent Space
Zhuoyun Du | Runze Wang | Huiyu Bai | Zouying Cao | Xiaoyong Zhu | Yu Cheng | Bo Zheng | Wei Chen | Haochao Ying
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While natural language is the de facto communication medium for LLM-based agents, it presents a fundamental constraint. The process of downsampling rich, internal latent states into discrete tokens inherently limits the depth and nuance of information that can be transmitted, thereby hindering collaborative problem-solving. Inspired by telepathy, which bypasses symbolic language in communication, we propose Interlat (Inter-agent Latent Space Communication), a paradigm that leverages the continuous last hidden states of an LLM as a representation of its thought for direct communication (termed "latent communication"). An additional learned compression process further compresses latent communication via latent space reasoning. Experiments demonstrate that Interlat outperforms both fine-tuned chain-of-thought (CoT) prompting and single-agent baselines, even across heterogeneous models, promoting more exploratory behavior and enabling genuine utilization of latent information. Further compression not only substantially accelerates inference by up to 24× but also maintains competitive performance through an efficient information-preserving mechanism. We position this work as a feasibility study of entirely latent space inter-agent communication, and our results highlight its potential, offering valuable insights for future research.
2025
See the World, Discover Knowledge: A Chinese Factuality Evaluation for Large Vision Language Models
Jihao Gu | Yingyao Wang | Pi Bu | Chen Wang | Ziming Wang | Tengtao Song | Donglai Wei | Jiale Yuan | Yingxiu Zhao | Yancheng He | Shilong Li | Jiaheng Liu | Meng Cao | Jun Song | Yingshui Tan | Xiang Li | Wenbo Su | Xiaoyong Zhu | Bo Zheng
Findings of the Association for Computational Linguistics: ACL 2025
The evaluation of factual accuracy in large vision language models (LVLMs) has lagged behind their rapid development, making it challenging to fully reflect these models’ knowledge capacity and reliability. In this paper, we introduce the first factuality-based visual question-answering benchmark in Chinese, named ChineseSimpleVQA, aimed at assessing the visual factuality of LVLMs across 8 major topics and 56 subtopics. The key features of this benchmark include a focus on the Chinese language, diverse knowledge types, multi-hop question construction, high-quality data, static consistency, and ease of evaluation through short answers. Moreover, we contribute a rigorous data construction pipeline and decouple visual factuality into two parts: seeing the world (i.e., object recognition) and discovering knowledge. This decoupling allows us to analyze the capability boundaries and execution mechanisms of LVLMs. Subsequently, we evaluate 34 advanced open-source and closed-source models, revealing critical performance gaps within this field.
PGPO: Enhancing Agent Reasoning via Pseudocode-style Planning Guided Preference Optimization
Zouying Cao | Runze Wang | Yifei Yang | Xinbei Ma | Xiaoyong Zhu | Bo Zheng | Hai Zhao
Findings of the Association for Computational Linguistics: ACL 2025
Large Language Model (LLM) agents have demonstrated impressive capabilities in handling complex interactive problems. Existing LLM agents mainly generate natural language (NL) plans to guide reasoning, which is verbose and inefficient. NL plans are also tailored to specific tasks and restrict agents’ ability to generalize across similar tasks. To this end, we explore pseudocode-style plans (P-code Plan) to capture the structural logic of reasoning. We find that P-code Plan empowers LLM agents with stronger generalization ability and greater efficiency. Inspired by this finding, we propose a pseudocode-style Planning Guided Preference Optimization method called PGPO for effective agent learning. With two planning-oriented rewards, PGPO further enhances LLM agents’ ability to generate high-quality P-code Plans and subsequent reasoning. Experiments show that PGPO achieves superior performance on representative agent benchmarks and outperforms the current leading baselines. Analyses reveal the advantage of PGPO in reducing action errors and omissions during reasoning.
Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models
Yingshui Tan | Boren Zheng | Baihui Zheng | Kerui Cao | Huiyun Jing | Jincheng Wei | Jiaheng Liu | Yancheng He | Wenbo Su | Xiaoyong Zhu | Bo Zheng | Kaifu Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
With the rapid advancement of Large Language Models (LLMs), significant safety concerns have emerged. Fundamentally, the safety of large language models is closely linked to the accuracy, comprehensiveness, and clarity of their understanding of safety knowledge, particularly in domains such as law, policy, and ethics. This factuality ability is crucial in determining whether these models can be deployed and applied safely and compliantly within specific regions. To address these challenges and better evaluate the factuality ability of LLMs to answer short questions, we introduce the Chinese SafetyQA benchmark. Chinese SafetyQA has several properties (i.e., Chinese, Diverse, High-quality, Static, Easy-to-evaluate, Safety-related, Harmless). Based on Chinese SafetyQA, we perform a comprehensive evaluation of the factuality abilities of existing LLMs and analyze how these capabilities relate to other LLM abilities, e.g., RAG ability and robustness against attacks.
HiddenDetect: Detecting Jailbreak Attacks against Multimodal Large Language Models via Monitoring Hidden States
Yilei Jiang | Xinyan Gao | Tianshuo Peng | Yingshui Tan | Xiaoyong Zhu | Bo Zheng | Xiangyu Yue
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The integration of additional modalities increases the susceptibility of large vision-language models (LVLMs) to safety risks, such as jailbreak attacks, compared to their language-only counterparts. While existing research primarily focuses on post-hoc alignment techniques, the underlying safety mechanisms within LVLMs remain largely unexplored. In this work, we investigate whether LVLMs inherently encode safety-relevant signals within their internal activations during inference. Our findings reveal that LVLMs exhibit distinct activation patterns when processing unsafe prompts, which can be leveraged to detect and mitigate adversarial inputs without requiring extensive fine-tuning. Building on this insight, we introduce HiddenDetect, a novel tuning-free framework that harnesses internal model activations to enhance safety. Experimental results show that HiddenDetect surpasses state-of-the-art methods in detecting jailbreak attacks against LVLMs. By utilizing intrinsic safety-aware patterns, our method provides an efficient and scalable solution for strengthening LVLM robustness against multimodal threats. Our code and data will be released publicly.
Co-authors
- Bo Zheng 6
- Yingshui Tan 4
- Jiaheng Liu 3
- Wenbo Su 3
- Zouying Cao 2
- Yancheng He 2
- Huiyun Jing 2
- Runze Wang 2
- Jincheng Wei 2
- Kaifu Zhang 2
- Huiyu Bai 1
- Pi Bu 1
- Kerui Cao 1
- Meng Cao 1
- Guanlin Chen 1
- Wei Chen 1
- Yu Cheng 1
- Zhuoyun Du 1
- Xinyan Gao 1
- Jihao Gu 1
- Yilei Jiang 1
- Shilong Li 1
- Xiang Li 1
- Zhendong Liu 1
- Xinbei Ma 1
- Tianshuo Peng 1
- Jun Song 1
- Tengtao Song 1
- Qingyang Teng 1
- Chen Wang 1
- Weixun Wang 1
- Yingyao Wang 1
- Ziming Wang 1
- Donglai Wei 1
- Jian Yang 1
- Yifei Yang 1
- Haochao Ying 1
- Jiale Yuan 1
- Xiangyu Yue 1
- Hai Zhao 1
- Yingxiu Zhao 1
- Baihui Zheng 1
- Baolin Zheng 1
- Boren Zheng 1
- Hongqiong Zhong 1