Hao Wu
2026
When Efficiency Meets Safety: A Benchmark Security Analysis of KV Cache Compression in Large Language Models
Xiaoxiao Ma | Kuofeng Gao | Zeyi Lu | Wenxi Jiang | Hao Fang | Hao Wu | Bin Chen | Shu-Tao Xia
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Key-Value (KV) caching is widely used in large language models (LLMs) to enable efficient long-context inference, yet its security implications remain underexplored. We present the first systematic study of how KV cache compression interacts with jailbreak attacks, evaluating four model families under diverse jailbreak attacks. We identify a double-edged effect. (i) On one hand, compression can induce **Accidental Robustness**, where optimization-based and encoding-based attacks fail for two reasons: Malicious Semantic Eviction, in which the attacks’ own attention redirection reduces the malicious query’s cache importance, and Gradient Mismatch, in which discrete compression operations break jailbreak optimization. (ii) On the other hand, a **Vulnerability Paradox** arises under merging-based compression for human-designed attacks, where aggressive merging in shallow layers triggers functional head collapse, amplifying attack success rates (ASR). To address this, we propose **Safe-CAM**, a history-aware, per-head feedback merging strategy that prevents safety degradation while maintaining efficiency. Experiments show Safe-CAM fully restores safety (0% ASR) and improves benign task performance with minimal overhead. Our study highlights that KV cache compression is not only an efficiency mechanism but also a safety-critical design factor in LLM deployment.
C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts
Chenxi Qing | Junxi Wu | Zheng Liu | Yixiang Qiu | Hongyao Yu | Bin Chen | Hao Wu | Shu-Tao Xia
Findings of the Association for Computational Linguistics: ACL 2026
Large language models (LLMs) have recently become capable of generating highly fluent text. While they offer significant convenience, they also introduce risks such as phishing and academic dishonesty. Numerous research efforts have been dedicated to developing algorithms for detecting AI-generated text and constructing relevant datasets. However, challenges remain for Chinese corpora, including limited model diversity and data homogeneity. To address these issues, we propose C-ReD: a comprehensive Chinese Real-prompt AI-generated text Detection benchmark. Experiments demonstrate that C-ReD not only enables reliable in-domain detection but also supports strong generalization to unseen LLMs and external Chinese datasets, addressing critical gaps in model diversity, domain coverage, and prompt realism that have limited prior Chinese detection benchmarks. We release our resources at https://github.com/HeraldofLight/C-ReD.
2025
MoSEs: Uncertainty-Aware AI-Generated Text Detection via Mixture of Stylistics Experts with Conditional Thresholds
Junxi Wu | Jinpeng Wang | Zheng Liu | Bin Chen | Dongjian Hu | Hao Wu | Shu-Tao Xia
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
The rapid advancement of large language models has intensified public concerns about their potential misuse. Therefore, it is important to build trustworthy AI-generated text detection systems. Existing methods neglect stylistic modeling and mostly rely on static thresholds, which greatly limits detection performance. In this paper, we propose the Mixture of Stylistic Experts (MoSEs) framework, which enables stylistics-aware uncertainty quantification through conditional threshold estimation. MoSEs contains three core components: the Stylistics Reference Repository (SRR), the Stylistics-Aware Router (SAR), and the Conditional Threshold Estimator (CTE). For an input text, SAR activates the appropriate reference data in SRR and provides them to CTE. Subsequently, CTE jointly models the linguistic statistical properties and semantic features to dynamically determine the optimal threshold. Given a discrimination score, MoSEs yields prediction labels with the corresponding confidence level. Our framework achieves an average improvement of 11.34% in detection performance compared to baselines. More notably, MoSEs shows a larger improvement of 39.15% in the low-resource case. Our code is available at https://github.com/creator-xi/MoSEs.