Xiangzheng Zhang
2026
DMN: A Compositional Framework for Jailbreaking Multimodal LLMs with Multi-Image Inputs
Wenzhuo Xu | Zhipeng Wei | Zonghao Ying | Deyue Zhang | Dongdong Yang | Xiangzheng Zhang | Quanchen Zou
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multimodal Large Language Models (MLLMs) are vulnerable to jailbreak attacks, which can elicit harmful responses. Many MLLMs support multi-image inputs, which inadvertently introduces new vulnerabilities because less effort has gone into multi-image safety alignment. Previous MLLM jailbreak methods use only a single image, which restricts the attack space: they cannot distribute harmful requests across multiple images, carry abundant information, or exploit additional visual reasoning tasks to distract MLLMs. To address these limitations, we propose a compositional jailbreak framework, DMN, which leverages Distributed instruction, Multimodal evidence and a Number chain task to enhance jailbreak performance. Extensive experiments show that DMN is highly effective for MLLM jailbreaking, e.g., achieving attack success rates of over 90% on GPT-4o, Gemini-2.5-pro and Claude Sonnet 4, surpassing other baselines by a large margin. This compositional, multi-image jailbreak strategy reveals fundamental weaknesses in the safety mechanisms of MLLMs.
TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment
Zhewen Tan | Wenhan Yu | Jianfeng Si | Tongxin Liu | Kaiqi Guan | Huiyan Jin | Jiawen Tao | Xiaokun Yuan | Xiangzheng Zhang | Duohe Ma | Tong Yang | Lin Sun
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In recent years, safety risks associated with large language models have become increasingly prominent, highlighting the urgent need to mitigate the generation of toxic and harmful content. The mainstream paradigm for LLM safety alignment typically adopts a collaborative framework involving three roles: an attacker for adversarial prompt generation, a defender for safety defense, and an evaluator for response assessment. In this paper, we propose a closed-loop reinforcement learning framework called TriPlay-RL that enables iterative and co-improving collaboration among three roles with near-zero manual annotation. Experimental results show that the attacker preserves high output diversity while achieving a 20%–50% improvement in adversarial effectiveness. The defender attains 10%–30% gains in safety performance without degrading general reasoning capability, and the evaluator continuously refines its fine-grained judgment ability through iterations, accurately distinguishing unsafe responses, simple refusals, and useful guidance. Overall, our framework establishes an efficient and scalable paradigm for LLM safety alignment, enabling continuous co-evolution within a unified learning loop. The code is available at https://github.com/Qihoo360/TriPlay-RL.
When Good OCR Is Not Enough: Benchmarking OCR Robustness for Retrieval-Augmented Generation
Lin Sun | Wangdexian | Jingang Huang | Linglin Zhang | Change Jia | Zhengwei Cheng | Xiangzheng Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Industrial Retrieval-Augmented Generation (RAG) systems depend on optical character recognition (OCR) to transform visual documents into text. Existing OCR benchmarks rely on character-level metrics, which inadequately measure downstream RAG effectiveness under real-world conditions. We introduce an OCR benchmark for industrial RAG systems covering 11 challenging document types, including extreme layouts, high-resolution pages, complex or watermarked backgrounds, historical documents with non-standard reading orders, visually decorated text, and documents containing tables and mathematical formulas. Evaluating recent SOTA OCR models under a controlled OCR-first RAG pipeline shows clear performance degradation on realistic industrial documents despite strong conventional benchmark scores. We find that high OCR accuracy does not necessarily translate into strong downstream RAG performance: structural and semantic errors can cause substantial retrieval failures even when WER/CER remains low. Further analysis shows that this mismatch is category-dependent, arises through both retrieval-side and downstream generation-side failures, and remains stable across representative OCR-first pipeline choices. The benchmark is publicly available at https://github.com/Qihoo360/InduOCRBench.
Thinking with Reasoning Skills: Fewer Tokens, More Accuracy
Guangxiang Zhao | Qilong Shi | Xusen Xiao | Xiangzheng Zhang | Tong Yang | Lin Sun
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Reasoning LLMs often spend substantial tokens on long intermediate reasoning traces (e.g., chain-of-thought) when solving new problems. We propose to summarize and store reusable reasoning skills distilled from extensive deliberation and trial-and-error exploration, and to retrieve these skills at inference time to guide future reasoning. Unlike the prevailing reasoning-from-scratch paradigm, our approach first recalls relevant skills for each query, helping the model avoid redundant detours and focus on effective solution paths. We evaluate our method on coding and mathematical reasoning tasks, and find that it significantly reduces reasoning tokens while improving overall performance. The resulting lower per-request cost indicates strong practical and economic potential for real-world deployment.
2025
Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models
Zonghao Ying | Deyue Zhang | Zonglei Jing | Yisong Xiao | Quanchen Zou | Aishan Liu | Siyuan Liang | Xiangzheng Zhang | Xianglong Liu | Dacheng Tao
Findings of the Association for Computational Linguistics: EMNLP 2025
Multi-turn jailbreak attacks simulate real-world human interactions by engaging large language models (LLMs) in iterative dialogues, exposing critical safety vulnerabilities. However, existing methods often struggle to balance semantic coherence with attack effectiveness, resulting in either benign semantic drift or ineffective detection evasion. To address this challenge, we propose Reasoning-Augmented Conversation (RACE), a novel multi-turn jailbreak framework that reformulates harmful queries into benign reasoning tasks and leverages LLMs’ strong reasoning capabilities to compromise safety alignment. Specifically, we introduce an attack state machine framework to systematically model problem translation and iterative reasoning, ensuring coherent query generation across multiple turns. Building on this framework, we design gain-guided exploration, self-play, and rejection feedback modules to preserve attack semantics, enhance effectiveness, and sustain reasoning-driven attack progression. Extensive experiments on multiple LLMs demonstrate that RACE achieves state-of-the-art attack effectiveness in complex conversational scenarios, with attack success rates (ASRs) increasing by up to 96%. Notably, our approach achieves an average ASR of 83.3% against leading commercial models, including Gemini 2.0 Flash Thinking and OpenAI o1, underscoring its potency.
Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision
Dawei Zhu | Xiyu Wei | Guangxiang Zhao | Wenhao Wu | Haosheng Zou | Junfeng Ran | XWang | Lin Sun | Xiangzheng Zhang | Sujian Li
Findings of the Association for Computational Linguistics: EMNLP 2025
Recent advances in Large Language Models (LLMs) have highlighted the challenge of handling long-context tasks, where models need to reason over extensive input contexts to aggregate target information. While Chain-of-Thought (CoT) prompting has shown promise for multi-step reasoning, its effectiveness for long-context scenarios remains underexplored. Through systematic investigation across diverse tasks, we demonstrate that CoT’s benefits generalize across most long-context scenarios and amplify with increasing context length. Motivated by this, we propose a process-supervised framework that teaches models to generate high-quality reasoning paths for enhanced long-context performance. Our framework incorporates a self-sampling mechanism to bootstrap reasoning paths and a novel quality assessment protocol specifically designed for long-context scenarios. This protocol evaluates both answer correctness and process reliability, with the latter decomposed into source faithfulness and intrinsic consistency components for efficient and accurate assessment. Experimental results on various long-context benchmarks demonstrate the effectiveness of our approach, achieving significant improvements over outcome supervision baselines on both in-domain tasks (+13.6/+3.8 points for LLaMA/Qwen on MuSiQue) and cross-domain generalization (+9.3/+8.1 points on average across diverse QA tasks). Our code, data and trained models will be released upon acceptance.
Large Language Models Badly Generalize across Option Length, Problem Types, and Irrelevant Noun Replacements
Guangxiang Zhao | Saier Hu | Xiaoqi Jian | Wu Jinzhu | Yuhan Wu | Lin Sun | Xiangzheng Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
In this paper, we propose a “Generalization Stress Test” to assess Large Language Models’ (LLMs) generalization ability under slight and controlled perturbations, including option length, problem types, and irrelevant noun replacements. We find that, despite high benchmark scores, LLMs exhibit severe accuracy drops and unexpected biases (e.g., preference for longer distractors) when faced with these minor but content-preserving modifications. For example, Qwen 2.5 1.5B’s MMLU score rises from 60 to 89 and drops from 89 to 36 when option lengths are changed without altering the question. Even GPT-4o experiences a 25-point accuracy loss when problem types are changed, with a 6-point drop across all three modification categories. These analyses suggest that LLMs rely heavily on superficial cues rather than forming robust, abstract representations that generalize across formats, lexical variations, and shifts in irrelevant content.
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond
Liang Wen | Yunke Cai | Fenrui Xiao | Xin He | Qi An | Zhenyu Duan | Yimin Du | Junchen Liu | Lifu Tang | Xiaowei Lv | Haosheng Zou | Yongchao Deng | Shousheng Jia | Xiangzheng Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
This paper introduces Light-R1, an open-source suite for training long reasoning models using reproducible and cost-effective methodology. Given the proprietary nature of data used in the DeepSeek-R1 series, we develop an alternative approach leveraging exclusively public data and models. Our curriculum training progressively increases data difficulty, combined with multi-staged post-training. Our Light-R1-32B model, trained from Qwen2.5-32B-Instruct, outperforms DeepSeek-R1-Distill-Qwen-32B in math reasoning. Experimental results show that this curriculum approach becomes more effective when distinct, diverse datasets are available for different training stages: fine-tuning DeepSeek-R1-Distilled models (pre-tuned by DeepSeek team on proprietary data) with 3,000 challenging examples from our curriculum dataset yielded state-of-the-art 7B and 14B models, while the 32B model, Light-R1-32B-DS, performed comparably to QwQ-32B and DeepSeek-R1. Furthermore, we extend our work by applying GRPO on long reasoning models. Our final Light-R1-14B-DS achieves SOTA performance among 14B models in math, with AIME24 & 25 scores of 74.0 and 60.2 respectively, surpassing many 32B models and DeepSeek-R1-Distill-Llama-70B. Despite math-focused training, Light-R1-14B-DS demonstrates strong cross-domain generalization. Light-R1 represents a significant advancement in making sophisticated reasoning models more accessible and implementable in real-world applications. Our models, training data and code have been made available.
Co-authors
- Lin Sun 5
- Guangxiang Zhao 3
- Tong Yang 2
- Zonghao Ying 2
- Deyue Zhang 2
- Haosheng Zou 2
- Quanchen Zou 2
- Qi An 1
- Yunke Cai 1
- Zhengwei Cheng 1
- Yongchao Deng 1
- Yimin Du 1
- Zhenyu Duan 1
- Kaiqi Guan 1
- Xin He 1
- Saier Hu 1
- Jingang Huang 1
- Change Jia 1
- Shousheng Jia 1
- Xiaoqi Jian 1
- Huiyan Jin 1
- Zonglei Jing 1
- Wu Jinzhu 1
- Sujian Li (李素建) 1
- Siyuan Liang 1
- Aishan Liu 1
- Junchen Liu 1
- Tongxin Liu 1
- Xianglong Liu 1
- Xiaowei Lv 1
- Duohe Ma 1
- Junfeng Ran 1
- Qilong Shi 1
- Jianfeng Si 1
- Zhewen Tan 1
- Lifu Tang 1
- Dacheng Tao 1
- Jiawen Tao 1
- Wangdexian 1
- Xiyu Wei 1
- Zhipeng Wei 1
- Liang Wen 1
- Wenhao Wu 1
- Yuhan Wu 1
- XWang 1
- Fenrui Xiao 1
- Xusen Xiao 1
- Yisong Xiao 1
- Wenzhuo Xu 1
- Dongdong Yang 1
- Wenhan Yu 1
- Xiaokun Yuan 1
- Linglin Zhang 1
- Dawei Zhu 1