Yi Xie
2026
Three Minds, One Legend: Jailbreak Large Reasoning Model with Adaptive Stacked Ciphers
Nguyen Viet Anh | Shiqian Zhao | Gia Dao | Runyi Hu | Yi Xie | Xiaobao Wu | Anh Tuan Luu
Findings of the Association for Computational Linguistics: ACL 2026
Recently, Large Reasoning Models (LRMs) have demonstrated superior logical capabilities compared to traditional Large Language Models (LLMs), gaining significant attention. Despite their impressive performance, the potential for stronger reasoning abilities to introduce more severe security vulnerabilities, though pointed out by some previous works, remains largely underexplored. Existing jailbreak methods often struggle to balance effectiveness with robustness against adaptive safety mechanisms. In this work, we propose SEAL, a novel jailbreak attack that targets LRMs through an adaptive encryption pipeline designed to override their reasoning processes and evade potential adaptive alignment. Specifically, SEAL introduces a stacked encryption approach that combines multiple ciphers to overwhelm the model’s reasoning capabilities, effectively bypassing built-in safety mechanisms. To further prevent LRMs from developing countermeasures, we incorporate two dynamic strategies, random and adaptive, that adjust the cipher length, order, and combination. Extensive experiments on real-world reasoning models, including DeepSeek-R1, Claude Sonnet, and OpenAI GPT-o4-mini, validate the effectiveness of our approach. Notably, SEAL achieves an attack success rate of 85.6% on GPT-o4-mini, outperforming state-of-the-art baselines by a significant margin of 17.2%. Warning: This paper contains examples of inappropriate, offensive, and harmful content.
PERM: Psychology-grounded Empathetic Reward Modeling for Large Language Models
Chengbing Wang | Wuqiang Zheng | Yang Zhang | Fengbin Zhu | Junyi Cheng | Yi Xie | Wenjie Wang | Fuli Feng
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) are increasingly deployed in human-centric applications, yet they often fail to provide substantive emotional support. While Reinforcement Learning (RL) has been used to enhance the empathy of LLMs, existing reward models typically evaluate empathy from a single perspective, overlooking the inherently bidirectional nature of empathy between the supporter and the seeker, as defined by Empathy Cycle theory. To address this limitation, we propose Psychology-grounded Empathetic Reward Modeling (PERM). PERM operationalizes empathy evaluation through a bidirectional decomposition: 1) the supporter perspective, assessing internal resonation and communicative expression; 2) the seeker perspective, evaluating emotional reception. Additionally, it incorporates a bystander perspective to monitor overall interaction quality. Extensive experiments on a widely used emotional intelligence benchmark and an industrial daily conversation dataset demonstrate that PERM outperforms state-of-the-art baselines by over 10%. Furthermore, a blinded user study reveals a 70% preference for our approach, highlighting its efficacy in generating more empathetic responses.
2025
Reasoning under Uncertainty: Efficient LLM Inference via Unsupervised Confidence Dilution and Convergent Adaptive Sampling
Zhenning Shi | Yijia Zhu | Yi Xie | Junhan Shi | Guorui Xie | Haotian Zhang | Yong Jiang | Congcong Miao | Qing Li
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) excel at complex reasoning tasks but often suffer from overconfidence and computational inefficiency due to fixed computation budgets and miscalibrated confidence estimates. We present a novel framework for computationally efficient, trustworthy reasoning under uncertainty, introducing two complementary techniques: Diversity-Aware Self-Signal Dilution (DASD) and Convergent Adaptive Weighted Sampling (CAWS). DASD operates in an unsupervised manner to dilute overconfident, semantically redundant reasoning paths, thereby producing better-calibrated internal confidence estimates. CAWS dynamically allocates computational resources at inference time by aggregating these signals and terminating computation once answer dominance and stability are achieved. Comprehensive experiments across three reasoning datasets demonstrate that our approach maintains accuracy while reducing inference cost by over 70%, surpassing competitive baselines. Our framework provides a scalable, unsupervised solution for reliable and efficient LLM reasoning.
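The early-termination idea in the abstract above (stop sampling once one answer dominates and stays in the lead) can be illustrated with a minimal Python sketch. This is not the authors' CAWS algorithm: the function name `adaptive_weighted_sampling`, the `sample_fn` callback, and the `dominance`, `patience`, and `min_samples` thresholds are all hypothetical, and the confidence values are assumed to be already calibrated by some upstream step.

```python
from collections import defaultdict
from typing import Callable, Optional, Tuple


def adaptive_weighted_sampling(
    sample_fn: Callable[[], Tuple[str, float]],
    max_samples: int = 16,
    min_samples: int = 3,
    dominance: float = 0.6,
    patience: int = 2,
) -> Tuple[Optional[str], int]:
    """Draw (answer, confidence) samples until one answer dominates stably.

    Hypothetical stopping rule, assumed for illustration only:
    stop when the leading answer holds at least `dominance` of the total
    confidence mass and has been the leader for `patience` consecutive draws.
    Returns the chosen answer and the number of samples actually drawn.
    """
    weights = defaultdict(float)  # answer -> accumulated confidence
    leader: Optional[str] = None
    stable_rounds = 0

    for n in range(1, max_samples + 1):
        answer, confidence = sample_fn()
        weights[answer] += confidence

        # Identify the current leader and track how long it has led.
        current = max(weights, key=weights.get)
        stable_rounds = stable_rounds + 1 if current == leader else 1
        leader = current

        total = sum(weights.values())
        share = weights[leader] / total if total > 0 else 0.0

        # Terminate early once dominance and stability are both satisfied.
        if n >= min_samples and share >= dominance and stable_rounds >= patience:
            return leader, n

    # Budget exhausted: fall back to the weighted-majority answer.
    return leader, max_samples
```

In this sketch, `sample_fn` would wrap one model call that returns a final answer plus a confidence score; easy questions then stop after `min_samples` draws, while contested ones use the full budget.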