Zhufeng Suo
2026
Rethinking Reasoning: A Survey on Reasoning-based Backdoors in LLMs
Man Hu | Xinyi Wu | Zhufeng Suo | Jinbo Feng | Linghui Meng | Yanhao Jia | Anh Tuan Luu | Shuai Zhao
Findings of the Association for Computational Linguistics: ACL 2026
With the rise of advanced reasoning capabilities, large language models (LLMs) are receiving increasing attention. While reasoning enhances LLMs’ performance on downstream tasks, it also introduces new threat vectors, as adversaries can leverage these capabilities to conduct backdoor attacks. Prior surveys provide broad overviews of backdoor attacks and reasoning security; however, a systematic survey focused on backdoor attacks that target LLM reasoning, and on defenses against them, is still absent. In this paper, we take the first step toward providing a comprehensive review of reasoning-based backdoor attacks in LLMs by analyzing their underlying mechanisms, methodological frameworks, and unresolved challenges. Specifically, we introduce a new taxonomy that offers a unified perspective for summarizing existing approaches, categorizing reasoning-based backdoor attacks into three types: associative, passive, and active. We also summarize defenses against such attacks and discuss current challenges alongside future research directions.