Xiaoyu Fan

2025

Jailbreak attacks have been observed to largely fail against recent reasoning models enhanced by Chain-of-Thought (CoT) reasoning. However, the underlying mechanism remains underexplored, and relying solely on reasoning capacity may raise security concerns. In this paper, we try to answer the question: Does CoT reasoning really reduce harmfulness from jailbreaking? Through rigorous theoretical analysis, we demonstrate that CoT reasoning has dual effects on jailbreaking harmfulness. Based on the theoretical insights, we propose a novel jailbreak method, FicDetail, whose practical performance validates our theoretical findings.

Co-authors

Yu Huang 1
Jijie Li 1
Chengda Lu 1
Rongwu Xu 1
Wei Xu 1

Venues

Findings 1
