Xu Wan
2026
AdapThink: Adaptive Thinking Preferences for Reasoning Language Models
Wenyue Xu | Xu Wan | Wei Wang | Wenqi Huang | Wotao Yin | Shengjie Zhao | Mingyang Sun
Findings of the Association for Computational Linguistics: ACL 2026
The slow thinking paradigm has been widely validated to enhance the reasoning capabilities of Large Language Models (LLMs), but it introduces notable reasoning inefficiencies: models often overthink simple tasks while prematurely shifting their reasoning paths when addressing complex problems. To address this, we propose AdapThink, a simple yet efficient framework for adaptive reasoning preference control. Unlike methods imposing uniform length constraints, AdapThink dynamically adjusts reflection preferences based on group-level distributional statistics of reasoning length and reflection intensity. We further introduce a dispersion-based diversity sampling mechanism that maximizes the geometric spread of reasoning patterns, accelerating learning through exposure to diverse problem-solving strategies. Across mathematical reasoning and code generation benchmarks, AdapThink reduces average response length by 17.1%-21.4% while improving performance by 6.12-6.59 points under 32K token budgets, demonstrating superior efficiency and robustness against reward hacking compared to strong baselines.
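The abstract does not spell out how the dispersion-based diversity sampling is computed. As a rough illustration of the general idea only, the sketch below picks a spread-out subset of sampled responses using greedy farthest-point selection, a standard max-min dispersion heuristic. Everything here is an assumption made for illustration: the function name `greedy_dispersion_sample`, the two-feature representation (response length in tokens, count of reflection markers), the min-max normalization, and the Euclidean distance. It should not be read as AdapThink's actual procedure.

```python
import math


def greedy_dispersion_sample(points, k):
    """Return indices of up to k points chosen by greedy farthest-point selection.

    points: list of equal-length numeric tuples, e.g. (length_tokens, reflection_count).
    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(points)
    if k <= 0 or n == 0:
        return []
    k = min(k, n)
    dims = len(points[0])

    # Min-max normalize each feature so length does not dominate the distance.
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    normed = [
        tuple(
            (p[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0
            for d in range(dims)
        )
        for p in points
    ]

    # Seed with the point farthest from the centroid of the group.
    centroid = tuple(sum(p[d] for p in normed) / n for d in range(dims))
    first = max(range(n), key=lambda i: math.dist(normed[i], centroid))
    selected = [first]
    chosen = {first}

    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, normed[first]) for p in normed]

    while len(selected) < k:
        # Pick the unselected point that is farthest from the current selection.
        nxt = max((i for i in range(n) if i not in chosen), key=lambda i: min_dist[i])
        selected.append(nxt)
        chosen.add(nxt)
        for i, p in enumerate(normed):
            min_dist[i] = min(min_dist[i], math.dist(p, normed[nxt]))
    return selected


# Five sampled responses for one prompt: (length in tokens, reflection count).
group = [(100, 0), (120, 1), (110, 0), (4000, 12), (2000, 5)]
picked = greedy_dispersion_sample(group, 3)
```

On this toy group the selection keeps one short, one medium, and one long response rather than three near-identical short ones, which is the kind of spread the abstract alludes to. The greedy rule is only an approximation to maximizing dispersion, and other distance measures or features could be substituted.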