Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization

Xingjian Diao; Zheyuan Liu; Chunhui Zhang; Weiyi Wu; Keyi Kong; Lin Shi; Kaize Ding; Soroush Vosoughi; Jiang Gui

Abstract

Large Vision-Language Models (LVLMs) have exhibited strong reasoning capabilities through chain-of-thought mechanisms that generate step-by-step rationales. However, such slow-thinking approaches often lead to overthinking: models produce excessively verbose responses even for simple queries, which causes test-time inefficiency and can even degrade accuracy. Prior work has attempted to mitigate this issue with adaptive reasoning strategies, but these methods largely overlook a fundamental bottleneck: visual perception failures. We argue that stable reasoning critically depends on low-level visual grounding, and that reasoning errors often originate from imperfect perception rather than insufficient deliberation. To address this limitation, we propose Gated Perception-Reasoning Optimization (GPRO), a meta-reasoning controller that dynamically routes computation among three decision paths at each generation step: a lightweight fast path, a slow perception path for re-examining visual inputs, and a slow reasoning path for internal self-reflection. To teach the controller which slow path a given failure calls for, we derive large-scale failure-attribution supervision from approximately 790k samples, using teacher models to distinguish perceptual hallucinations from reasoning errors. We then train the controller with multi-objective reinforcement learning to optimize the trade-off between task accuracy and computational cost under uncertainty. Experiments on five benchmarks demonstrate that GPRO substantially improves both accuracy and efficiency, outperforming recent slow-thinking methods while generating significantly shorter responses.
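To make the routing idea concrete, the toy sketch below shows a per-step gate choosing among the three paths named in the abstract, plus a reward that trades correctness against cost. It is not the authors' implementation. The controller inputs (`visual_uncertainty`, `reasoning_uncertainty`), the gate weights, and the linear scalarization of accuracy minus a token-cost penalty are all hypothetical assumptions chosen for illustration; the abstract does not specify GPRO's features, architecture, or reward form.

```python
import math
from dataclasses import dataclass

# The three decision paths described in the abstract.
PATHS = ("fast", "slow_perception", "slow_reasoning")


@dataclass
class StepFeatures:
    # Hypothetical per-step signals; GPRO's real controller inputs are not given here.
    visual_uncertainty: float     # e.g. doubt about what the image shows
    reasoning_uncertainty: float  # e.g. doubt about the current inference step


def gate_probs(f: StepFeatures, w_vis: float = 4.0, w_reason: float = 4.0,
               bias_fast: float = 1.0) -> dict[str, float]:
    """Softmax gate over the three paths (illustrative linear logits, assumed)."""
    logits = {
        "fast": bias_fast,  # constant prior favouring the cheap path
        "slow_perception": w_vis * f.visual_uncertainty,
        "slow_reasoning": w_reason * f.reasoning_uncertainty,
    }
    m = max(logits.values())  # subtract max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}


def route(f: StepFeatures) -> str:
    """Greedy routing decision: pick the most probable path."""
    probs = gate_probs(f)
    return max(probs, key=probs.get)


def scalarized_reward(correct: bool, tokens_used: int,
                      cost_weight: float = 0.001) -> float:
    """Assumed weighted-sum reward: task accuracy minus a compute penalty."""
    return (1.0 if correct else 0.0) - cost_weight * tokens_used


if __name__ == "__main__":
    for feats in (StepFeatures(0.0, 0.0), StepFeatures(0.9, 0.1), StepFeatures(0.1, 0.8)):
        print(feats, "->", route(feats))
    print(scalarized_reward(True, 200))
```

A weighted sum is only one way to combine objectives in multi-objective RL; raising `cost_weight` pushes a learned gate toward the fast path, while lowering it tolerates more slow-path steps.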

Anthology ID: 2026.findings-acl.215
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4393–4410
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.215/
Cite (ACL): Xingjian Diao, Zheyuan Liu, Chunhui Zhang, Weiyi Wu, Keyi Kong, Lin Shi, Kaize Ding, Soroush Vosoughi, and Jiang Gui. 2026. Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization. In Findings of the Association for Computational Linguistics: ACL 2026, pages 4393–4410, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization (Diao et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.215.pdf
Checklist: 2026.findings-acl.215.checklist.pdf