AttnPO: Attention-Guided Process Supervision for Efficient Reasoning

Shuaiyi Nie; Dingsiyu; Wenyuan Zhang; Linhao Yu; Tianmeng Yang; Yao Chen; Weichong Yin; Yu Sun; Hua Wu (吴华); Tingwen Liu

Abstract

Large reasoning models trained with reinforcement learning and verifiable rewards (RLVR) achieve strong performance on complex reasoning tasks, yet often overthink, generating redundant reasoning without performance gains. Existing trajectory-level length penalties often fail to shorten reasoning effectively and also degrade accuracy, because they treat all reasoning steps uniformly and lack fine-grained signals to distinguish redundancy from necessity. Meanwhile, process-supervised methods are typically resource-intensive and suffer from inaccurate credit assignment. To address these issues, we propose AttnPO, a low-overhead process-supervised RL framework that leverages the model’s intrinsic attention signals for step-level credit assignment. We first identify a set of special attention heads that naturally focus on essential steps while suppressing redundant ones. Using the attention scores of these heads, we then employ two sub-strategies that mitigate overthinking by discouraging redundant steps while preserving accuracy by reducing penalties on essential steps. Experimental results show that AttnPO substantially reduces reasoning length while significantly improving performance across 9 benchmarks.

Anthology ID: 2026.acl-long.1845
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 39728–39748
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1845/
Cite (ACL): Shuaiyi Nie, Dingsiyu, Wenyuan Zhang, Linhao Yu, Tianmeng Yang, Yao Chen, Weichong Yin, Yu Sun, Hua Wu, and Tingwen Liu. 2026. AttnPO: Attention-Guided Process Supervision for Efficient Reasoning. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 39728–39748, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): AttnPO: Attention-Guided Process Supervision for Efficient Reasoning (Nie et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1845.pdf
Checklist: 2026.acl-long.1845.checklist.pdf
