Shuaiyi Nie
2026
AttnPO: Attention-Guided Process Supervision for Efficient Reasoning
Shuaiyi Nie | Dingsiyu | Wenyuan Zhang | Linhao Yu | Tianmeng Yang | Yao Chen | Weichong Yin | Yu Sun | Hua Wu | Tingwen Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large reasoning models trained with reinforcement learning and verifiable rewards (RLVR) achieve strong performance on complex reasoning tasks, yet often overthink, generating redundant reasoning without performance gains. Existing trajectory-level length penalties often fail to shorten reasoning effectively and degrade accuracy, as they treat all reasoning steps uniformly and lack fine-grained signals to distinguish redundancy from necessity. Meanwhile, process-supervised methods are typically resource-intensive and suffer from inaccurate credit assignment. To address these issues, we propose AttnPO, a low-overhead process-supervised RL framework that leverages the model’s intrinsic attention signals for step-level credit assignment. We first identify a set of special attention heads that naturally focus on essential steps while suppressing redundant ones. Leveraging the attention scores of these heads, we then employ two sub-strategies that mitigate overthinking by discouraging redundant steps while preserving accuracy by reducing penalties on essential steps. Experimental results show that AttnPO substantially reduces reasoning length while significantly improving performance across 9 benchmarks.
2025
Revealing and Mitigating the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing
Wenyuan Zhang | Shuaiyi Nie | Jiawei Sheng | Zefeng Zhang | Xinghua Zhang | Yongquan He | Tingwen Liu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language model (LLM) role-playing has gained widespread attention. Authentic character knowledge is crucial for constructing realistic LLM role-playing agents. However, existing work usually overlooks LLMs’ ability to detect characters’ known knowledge errors (KKE) and unknown knowledge errors (UKE) while playing roles, which can lead to low-quality automatic construction of trainable character corpora. In this paper, we propose RoleKE-Bench to evaluate LLMs’ ability to detect KKE and UKE. The results indicate that even the latest LLMs struggle to detect these two types of errors effectively, especially when it comes to familiar knowledge. We experiment with various reasoning strategies and propose an agent-based reasoning method, Self-Recollection and Self-Doubt (S2RD), to further explore the potential for improving error detection capabilities.