陈宣齐


2026

Large Language Models (LLMs) have achieved strong results in text summarization, particularly when combined with reinforcement learning. However, long-form generation still struggles to maintain logical coherence and contextual consistency, which often prevents the production of high-quality, unified summaries. To address this, we propose TRAC, a framework that introduces a token-level reward function combining relative sentence gain, inter-sentence attention, and a Gaussian length penalty. TRAC trains a Process Reward Model (PRM) to provide fine-grained, step-wise supervision, which improves structural integrity and fluency during generation. Experimental results show that TRAC outperforms the sequence-level baseline by 11.05% in Fluency and 10.61% in Relevance. It also achieves significant gains over competitive baselines such as FIGA and TLCR, underscoring its effectiveness and generalizability for summarization.
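
As a rough illustration of the third reward component, the sketch below shows one common form of a Gaussian length penalty: a score that peaks at a target length and decays smoothly as the summary gets longer or shorter. The abstract does not state TRAC's actual formula, so the function name `gaussian_length_penalty` and the parameters `target` and `sigma` are assumptions made for illustration, not the authors' definition.

```python
import math


def gaussian_length_penalty(length: int, target: float, sigma: float) -> float:
    """Hypothetical length score in (0, 1], equal to 1 when length == target.

    This is a generic unnormalized Gaussian kernel, assumed here for
    illustration; it is not taken from the TRAC paper.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Distance from the target length, measured in units of sigma.
    z = (length - target) / sigma
    return math.exp(-0.5 * z * z)
```

Under these assumptions, a summary exactly at the target length scores 1.0, and summaries that are too long or too short by the same number of tokens are penalized equally. A larger `sigma` makes the penalty more tolerant of deviations from the target.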