Xinfeng Liao


2026

Large Language Models (LLMs) have achieved remarkable success in text summarization, particularly through the integration of reinforcement learning. However, maintaining logical coherence and contextual consistency remains a pervasive challenge in long-form generation, often hindering the production of high-quality, unified summaries. To address these persistent issues, we propose TRAC, a framework that introduces a token-level reward function by integrating relative sentence gain, inter-sentence attention, and a Gaussian length penalty. By training a Process Reward Model (PRM) to provide fine-grained, step-wise supervision, TRAC ensures superior structural integrity and fluency during the generation process. Experimental results demonstrate that TRAC outperforms the sequence-level baseline by 11.05% in Fluency and 10.61% in Relevance. Furthermore, it achieves significant gains over competitive baselines such as FIGA and TLCR, underscoring its effectiveness and generalizability in high-quality NLP summarization.

2024

Sentiment classification (SC) often suffers from low-resource challenges such as domain-specific contexts, imbalanced label distributions, and few-shot scenarios. The potential of the diffusion language model (LM) for textual data augmentation (DA) remains unexplored, moreover, textual DA methods struggle to balance the diversity and consistency of new samples. Most DA methods either perform logical modifications or rephrase less important tokens in the original sequence with the language model. In the context of SC, strong emotional tokens could act critically on the sentiment of the whole sequence. Therefore, contrary to rephrasing less important context, we propose DiffusionCLS to leverage a diffusion LM to capture in-domain knowledge and generate pseudo samples by reconstructing strong label-related tokens. This approach ensures a balance between consistency and diversity, avoiding the introduction of noise and augmenting crucial features of datasets. DiffusionCLS also comprises a Noise-Resistant Training objective to help the model generalize. Experiments demonstrate the effectiveness of our method in various low-resource scenarios including domain-specific and domain-general problems. Ablation studies confirm the effectiveness of our framework’s modules, and visualization studies highlight optimal deployment conditions, reinforcing our conclusions.