Jinpeng Wang

2026

Large Language Models (LLMs) have demonstrated remarkable capabilities in open-domain dialogues. However, their performance in service dialogues remains suboptimal, as these require agents to guide users toward specific business objectives while dynamically tracking states and adapting strategies. This gap stems from the scarcity of high-quality training data and the difficulty of simulating authentic, goal-oriented user behaviors. We propose SEAD (Self-Evolving Agent for Service Dialogue), a framework that enables agents to learn effective strategies without large-scale human annotations. SEAD decouples user modeling into two components: a Profile Controller that generates diverse user states to manage the training curriculum, and a User Simulator that focuses on realistic role-playing. This design ensures the environment provides adaptive training scenarios rather than acting as an unfair adversary.

Beyond Fully Random Masking: Attention-Guided Denoising and Optimization for Diffusion Language Models
Jia Deng | Junyi Li | Xin Zhao | Jinpeng Wang | Hongyu Lu | Ji-Rong Wen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Diffusion large language models (dLLMs) offer an efficient alternative to autoregressive models through parallel decoding, yet existing post-training methods largely rely on random masking strategies that overlook intrinsic token dependencies. In this work, we present an empirical analysis of attention in dLLMs and show that tokens attending more strongly to revealed context exhibit greater generation stability and play a critical role in reasoning. Motivated by these findings, we propose AGDO, an attention-guided denoising and optimization framework that aligns both training and optimization with attention-derived dependencies. AGDO determines the denoising order based on attention structure and emphasizes attention-critical tokens during supervised fine-tuning and reinforcement learning. Experiments on mathematical and coding benchmarks demonstrate that AGDO consistently improves reasoning performance, outperforming state-of-the-art post-training methods for dLLMs.

Venues

ACL: 1
Findings: 1
