Jiajian Guo

2026

ProactiveEval: A Unified Evaluation Framework for Proactive Dialogue Agents
Tianjian Liu | Fanqi Wan | Jiajian Guo | Xiaojun Quan
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Proactive dialogue has emerged as a critical and challenging research problem in advancing large language models (LLMs). Existing works predominantly focus on domain-specific or task-oriented scenarios, which leads to fragmented evaluations and limits the comprehensive exploration of models’ proactive dialogue abilities. In this work, we propose ProactiveEval, a unified framework for evaluating the proactive dialogue capabilities of LLMs. This framework decomposes proactive dialogue into target planning and dialogue guidance, establishing evaluation metrics across various domains. It also enables the automatic generation of diverse and challenging evaluation data. Based on the proposed framework, we develop 328 evaluation environments spanning 6 distinct domains. Through experiments with 22 different types of LLMs, we show that DeepSeek-R1 and Claude-3.7-Sonnet exhibit exceptional performance on target planning and dialogue guidance tasks, respectively. Finally, we investigate how reasoning capabilities influence proactive behaviors and discuss their implications for future model development. Our code and data are available at https://github.com/liutj9/ProactiveEval.

2025

Aligning small language models with human preferences is challenging, as weak policies struggle to generate informative on-policy samples and suffer from unstable gradients when trained on off-policy signals from stronger models. In this work, we propose ReAlign, a training framework that combines the stability of on-policy learning with the guidance of reviser-assisted supervision. In ReAlign, we first train a lightweight reviser to improve policy-generated responses using preference-based supervision, conditioned on both the prompt and the initial output. The policy is then optimized using a combination of standard on-policy preference pairs and reviser-enhanced pairs constructed as a structured revision task, where the latter provide richer, more learnable feedback. Experimental results on AlpacaEval-2 and Arena-Hard demonstrate that ReAlign significantly boosts alignment performance for weak policies, outperforming strong preference optimization baselines.

Co-authors

Qifan Wang 1

Venues

ACL 1
Findings 1
