Qi Dai


2026

Task-oriented proactive dialogue agents play a pivotal role in recruitment, particularly for steering conversations towards specific business outcomes, such as acquiring social-media contacts for private-channel conversion. Although supervised fine-tuning and reinforcement learning have proven effective for training such agents, their performance is heavily constrained by the scarcity of high-quality, goal-oriented domain-specific training data. To address this challenge, we propose SimRPD, a three-stage framework for training recruitment proactive dialogue agents. First, we develop a high-fidelity user simulator to synthesize large-scale conversational data through multi-turn online dialogue. Then we introduce a multi-dimensional evaluation framework based on Chain-of-Intention (CoI) to comprehensively assess the simulator and effectively select high-quality data, incorporating both global-level and instance-level metrics. Finally, we train the recruitment proactive dialogue agent on the selected dataset. Experiments in a real-world recruitment scenario demonstrate that SimRPD outperforms existing simulator-based data selection strategies, highlighting its practical value for industrial deployment and its potential applicability to other business-oriented dialogue scenarios.

2022

In order to better understand the rationale behind model behavior, recent works have exploited providing interpretation to support the inference prediction. However, existing methods tend to provide human-unfriendly interpretation, and are prone to sub-optimal performance due to one-side promotion, i.e. either inference promotion with interpretation or vice versa. In this paper, we propose a multi-level Mutual Promotion mechanism for self-evolved Inference and sentence-level Interpretation (MPII). Specifically, from the model-level, we propose a Step-wise Integration Mechanism to jointly perform and deeply integrate inference and interpretation in an autoregressive manner. From the optimization-level, we propose an Adversarial Fidelity Regularization to improve the fidelity between inference and interpretation with the Adversarial Mutual Information training strategy. Extensive experiments on NLI and CQA tasks reveal that the proposed MPII approach can significantly outperform baseline models for both the inference performance and the interpretation quality.