One Planner To Guide Them All ! Learning Adaptive Conversational Planners for Goal-oriented Dialogues

Huy Quang Dao; Lizi Liao

Abstract

Goal-oriented dialogues, such as recommendation and negotiation, often require balancing multiple, conflicting objectives. Existing methods typically involve training separate models for specific combinations of objectives, leading to computational and scalability issues. In this work, we aim to develop a new dialogue policy method that can adapt to varying objective preferences at inference time without retraining. This raises several challenges in terms of both (1) optimization strategy and (2) knowledge utilization. To address these, we propose a novel learning framework, Preference Adaptive Dialogue Policy Planner (PADPP), for multi-objective goal-oriented dialogues. Specifically, to tackle the former, we introduce a novel policy optimization scheme, which leverages information gained from training the model on previously updated objective weights, accelerating the learning capability on new weight settings. To address the latter, we utilize Generalized Policy Improvement (GPI) to ensure the effectiveness of leveraged knowledge. Experimental results demonstrate that PADPP achieves superior adaptability and performance compared to state-of-the-art approaches, offering a scalable and flexible solution for multi-objective, goal-oriented dialogues. Code and data are available at the anonymous link.
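The abstract mentions Generalized Policy Improvement (GPI) for reusing knowledge from previously trained objective weights. The sketch below shows only the textbook GPI action-selection rule under a linear weighting of objectives; it is not the PADPP implementation. The names `scalarize`, `gpi_action`, and `q_sets`, the weighted-sum scalarization, and all numbers are illustrative assumptions that do not come from the paper.

```python
from typing import List, Sequence

# Assumed layout (hypothetical): q_sets[i][a] is the vector-valued Q estimate,
# one entry per objective, of stored policy i for action a in the current state.
QVector = Sequence[float]


def scalarize(q: QVector, weights: Sequence[float]) -> float:
    """Collapse a per-objective value vector into one number via a weighted sum."""
    return sum(w * v for w, v in zip(weights, q))


def gpi_action(q_sets: List[List[QVector]], weights: Sequence[float]) -> int:
    """Return the action whose best scalarized value over all stored policies is highest."""
    n_actions = len(q_sets[0])
    best_per_action = [
        max(scalarize(policy_q[a], weights) for policy_q in q_sets)
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=best_per_action.__getitem__)


# Two stored policies, three dialogue actions, two objectives (made-up values).
q_sets = [
    [(0.9, 0.1), (0.5, 0.5), (0.2, 0.3)],  # policy trained with emphasis on objective 0
    [(0.1, 0.2), (0.4, 0.7), (0.1, 0.9)],  # policy trained with emphasis on objective 1
]

action_first = gpi_action(q_sets, weights=(1.0, 0.0))     # only objective 0 matters
action_second = gpi_action(q_sets, weights=(0.0, 1.0))    # only objective 1 matters
action_balanced = gpi_action(q_sets, weights=(0.5, 0.5))  # unseen, balanced preference
```

In this toy setup the balanced preference selects an action that neither single-objective preference would pick, which illustrates how a new weight setting can be served at inference time from existing value estimates without retraining.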

Anthology ID: 2025.emnlp-main.1123
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 22103–22127
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1123/
Cite (ACL): Huy Quang Dao and Lizi Liao. 2025. One Planner To Guide Them All ! Learning Adaptive Conversational Planners for Goal-oriented Dialogues. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 22103–22127, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): One Planner To Guide Them All ! Learning Adaptive Conversational Planners for Goal-oriented Dialogues (Dao & Liao, EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1123.pdf
Checklist: 2025.emnlp-main.1123.checklist.pdf
