Yan Cao

2020

Adaptive Dialog Policy Learning with Hindsight and User Modeling
Yan Cao | Keting Lu | Xiaoping Chen | Shiqi Zhang
Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Reinforcement learning (RL) methods have been widely used for learning dialog policies. Sample efficiency, i.e., the efficiency of learning from limited dialog experience, is particularly important in RL-based dialog policy learning, because interacting with people is costly and low-quality dialog policies produce very poor user experience. In this paper, we develop LHUA (Learning with Hindsight, User modeling, and Adaptation) that, for the first time, enables dialog agents to adaptively learn with hindsight from both simulated and real users. Simulation and hindsight provide the dialog agent with more experience and more (positive) reinforcement respectively. Experimental results suggest that LHUA outperforms competitive baselines from the literature, including its no-simulation, no-adaptation, and no-hindsight counterparts.

