Abstract
Reinforcement learning (RL) methods have been widely used for learning dialog policies. Sample efficiency, i.e., the efficiency of learning from limited dialog experience, is particularly important in RL-based dialog policy learning, because interacting with people is costly and low-quality dialog policies produce a very poor user experience. In this paper, we develop LHUA (Learning with Hindsight, User modeling, and Adaptation) that, for the first time, enables dialog agents to adaptively learn with hindsight from both simulated and real users. Simulation and hindsight provide the dialog agent with more experience and more (positive) reinforcement, respectively. Experimental results suggest that LHUA outperforms competitive baselines from the literature, including its no-simulation, no-adaptation, and no-hindsight counterparts.
- Anthology ID:
- 2020.sigdial-1.40
- Volume:
- Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue
- Month:
- July
- Year:
- 2020
- Address:
- 1st virtual meeting
- Editors:
- Olivier Pietquin, Smaranda Muresan, Vivian Chen, Casey Kennington, David Vandyke, Nina Dethlefs, Koji Inoue, Erik Ekstedt, Stefan Ultes
- Venue:
- SIGDIAL
- SIG:
- SIGDIAL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 329–338
- URL:
- https://aclanthology.org/2020.sigdial-1.40
- DOI:
- 10.18653/v1/2020.sigdial-1.40
- Cite (ACL):
- Yan Cao, Keting Lu, Xiaoping Chen, and Shiqi Zhang. 2020. Adaptive Dialog Policy Learning with Hindsight and User Modeling. In Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 329–338, 1st virtual meeting. Association for Computational Linguistics.
- Cite (Informal):
- Adaptive Dialog Policy Learning with Hindsight and User Modeling (Cao et al., SIGDIAL 2020)
- PDF:
- https://preview.aclanthology.org/fix-dup-bibkey/2020.sigdial-1.40.pdf
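The abstract notes that hindsight supplies the dialog agent with more positive reinforcement. The paper's LHUA implementation is not reproduced here; as a rough, hedged illustration of the general hindsight idea (relabeling a failed episode with a goal it actually achieved, in the spirit of hindsight experience replay), a minimal sketch under assumed sparse 0/1 rewards and made-up state names:

```python
# Hypothetical sketch of hindsight relabeling, NOT the paper's LHUA code.
# A dialog episode collected under one intended goal is relabeled with a goal
# the episode actually reached, so its transitions yield positive reward.

def relabel_with_hindsight(episode, achieved_goal):
    """Return transitions rewarded as if `achieved_goal` had been the
    intended goal all along. `episode` is a list of
    (state, action, next_state) tuples; states and goals are assumptions."""
    relabeled = []
    for state, action, next_state in episode:
        # Sparse-reward convention (an assumption): 1 on reaching the
        # relabeled goal, 0 otherwise.
        reward = 1.0 if next_state == achieved_goal else 0.0
        relabeled.append((state, action, reward, next_state, achieved_goal))
    return relabeled

# Toy usage over abstract dialog states: the episode "failed" its original
# goal but did reach state "s3", which we relabel as the goal.
episode = [("s0", "ask", "s1"), ("s1", "confirm", "s2"), ("s2", "book", "s3")]
extra = relabel_with_hindsight(episode, achieved_goal="s3")
```

In this sketch only the final transition earns positive reward, but that is more positive signal than the original failed episode provided, which is the intuition the abstract points to.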