Abstract
In this paper, we explore state-of-the-art deep reinforcement learning methods for dialog policy training, such as prioritized experience replay, double deep Q-networks, dueling network architectures, and distributional learning. Our main findings show that each individual method improves rewards and task success rate, but that combining these methods into a Rainbow agent, which performs best across tasks and environments, is non-trivial. We therefore provide insights into how each method influences the combination and how to combine them to form a Rainbow agent.
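For readers unfamiliar with the components the abstract lists, the sketch below illustrates two of them, a dueling network architecture and a double-DQN target, in PyTorch. This is an illustrative sketch only, not the authors' implementation; the network sizes, names, and hyperparameters are assumptions made for the example.

```python
# Illustrative sketch (not the paper's code): a dueling Q-network and a
# double-DQN bootstrap target, two of the Rainbow components studied here.
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling architecture: separate state-value and advantage streams."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.trunk(state)
        v, a = self.value(h), self.advantage(h)
        # Q(s,a) = V(s) + A(s,a) - mean_a A(s,a); subtracting the mean
        # keeps the value and advantage streams identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

def double_dqn_target(online: nn.Module, target: nn.Module,
                      reward: torch.Tensor, next_state: torch.Tensor,
                      done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Double DQN: the online net selects the action, the target net evaluates it."""
    with torch.no_grad():
        best_action = online(next_state).argmax(dim=1, keepdim=True)
        next_q = target(next_state).gather(1, best_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q

# Minimal usage with made-up dimensions (4-dim state, 3 actions):
net, tgt = DuelingQNet(4, 3), DuelingQNet(4, 3)
tgt.load_state_dict(net.state_dict())
y = double_dqn_target(net, tgt, reward=torch.zeros(2),
                      next_state=torch.randn(2, 4), done=torch.zeros(2))
```

Decoupling action selection (online network) from action evaluation (target network) is what reduces the overestimation bias that double DQN targets; the other components the paper combines, prioritized replay and distributional learning, would modify the replay sampling and the output head, respectively.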
- Anthology ID: W19-5908
- Volume: Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue
- Month: September
- Year: 2019
- Address: Stockholm, Sweden
- Venue: SIGDIAL
- SIG: SIGDIAL
- Publisher: Association for Computational Linguistics
- Pages: 62–67
- URL: https://aclanthology.org/W19-5908
- DOI: 10.18653/v1/W19-5908
- Cite (ACL): Dirk Väth and Ngoc Thang Vu. 2019. To Combine or Not To Combine? A Rainbow Deep Reinforcement Learning Agent for Dialog Policies. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, pages 62–67, Stockholm, Sweden. Association for Computational Linguistics.
- Cite (Informal): To Combine or Not To Combine? A Rainbow Deep Reinforcement Learning Agent for Dialog Policies (Väth & Vu, SIGDIAL 2019)
- PDF: https://preview.aclanthology.org/paclic-22-ingestion/W19-5908.pdf