Abstract
Machine translation is a natural candidate problem for reinforcement learning from human feedback: users provide quick, dirty ratings on candidate translations to guide a system to improve. Yet, current neural machine translation training focuses on expensive human-generated reference translations. We describe a reinforcement learning algorithm that improves neural machine translation systems from simulated human feedback. Our algorithm combines the advantage actor-critic algorithm (Mnih et al., 2016) with the attention-based neural encoder-decoder architecture (Luong et al., 2015). This algorithm (a) is well-designed for problems with a large action space and delayed rewards, (b) effectively optimizes traditional corpus-level machine translation metrics, and (c) is robust to skewed, high-variance, granular feedback modeled after actual human behaviors.
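For orientation, the sketch below illustrates the core idea the abstract describes: the NMT decoder acts as a policy that samples a candidate translation, a single delayed sentence-level rating serves as the bandit reward, and a critic's value estimates supply the baseline for the advantage actor-critic update. This is a minimal, hypothetical PyTorch sketch, not the paper's released code; the model, the `simulated_feedback` reward stand-in, and all sizes and names are illustrative assumptions (the actual implementation is at khanhptnk/bandit-nmt).

```python
# Minimal sketch of one bandit A2C update for sequence generation.
# Names, sizes, and the reward simulator are illustrative placeholders.
import torch
import torch.nn as nn
from torch.distributions import Categorical

VOCAB, HIDDEN, MAX_LEN = 1000, 64, 20

class TinyDecoder(nn.Module):
    """Stand-in for the attention-based encoder-decoder policy + critic."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.rnn = nn.GRUCell(HIDDEN, HIDDEN)
        self.policy_head = nn.Linear(HIDDEN, VOCAB)  # actor: next-token logits
        self.value_head = nn.Linear(HIDDEN, 1)       # critic: expected reward

    def step(self, tok, h):
        h = self.rnn(self.embed(tok), h)
        return self.policy_head(h), self.value_head(h).squeeze(-1), h

def simulated_feedback(tokens):
    # Placeholder for the simulated human rating (the paper perturbs a
    # sentence-level quality score); here just a random scalar in [0, 1].
    return torch.rand(())

model = TinyDecoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

h = torch.zeros(1, HIDDEN)
tok = torch.zeros(1, dtype=torch.long)            # assume <bos> id is 0
log_probs, values = [], []
for _ in range(MAX_LEN):                          # sample a candidate translation
    logits, value, h = model.step(tok, h)
    dist = Categorical(logits=logits)
    tok = dist.sample()
    log_probs.append(dist.log_prob(tok))
    values.append(value)

reward = simulated_feedback(tok)                  # one delayed, sentence-level rating
values = torch.stack(values).squeeze()
log_probs = torch.stack(log_probs).squeeze()
advantage = reward - values                       # per-step critic baseline
actor_loss = -(log_probs * advantage.detach()).sum()
critic_loss = advantage.pow(2).sum()

opt.zero_grad()
(actor_loss + critic_loss).backward()
opt.step()
```

The key design choice mirrored here is that only one scalar rating arrives per sampled translation, so the critic's per-step value estimates are what make the variance of the policy gradient manageable.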
- Anthology ID: D17-1153
- Volume: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
- Month: September
- Year: 2017
- Address: Copenhagen, Denmark
- Editors: Martha Palmer, Rebecca Hwa, Sebastian Riedel
- Venue: EMNLP
- SIG: SIGDAT
- Publisher: Association for Computational Linguistics
- Pages: 1464–1474
- URL: https://aclanthology.org/D17-1153
- DOI: 10.18653/v1/D17-1153
- Cite (ACL): Khanh Nguyen, Hal Daumé III, and Jordan Boyd-Graber. 2017. Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1464–1474, Copenhagen, Denmark. Association for Computational Linguistics.
- Cite (Informal): Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback (Nguyen et al., EMNLP 2017)
- PDF: https://preview.aclanthology.org/ingest-bitext-workshop/D17-1153.pdf
- Code: khanhptnk/bandit-nmt