Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction

Kazuma Hashimoto, Yoshimasa Tsuruoka


Abstract
A major obstacle in reinforcement learning-based sentence generation is the large action space whose size is equal to the vocabulary size of the target-side language. To improve the efficiency of reinforcement learning, we present a novel approach for reducing the action space based on dynamic vocabulary prediction. Our method first predicts a fixed-size small vocabulary for each input to generate its target sentence. The input-specific vocabularies are then used at supervised and reinforcement learning steps, and also at test time. In our experiments on six machine translation and two image captioning datasets, our method achieves faster reinforcement learning (~2.7x faster) with less GPU memory (~2.3x less) than the full-vocabulary counterpart. We also show that our method more effectively receives rewards with fewer iterations of supervised pre-training.
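The core idea in the abstract, restricting each input to a small predicted vocabulary so the output softmax covers K words instead of the full vocabulary, can be illustrated with a toy sketch. This is not the authors' implementation: the function names `predict_small_vocab` and `restricted_softmax`, the top-K selection rule, and all scores and logits below are assumptions made for illustration only.

```python
import math


def predict_small_vocab(word_scores, k):
    """Return the sorted ids of the k highest-scoring words.

    `word_scores` stands in for the output of some input-conditioned
    vocabulary predictor (hypothetical; the paper's predictor is not
    described on this page).
    """
    ranked = sorted(range(len(word_scores)),
                    key=lambda i: word_scores[i], reverse=True)
    return sorted(ranked[:k])


def restricted_softmax(logits, vocab_ids):
    """Softmax computed only over the logits of the predicted vocabulary."""
    selected = [logits[i] for i in vocab_ids]
    peak = max(selected)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in selected]
    total = sum(exps)
    return {i: e / total for i, e in zip(vocab_ids, exps)}


# Toy data: a full vocabulary of 8 words, reduced to K = 3 for this input.
word_scores = [0.1, 2.3, -0.5, 1.7, 0.0, 3.1, -1.2, 0.4]
decoder_logits = [0.2, 1.0, 0.3, -0.4, 0.9, 2.0, 0.1, 0.5]

small_vocab = predict_small_vocab(word_scores, k=3)
probs = restricted_softmax(decoder_logits, small_vocab)
```

In this sketch the policy at each decoding step chooses among 3 actions rather than 8, which is the kind of action-space reduction the abstract credits for the faster training and lower memory use.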
Anthology ID:
N19-1315
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
3115–3125
URL:
https://aclanthology.org/N19-1315
DOI:
10.18653/v1/N19-1315
Cite (ACL):
Kazuma Hashimoto and Yoshimasa Tsuruoka. 2019. Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3115–3125, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction (Hashimoto & Tsuruoka, NAACL 2019)
PDF:
https://preview.aclanthology.org/auto-file-uploads/N19-1315.pdf
Supplementary:
 N19-1315.Supplementary.pdf
Video:
 https://vimeo.com/356125366
Code:
 hassyGo/NLG-RL
Data:
 ASPEC, COCO