Abstract
In this work, we study the credit assignment problem in reward augmented maximum likelihood (RAML) learning, and establish a theoretical equivalence between the token-level counterpart of RAML and the entropy regularized reinforcement learning. Inspired by the connection, we propose two sequence prediction algorithms, one extending RAML with fine-grained credit assignment and the other improving Actor-Critic with a systematic entropy regularization. On two benchmark datasets, we show the proposed algorithms outperform RAML and Actor-Critic respectively, providing new alternatives to sequence prediction.
- Anthology ID:
- P18-1155
- Volume:
- Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2018
- Address:
- Melbourne, Australia
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1672–1682
- URL:
- https://aclanthology.org/P18-1155
- DOI:
- 10.18653/v1/P18-1155
- Cite (ACL):
- Zihang Dai, Qizhe Xie, and Eduard Hovy. 2018. From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1672–1682, Melbourne, Australia. Association for Computational Linguistics.
- Cite (Informal):
- From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction (Dai et al., ACL 2018)
- PDF:
- https://preview.aclanthology.org/nodalida-main-page/P18-1155.pdf
- Code:
- zihangdai/ERAC-VAML
- Data:
- COCO