@inproceedings{dai-etal-2018-credit,
title = "From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction",
author = "Dai, Zihang and
Xie, Qizhe and
Hovy, Eduard",
editor = "Gurevych, Iryna and
Miyao, Yusuke",
booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2018",
address = "Melbourne, Australia",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/P18-1155/",
doi = "10.18653/v1/P18-1155",
pages = "1672--1682",
abstract = "In this work, we study the credit assignment problem in reward augmented maximum likelihood (RAML) learning, and establish a theoretical equivalence between the token-level counterpart of RAML and the entropy regularized reinforcement learning. Inspired by the connection, we propose two sequence prediction algorithms, one extending RAML with fine-grained credit assignment and the other improving Actor-Critic with a systematic entropy regularization. On two benchmark datasets, we show the proposed algorithms outperform RAML and Actor-Critic respectively, providing new alternatives to sequence prediction."
}