From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction

Zihang Dai, Qizhe Xie, Eduard Hovy



Abstract
In this work, we study the credit assignment problem in reward-augmented maximum likelihood (RAML) learning, and establish a theoretical equivalence between the token-level counterpart of RAML and entropy-regularized reinforcement learning. Inspired by this connection, we propose two sequence prediction algorithms, one extending RAML with fine-grained credit assignment and the other improving Actor-Critic with systematic entropy regularization. On two benchmark datasets, we show that the proposed algorithms outperform RAML and Actor-Critic, respectively, providing new alternatives for sequence prediction.
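
For context, a minimal sketch of the two objectives the equivalence relates, written in standard sequence-level notation from the RAML literature rather than the paper's exact token-level formulation (here x is the input, y* the ground-truth sequence, r(y, y*) a sequence-level reward such as BLEU, and τ a temperature):

    J_{\mathrm{RAML}}(\theta) = \mathbb{E}_{y \sim q(\cdot \mid y^*; \tau)}\big[\log p_\theta(y \mid x)\big],
    \qquad q(y \mid y^*; \tau) \propto \exp\big( r(y, y^*) / \tau \big)

    J_{\mathrm{EntRL}}(\theta) = \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\big[ r(y, y^*) \big]
    + \tau \, \mathcal{H}\big( p_\theta(\cdot \mid x) \big)

Up to constants, the first objective minimizes KL(q || p_θ) and the second minimizes KL(p_θ || q); both are optimized by p_θ = q, the exponentiated-payoff distribution, which is the sequence-level connection that the paper refines to the token level.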
Anthology ID:
P18-1155
Volume:
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Iryna Gurevych, Yusuke Miyao
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1672–1682
URL:
https://aclanthology.org/P18-1155
DOI:
10.18653/v1/P18-1155
Cite (ACL):
Zihang Dai, Qizhe Xie, and Eduard Hovy. 2018. From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1672–1682, Melbourne, Australia. Association for Computational Linguistics.
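BibTeX:
A BibTeX entry assembled from the metadata above; every field is taken from this page, but the bibkey itself is an assumption that follows the Anthology's usual author-etal-year-keyword pattern, since no Bibkey value was listed:

    @inproceedings{dai-etal-2018-credit,
        title     = "From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction",
        author    = "Dai, Zihang and Xie, Qizhe and Hovy, Eduard",
        booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
        month     = jul,
        year      = "2018",
        address   = "Melbourne, Australia",
        publisher = "Association for Computational Linguistics",
        url       = "https://aclanthology.org/P18-1155",
        doi       = "10.18653/v1/P18-1155",
        pages     = "1672--1682",
    }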
Cite (Informal):
From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction (Dai et al., ACL 2018)
PDF:
https://aclanthology.org/P18-1155.pdf
Note:
 P18-1155.Notes.pdf
Poster:
 P18-1155.Poster.pdf
Code:
 zihangdai/ERAC-VAML
Data:
 MS COCO