Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling

Zhe Gan, Chunyuan Li, Changyou Chen, Yunchen Pu, Qinliang Su, Lawrence Carin


Abstract
Recurrent neural networks (RNNs) have shown promising performance for language modeling. However, traditional training of RNNs using back-propagation through time often suffers from overfitting. One reason for this is that stochastic optimization (used for large training sets) does not provide good estimates of model uncertainty. This paper leverages recent advances in stochastic gradient Markov Chain Monte Carlo (also appropriate for large training sets) to learn weight uncertainty in RNNs. It yields a principled Bayesian learning algorithm, adding gradient noise during training (enhancing exploration of the model-parameter space) and model averaging when testing. Extensive experiments on various RNN models and across a broad range of applications demonstrate the superiority of the proposed approach relative to stochastic optimization.
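The two ingredients named in the abstract, gradient noise during training and model averaging at test time, can be illustrated with a minimal sketch. It uses stochastic gradient Langevin dynamics (SGLD), one of the simplest stochastic gradient MCMC algorithms, on a toy one-weight linear model rather than an RNN. This choice is an assumption made for illustration: the abstract does not say which SG-MCMC variant the paper uses. All function names, hyperparameters, and the synthetic data below are hypothetical and are not taken from the paper.

```python
import math
import random


def make_toy_data(n=200, true_w=2.0, noise_std=0.5, seed=1):
    """Synthetic data for y = true_w * x + Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [true_w * x + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


def sgld_samples(xs, ys, step_size=1e-3, n_steps=3000, burn_in=1000,
                 batch_size=16, prior_var=10.0, noise_var=0.25, seed=0):
    """Collect approximate posterior samples of a scalar weight w with SGLD."""
    rng = random.Random(seed)
    n_total = len(xs)
    w = 0.0
    samples = []
    for step in range(n_steps):
        batch = rng.sample(range(n_total), batch_size)
        # Gradient of the log prior, w ~ N(0, prior_var).
        grad = -w / prior_var
        # Minibatch gradient of the log likelihood, rescaled to the full data set.
        scale = n_total / batch_size
        grad += scale * sum(xs[i] * (ys[i] - w * xs[i]) for i in batch) / noise_var
        # Langevin update: half a gradient step plus injected N(0, step_size) noise.
        w += 0.5 * step_size * grad + rng.gauss(0.0, math.sqrt(step_size))
        # After burn-in, keep each iterate as one sampled model.
        if step >= burn_in:
            samples.append(w)
    return samples


def predict_averaged(samples, x_new):
    """Model averaging: mean prediction over all sampled weights."""
    return sum(w * x_new for w in samples) / len(samples)


if __name__ == "__main__":
    xs, ys = make_toy_data()
    samples = sgld_samples(xs, ys)
    print("averaged prediction at x = 1.0:", predict_averaged(samples, 1.0))
```

The injected noise is what separates this update from plain stochastic gradient ascent on the log posterior: the iterates keep moving around a mode instead of settling on a point estimate, and averaging predictions over the retained iterates approximates the posterior predictive mean.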
Anthology ID:
P17-1030
Volume:
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2017
Address:
Vancouver, Canada
Editors:
Regina Barzilay, Min-Yen Kan
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
321–331
URL:
https://aclanthology.org/P17-1030
DOI:
10.18653/v1/P17-1030
Cite (ACL):
Zhe Gan, Chunyuan Li, Changyou Chen, Yunchen Pu, Qinliang Su, and Lawrence Carin. 2017. Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 321–331, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling (Gan et al., ACL 2017)
PDF:
https://preview.aclanthology.org/fix-dup-bibkey/P17-1030.pdf
Note:
 P17-1030.Notes.pdf
Video:
 https://preview.aclanthology.org/fix-dup-bibkey/P17-1030.mp4
Data
Flickr30k, MPQA Opinion Corpus, Penn Treebank