Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks

Maxim Kodryan, Artem Grachev, Dmitry Ignatov, Dmitry Vetrov


Abstract
Reduction of the number of parameters is one of the most important goals in Deep Learning. In this article we propose an adaptation of Doubly Stochastic Variational Inference for Automatic Relevance Determination (DSVI-ARD) to neural network compression. We find this method to be especially useful in language modeling tasks, where the large number of parameters in the input and output layers is often excessive. We also show that DSVI-ARD can be applied together with encoder-decoder weight tying, allowing us to achieve even better sparsity and performance. Our experiments demonstrate that more than 90% of the weights in both the encoder and decoder layers can be removed with minimal quality loss.
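To make the approach concrete, below is a minimal PyTorch sketch (not the authors' released code) of the kind of ARD layer the abstract describes: a fully factorized Gaussian posterior N(mu, sigma^2) over the weights, an ARD Gaussian prior whose per-weight variances are set to their empirical-Bayes optimum (which yields the closed-form KL penalty 0.5 * log(1 + mu^2 / sigma^2)), and pruning of low signal-to-noise weights after training. The class name ARDLinear, the initialization values, and the pruning threshold are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ARDLinear(nn.Module):
    """Linear layer with a factorized Gaussian posterior over its weights (DSVI-ARD sketch)."""

    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        # Posterior means and log-variances of the weights.
        self.mu = nn.Parameter(torch.empty(out_features, in_features))
        self.log_sigma2 = nn.Parameter(torch.full((out_features, in_features), -10.0))
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None
        nn.init.xavier_uniform_(self.mu)

    def forward(self, x):
        if self.training:
            # Local reparameterization: sample the pre-activations instead of the weights.
            mean = F.linear(x, self.mu, self.bias)
            var = F.linear(x.pow(2), self.log_sigma2.exp())
            return mean + var.clamp(min=1e-16).sqrt() * torch.randn_like(mean)
        # At test time, use posterior means with low signal-to-noise weights zeroed out.
        return F.linear(x, self.sparse_weight(), self.bias)

    def kl(self):
        # KL(q || p) with ARD prior variances at their optimum: 0.5 * log(1 + mu^2 / sigma^2).
        return 0.5 * torch.log1p(self.mu.pow(2) / self.log_sigma2.exp()).sum()

    def sparse_weight(self, threshold=3.0):
        # Prune weights whose log "dropout rate" log(sigma^2 / mu^2) exceeds the threshold.
        log_alpha = self.log_sigma2 - torch.log(self.mu.pow(2) + 1e-16)
        return torch.where(log_alpha < threshold, self.mu, torch.zeros_like(self.mu))
```

In a language model, the decoder (the projection onto the vocabulary) can be such a layer, and sharing its posterior mean matrix with the encoder embedding realizes weight tying. The training objective is the usual cross-entropy plus the summed KL terms scaled by the dataset size, optimized over minibatches with the reparameterization trick, which is what makes the variational inference doubly stochastic.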
Anthology ID:
W19-4306
Volume:
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Isabelle Augenstein, Spandana Gella, Sebastian Ruder, Katharina Kann, Burcu Can, Johannes Welbl, Alexis Conneau, Xiang Ren, Marek Rei
Venue:
RepL4NLP
SIG:
SIGREP
Publisher:
Association for Computational Linguistics
Pages:
40–48
URL:
https://aclanthology.org/W19-4306
DOI:
10.18653/v1/W19-4306
Cite (ACL):
Maxim Kodryan, Artem Grachev, Dmitry Ignatov, and Dmitry Vetrov. 2019. Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), pages 40–48, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks (Kodryan et al., RepL4NLP 2019)
PDF:
https://aclanthology.org/W19-4306.pdf
Data
WikiText-2