Using the Output Embedding to Improve Language Models

Ofir Press, Lior Wolf


Abstract
We study the topmost weight matrix of neural network language models. We show that this matrix constitutes a valid word embedding. When training language models, we recommend tying the input embedding and this output embedding. We analyze the resulting update rules and show that the tied embedding evolves in a more similar way to the output embedding than to the input embedding in the untied model. We also offer a new method of regularizing the output embedding. Our methods lead to a significant reduction in perplexity, as we are able to show on a variety of neural network language models. Finally, we show that weight tying can reduce the size of neural translation models to less than half of their original size without harming their performance.
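The core idea of the paper — sharing one matrix between the input embedding (row lookup) and the output projection (multiplication by its transpose) — can be sketched as follows. This is an illustrative minimal example, not the authors' implementation; the dimensions, the identity hidden function, and the `forward` helper are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 10, 4

# One tied matrix E replaces the separate input and output embeddings,
# roughly halving the embedding parameter count (V*d instead of 2*V*d).
E = rng.normal(scale=0.1, size=(vocab_size, d_model))

def forward(token_id, hidden_fn=lambda x: x):
    x = E[token_id]      # input embedding: look up the token's row of E
    h = hidden_fn(x)     # stand-in for the RNN/Transformer body (hypothetical)
    logits = h @ E.T     # output projection reuses the same matrix E (tied)
    return logits

logits = forward(3)      # unnormalized scores over the vocabulary
```

Because the same `E` appears in both the lookup and the projection, every training step updates it from both roles, which is the behavior the paper's update-rule analysis examines.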
Anthology ID:
E17-2025
Volume:
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
Mirella Lapata, Phil Blunsom, Alexander Koller
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
157–163
URL:
https://aclanthology.org/E17-2025
Cite (ACL):
Ofir Press and Lior Wolf. 2017. Using the Output Embedding to Improve Language Models. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 157–163, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
Using the Output Embedding to Improve Language Models (Press & Wolf, EACL 2017)
PDF:
https://preview.aclanthology.org/emnlp22-frontmatter/E17-2025.pdf
Code
ofirpress/UsingTheOutputEmbedding (+ additional community code)
Data
IMDb Movie Reviews