On Dimensional Linguistic Properties of the Word Embedding Space

Vikas Raunak, Vaibhav Kumar, Vivek Gupta, Florian Metze


Abstract
Word embeddings have become a staple of several natural language processing tasks, yet much remains to be understood about their properties. In this work, we analyze word embeddings in terms of their principal components and arrive at a number of novel and counterintuitive observations. In particular, we characterize the utility of variance explained by the principal components as a proxy for downstream performance. Furthermore, through syntactic probing of the principal embedding space, we show that the syntactic information captured by a principal component does not correlate with the amount of variance it explains. Consequently, we investigate the limitations of variance-based embedding post-processing algorithms and demonstrate that such post-processing is counter-productive in sentence classification and machine translation tasks. Finally, we offer a few precautionary guidelines on applying variance-based embedding post-processing and explain why non-isotropic geometry might be integral to word embedding performance.
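To make the quantities in the abstract concrete, the following minimal sketch computes the fraction of variance explained by each principal component of an embedding matrix, and then applies one common form of variance-based post-processing: projecting out the highest-variance directions. It is an illustration under stated assumptions, not the authors' implementation. The function names, the random toy matrix, and the choice of removing two components are all hypothetical.

```python
import numpy as np


def explained_variance_ratio(embeddings):
    """Return (ratio, components) for the PCA of a (words x dims) matrix.

    ratio[i] is the fraction of total variance explained by component i.
    """
    centered = embeddings - embeddings.mean(axis=0)
    # The SVD of the centered matrix yields the principal directions (rows
    # of `components`) with singular values sorted in descending order.
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2
    return variances / variances.sum(), components


def remove_top_components(embeddings, num_removed):
    """Center the embeddings and project out the top principal directions.

    This is an assumed, generic variance-based post-processing step.
    """
    centered = embeddings - embeddings.mean(axis=0)
    _, _, components = np.linalg.svd(centered, full_matrices=False)
    top = components[:num_removed]
    # Subtract each vector's projection onto the removed directions.
    return centered - centered @ top.T @ top


# Hypothetical stand-in for a real embedding matrix: 1000 "words" in 50
# dimensions, scaled so that some directions carry far more variance.
rng = np.random.default_rng(0)
toy = rng.normal(size=(1000, 50)) * np.linspace(5.0, 0.5, 50)

ratio, _ = explained_variance_ratio(toy)
processed = remove_top_components(toy, num_removed=2)
```

With real embeddings, `ratio` gives the per-component variance that the abstract evaluates as a proxy for downstream performance, and `processed` is the kind of output whose downstream effect the abstract cautions about.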
Anthology ID:
2020.repl4nlp-1.19
Volume:
Proceedings of the 5th Workshop on Representation Learning for NLP
Month:
July
Year:
2020
Address:
Online
Editors:
Spandana Gella, Johannes Welbl, Marek Rei, Fabio Petroni, Patrick Lewis, Emma Strubell, Minjoon Seo, Hannaneh Hajishirzi
Venue:
RepL4NLP
SIG:
SIGREP
Publisher:
Association for Computational Linguistics
Pages:
156–165
URL:
https://aclanthology.org/2020.repl4nlp-1.19
DOI:
10.18653/v1/2020.repl4nlp-1.19
Cite (ACL):
Vikas Raunak, Vaibhav Kumar, Vivek Gupta, and Florian Metze. 2020. On Dimensional Linguistic Properties of the Word Embedding Space. In Proceedings of the 5th Workshop on Representation Learning for NLP, pages 156–165, Online. Association for Computational Linguistics.
Cite (Informal):
On Dimensional Linguistic Properties of the Word Embedding Space (Raunak et al., RepL4NLP 2020)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/2020.repl4nlp-1.19.pdf
Software:
2020.repl4nlp-1.19.Software.zip
Video:
http://slideslive.com/38929785
Code:
vyraun/dlp + additional community code