Tweaks and Tricks for Word Embedding Disruptions

Amir Hazem, Nicolas Hernandez


Abstract
Word embeddings are well established as effective models in many NLP applications. Although they differ in their architecture and training process, they exhibit similar properties and remain vector space models whose continuously-valued dimensions describe the observed data. The complexity resides in the strategies developed for learning the values within each dimension. In this paper, we introduce the concept of disruption, which we define as a side effect of the training process of embedding models. Disruptions are viewed as a set of embedding values that are more likely to be noise than effective descriptive features. We show that dealing with the disruption phenomenon is of great benefit to bottom-up sentence embedding representations. By contrasting several in-domain and pre-trained embedding models, we propose two simple but very effective tweaking techniques that yield strong empirical improvements on the textual similarity task.
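The abstract does not spell out the two tweaking techniques themselves, but the underlying idea (some embedding values are noise rather than descriptive features, and damping them helps bottom-up sentence embeddings) can be illustrated with a minimal sketch. The sketch below is a hypothetical heuristic, not the authors' method: it estimates per-dimension statistics over the vocabulary, clips values that deviate strongly from the dimension mean, and averages the clipped word vectors into a sentence vector. The threshold `clip_sigma` and the clipping strategy are assumptions for illustration only.

```python
import numpy as np


def dimension_bounds(embedding_matrix, clip_sigma=3.0):
    """Per-dimension clipping bounds estimated over the whole vocabulary.

    Hypothetical heuristic: values further than `clip_sigma` standard
    deviations from a dimension's mean are treated as likely noise
    ("disruptions") rather than descriptive features.
    """
    mean = embedding_matrix.mean(axis=0)
    std = embedding_matrix.std(axis=0) + 1e-12  # avoid division by zero
    return mean - clip_sigma * std, mean + clip_sigma * std


def sentence_embedding(word_vectors, low, high):
    """Bottom-up sentence vector: average word vectors after clipping
    each dimension into [low, high] to damp suspected disruptions."""
    clipped = np.clip(np.asarray(word_vectors), low, high)
    return clipped.mean(axis=0)


# Toy usage with a random 1000-word, 300-dimensional embedding table.
rng = np.random.default_rng(0)
vocab = rng.normal(size=(1000, 300))
vocab[42, 17] = 50.0                      # inject an artificial disruption
low, high = dimension_bounds(vocab)
sentence = vocab[[42, 7, 123, 999]]       # word vectors of a 4-word sentence
print(sentence_embedding(sentence, low, high).shape)  # (300,)
```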
Anthology ID: R19-1054
Volume: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month: September
Year: 2019
Address: Varna, Bulgaria
Venue: RANLP
Publisher: INCOMA Ltd.
Pages: 460–464
URL: https://aclanthology.org/R19-1054
DOI: 10.26615/978-954-452-056-4_054
Cite (ACL): Amir Hazem and Nicolas Hernandez. 2019. Tweaks and Tricks for Word Embedding Disruptions. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 460–464, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal): Tweaks and Tricks for Word Embedding Disruptions (Hazem & Hernandez, RANLP 2019)
PDF: https://preview.aclanthology.org/paclic-22-ingestion/R19-1054.pdf