Abstract
Word embeddings are well established as effective models in many NLP applications. While they differ in architecture and training process, they often exhibit similar properties and remain vector space models whose continuously valued dimensions describe the observed data. The complexity lies in the strategies developed for learning the values along each dimension. In this paper, we introduce the concept of disruption, which we define as a side effect of the training process of embedding models. Disruptions are viewed as a set of embedding values that are more likely to be noise than effective descriptive features. We show that dealing with the disruption phenomenon greatly benefits bottom-up sentence embedding representations. By contrasting several in-domain and pre-trained embedding models, we propose two simple but very effective tweaking techniques that yield strong empirical improvements on the textual similarity task.
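The abstract refers to bottom-up sentence embeddings, i.e. composing a sentence vector from the vectors of its words. The minimal sketch below only illustrates that composition; the disruption-handling step shown (per-dimension clipping of extreme values before averaging) is a hypothetical stand-in to make the idea concrete, not one of the paper's two tweaking techniques.

```python
import numpy as np

def sentence_embedding(tokens, vectors, clip_sigma=3.0):
    """Bottom-up sentence embedding: average the word vectors of a sentence.

    As an illustrative (hypothetical) way of handling "disruptive" values,
    entries deviating more than `clip_sigma` standard deviations from the
    per-dimension mean of the sentence's word vectors are clipped before
    averaging. This is an assumption for illustration, not the paper's method.
    """
    # Stack the vectors of in-vocabulary tokens: shape (n_words, dim).
    word_vecs = np.array([vectors[t] for t in tokens if t in vectors])
    if word_vecs.size == 0:
        return None

    # Per-dimension statistics over the sentence's word vectors.
    mu = word_vecs.mean(axis=0)
    sigma = word_vecs.std(axis=0) + 1e-8

    # Clip extreme values back into the [mu - k*sigma, mu + k*sigma] band.
    word_vecs = np.clip(word_vecs, mu - clip_sigma * sigma, mu + clip_sigma * sigma)

    # Bottom-up composition: simple average of the (tweaked) word vectors.
    return word_vecs.mean(axis=0)

# Toy usage with random vectors standing in for a trained embedding model.
rng = np.random.default_rng(0)
toy_vectors = {w: rng.normal(size=50) for w in ["the", "cat", "sat"]}
print(sentence_embedding(["the", "cat", "sat"], toy_vectors).shape)  # (50,)
```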
- Anthology ID:
- R19-1054
- Volume:
- Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
- Month:
- September
- Year:
- 2019
- Address:
- Varna, Bulgaria
- Editors:
- Ruslan Mitkov, Galia Angelova
- Venue:
- RANLP
- Publisher:
- INCOMA Ltd.
- Pages:
- 460–464
- URL:
- https://aclanthology.org/R19-1054
- DOI:
- 10.26615/978-954-452-056-4_054
- Cite (ACL):
- Amir Hazem and Nicolas Hernandez. 2019. Tweaks and Tricks for Word Embedding Disruptions. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 460–464, Varna, Bulgaria. INCOMA Ltd.
- Cite (Informal):
- Tweaks and Tricks for Word Embedding Disruptions (Hazem & Hernandez, RANLP 2019)
- PDF:
- https://aclanthology.org/R19-1054.pdf