On Learning Word Embeddings From Linguistically Augmented Text Corpora

Amila Silva, Chathurika Amarathunga


Abstract
Word embedding is a Natural Language Processing (NLP) technique that maps words to vector-space representations. Because word embeddings have boosted the performance of many downstream NLP tasks, the problem of learning them has received significant attention. Nevertheless, most of the standard word embedding methods, such as word2vec and GloVe, fail to produce high-quality embeddings when the text corpus is small and sparse. This paper proposes a method to generate effective word embeddings from limited data. Through experiments, we show that our proposed model outperforms existing methods on the classical word similarity task and on a domain-specific application.
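
The abstract names word2vec as a baseline and linguistic augmentation of the corpus as the remedy for sparsity. The paper's actual augmentation procedure is not described on this page, so the sketch below is only a rough illustration of the general idea: it enlarges a toy corpus with WordNet synonym substitutions and trains a skip-gram word2vec model on the result using gensim and NLTK. The augmentation function and all parameter values are illustrative assumptions, not the authors' method.

```python
# Minimal sketch (NOT the authors' method): enlarge a small corpus with
# WordNet synonym substitutions, then train skip-gram word2vec on it.
# Assumes gensim>=4.0 and NLTK with the 'wordnet' corpus downloaded
# (run nltk.download('wordnet') once beforehand).
from itertools import islice

from gensim.models import Word2Vec
from nltk.corpus import wordnet as wn


def synonym_variants(sentence, max_per_word=2):
    """Return copies of the sentence with one word swapped for a WordNet synonym."""
    variants = []
    for i, word in enumerate(sentence):
        synonyms = {
            lemma.name()
            for synset in wn.synsets(word)
            for lemma in synset.lemmas()
            # keep single-token synonyms that differ from the original word
            if "_" not in lemma.name() and lemma.name().lower() != word.lower()
        }
        for synonym in islice(synonyms, max_per_word):
            variants.append(sentence[:i] + [synonym] + sentence[i + 1:])
    return variants


corpus = [
    ["the", "quick", "brown", "fox", "jumps"],
    ["a", "small", "corpus", "gives", "sparse", "cooccurrence", "counts"],
]

# Linguistic augmentation: original sentences plus synonym-substituted copies.
augmented = corpus + [v for s in corpus for v in synonym_variants(s)]

# Train skip-gram (sg=1) word2vec on the augmented corpus.
model = Word2Vec(augmented, vector_size=50, window=3, min_count=1, sg=1, epochs=20)
print(model.wv.most_similar("small", topn=3))
```

On such a tiny corpus the effect is that each rare word inherits extra contexts from its synonyms, which is one plausible way augmentation can densify co-occurrence statistics before embedding training.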
Anthology ID:
W19-0508
Volume:
Proceedings of the 13th International Conference on Computational Semantics - Short Papers
Month:
May
Year:
2019
Address:
Gothenburg, Sweden
Editors:
Simon Dobnik, Stergios Chatzikyriakidis, Vera Demberg
Venue:
IWCS
SIG:
SIGSEM
Publisher:
Association for Computational Linguistics
Pages:
52–58
URL:
https://aclanthology.org/W19-0508
DOI:
10.18653/v1/W19-0508
Cite (ACL):
Amila Silva and Chathurika Amarathunga. 2019. On Learning Word Embeddings From Linguistically Augmented Text Corpora. In Proceedings of the 13th International Conference on Computational Semantics - Short Papers, pages 52–58, Gothenburg, Sweden. Association for Computational Linguistics.
Cite (Informal):
On Learning Word Embeddings From Linguistically Augmented Text Corpora (Silva & Amarathunga, IWCS 2019)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/W19-0508.pdf