Learning Embeddings for Rare Words Leveraging Internet Search Engine and Spatial Location Relationships

Xiaotao Li, Shujuan You, Yawen Niu, Wai Chen


Abstract
Word embedding techniques depend heavily on the frequencies of words in the corpus, and therefore often fail to provide reliable representations for low-frequency words or for words unseen during training. To address this problem, we propose an algorithm that learns embeddings for rare words based on an Internet search engine and spatial location relationships. Our algorithm proceeds in two steps. We first retrieve webpages corresponding to the rare word through the search engine and parse the returned results to extract a set of the most closely related words. We average the vectors of the related words to obtain the initial vector of the rare word. Then, the location of the rare word in the vector space is iteratively fine-tuned according to the order of its relevance to the related words. Compared with other approaches, our algorithm can learn more accurate representations for a wider range of vocabulary. We evaluate our learned rare-word embeddings on the word relatedness task, and the experimental results show that our algorithm achieves state-of-the-art performance.
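The two steps in the abstract can be illustrated with a minimal sketch. It assumes the search-engine step has already produced a list of related words ordered from most to least relevant; retrieval and parsing are not shown. The function names (`initial_vector`, `refine`, `embed_rare_word`) and the fine-tuning rule are hypothetical. Step 1 follows the abstract (average the related words' vectors). Step 2 is a swapped-in stand-in, not the authors' method: a pairwise hinge loss on squared Euclidean distances that nudges the rare-word vector until more relevant words lie closer to it than less relevant ones.

```python
import numpy as np


def initial_vector(related, embeddings):
    """Step 1: average the vectors of the related words that have embeddings."""
    vecs = [np.asarray(embeddings[w], dtype=float) for w in related if w in embeddings]
    if not vecs:
        raise ValueError("no related word has a known embedding")
    return np.mean(vecs, axis=0)


def refine(r, ranked_vecs, lr=0.05, margin=0.01, max_iter=200):
    """Step 2 (hypothetical stand-in): move r so that its squared distances to the
    related words follow their relevance order (most relevant word closest)."""
    r = np.array(r, dtype=float)  # work on a copy
    n = len(ranked_vecs)
    for _ in range(max_iter):
        grad = np.zeros_like(r)
        violated = 0
        for i in range(n):
            for j in range(i + 1, n):  # word i is ranked as more relevant than word j
                di = np.sum((r - ranked_vecs[i]) ** 2)
                dj = np.sum((r - ranked_vecs[j]) ** 2)
                if di - dj + margin > 0:  # hinge active: v_i is not closer by `margin`
                    # gradient of (di - dj) with respect to r is 2 * (v_j - v_i)
                    grad += 2.0 * (ranked_vecs[j] - ranked_vecs[i])
                    violated += 1
        if violated == 0:  # every pairwise ordering constraint is satisfied
            break
        r -= lr * grad
    return r


def embed_rare_word(related, embeddings, **refine_kwargs):
    """Combine both steps; `related` is assumed to be sorted by descending relevance."""
    known = [w for w in related if w in embeddings]  # keep relevance order, skip OOV words
    r0 = initial_vector(known, embeddings)
    ranked = [np.asarray(embeddings[w], dtype=float) for w in known]
    return refine(r0, ranked, **refine_kwargs)


if __name__ == "__main__":
    # Toy 2-D embeddings; "a" is assumed to be the most relevant related word.
    emb = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
    vec = embed_rare_word(["a", "b", "c"], emb)
    print(vec)
```

Squared Euclidean distance is used here only because the hinge gradient is simple (it is linear in `r`). A cosine-similarity ranking loss would be an equally plausible choice, and the actual update rule should be taken from the paper itself.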
Anthology ID:
2021.starsem-1.26
Volume:
Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics
Month:
August
Year:
2021
Address:
Online
Venue:
*SEM
SIG:
SIGSEM
Publisher:
Association for Computational Linguistics
Pages:
278–287
URL:
https://aclanthology.org/2021.starsem-1.26
DOI:
10.18653/v1/2021.starsem-1.26
Cite (ACL):
Xiaotao Li, Shujuan You, Yawen Niu, and Wai Chen. 2021. Learning Embeddings for Rare Words Leveraging Internet Search Engine and Spatial Location Relationships. In Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics, pages 278–287, Online. Association for Computational Linguistics.
Cite (Informal):
Learning Embeddings for Rare Words Leveraging Internet Search Engine and Spatial Location Relationships (Li et al., *SEM 2021)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.starsem-1.26.pdf