GGP: Glossary Guided Post-processing for Word Embedding Learning

Ruosong Yang; Jiannong Cao; Zhiyuan Wen

Abstract

Word embedding learning is the task of mapping each word to a low-dimensional, continuous vector based on a large corpus. To enhance corpus-based word embedding models, researchers use domain knowledge to learn more distinguishable representations via joint optimization and post-processing models. However, joint-optimization models require long training times, and existing post-processing models mostly consider semantic knowledge, so the learned embeddings carry less functional information. A glossary is a comprehensive linguistic resource, and previous work has usually used it to enhance word representations through joint-optimization methods. In this paper, we post-process pre-trained word embedding models by incorporating the glossary, capturing more topical and functional information. We propose the GGP (Glossary Guided Post-processing word embedding) model, which consists of a global post-processing function that fine-tunes each word vector and an auto-encoding model that learns sense representations; it further constrains each post-processed word representation and the composition of its sense representations to be similar. We evaluate our model against two state-of-the-art models on six word topical/functional similarity datasets. The results show that it outperforms the competitors by an average of 4.1% across all datasets and outperforms GloVe by more than 7%.
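
The three components named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the form of the post-processing function, the auto-encoder architecture, the composition operator, or the loss. The sketch assumes an affine post-processing map, a single-layer tanh auto-encoder over gloss vectors, mean composition of senses, and a cosine-based similarity penalty. All function names, shapes, and the `alpha` weight are hypothetical.

```python
import numpy as np

def post_process(word_vec, W, b):
    # Assumed form: one global affine map shared by every word vector.
    return word_vec @ W + b

def encode(gloss_vecs, W_enc):
    # Assumed encoder: maps each gloss (definition) vector to a sense vector.
    return np.tanh(gloss_vecs @ W_enc)

def decode(sense_vecs, W_dec):
    # Assumed linear decoder: reconstructs gloss vectors from sense vectors.
    return sense_vecs @ W_dec

def ggp_style_loss(word_vec, gloss_vecs, W, b, W_enc, W_dec, alpha=1.0, eps=1e-8):
    tuned = post_process(word_vec, W, b)             # shape (d,)
    senses = encode(gloss_vecs, W_enc)               # shape (k, d)
    recon = decode(senses, W_dec)                    # shape (k, g)
    recon_loss = np.mean((recon - gloss_vecs) ** 2)  # auto-encoding term
    composed = senses.mean(axis=0)                   # assumed composition: mean of senses
    cos = composed @ tuned / (np.linalg.norm(composed) * np.linalg.norm(tuned) + eps)
    sim_loss = 1.0 - cos                             # pulls word vector toward its senses
    return recon_loss + alpha * sim_loss

# Toy example: one word with k=3 glosses, embedding size d=4, gloss size g=5.
rng = np.random.default_rng(0)
d, g, k = 4, 5, 3
word_vec = rng.normal(size=d)
gloss_vecs = rng.normal(size=(k, g))
W, b = np.eye(d), np.zeros(d)          # identity start: embeddings unchanged
W_enc = rng.normal(size=(g, d)) * 0.1
W_dec = rng.normal(size=(d, g)) * 0.1
loss = ggp_style_loss(word_vec, gloss_vecs, W, b, W_enc, W_dec)
```

In an actual training setup, the parameters would be optimized by gradient descent over all words in the glossary; the sketch only evaluates the combined objective once for a single word.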

Anthology ID: 2020.lrec-1.581
Volume: Proceedings of the Twelfth Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 4726–4730
Language: English
URL: https://aclanthology.org/2020.lrec-1.581
Cite (ACL): Ruosong Yang, Jiannong Cao, and Zhiyuan Wen. 2020. GGP: Glossary Guided Post-processing for Word Embedding Learning. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4726–4730, Marseille, France. European Language Resources Association.
Cite (Informal): GGP: Glossary Guided Post-processing for Word Embedding Learning (Yang et al., LREC 2020)
PDF: https://preview.aclanthology.org/naacl24-info/2020.lrec-1.581.pdf