Abstract
Word embedding learning is the task of mapping each word to a low-dimensional, continuous vector based on a large corpus. To enhance corpus-based word embedding models, researchers use domain knowledge to learn more distinguishable representations, through either joint optimization based models or post-processing based models. However, joint optimization based models require substantial training time. Existing post-processing models mostly consider semantic knowledge, and the embeddings they learn carry little functional information. A glossary is a comprehensive linguistic resource, and previous works have usually used it to enhance word representations through joint optimization based methods. In this paper, we post-process pre-trained word embedding models by incorporating a glossary, so that they capture more topical and functional information. We propose GGP (Glossary Guided Post-processing word embedding), a model that consists of a global post-processing function that fine-tunes each word vector and an auto-encoding model that learns sense representations; in addition, GGP constrains each post-processed word representation to be similar to the composition of its sense representations. We evaluate our model against two state-of-the-art models on six word topical/functional similarity datasets, and the results show that it outperforms these competitors by an average of 4.1% across all datasets. Our model also outperforms GloVe by more than 7%.
- Anthology ID:
- 2020.lrec-1.581
- Volume:
- Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 4726–4730
- Language:
- English
- URL:
- https://aclanthology.org/2020.lrec-1.581
- Cite (ACL):
- Ruosong Yang, Jiannong Cao, and Zhiyuan Wen. 2020. GGP: Glossary Guided Post-processing for Word Embedding Learning. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4726–4730, Marseille, France. European Language Resources Association.
- Cite (Informal):
- GGP: Glossary Guided Post-processing for Word Embedding Learning (Yang et al., LREC 2020)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2020.lrec-1.581.pdf
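The abstract states that GGP constrains each post-processed word vector to be similar to the composition of its sense representations. The following is a minimal sketch of how such a constraint could be scored. It is not the authors' implementation, and every name in it (`apply_global_map`, `compose_senses`, `composition_loss`) is hypothetical. It assumes a linear global post-processing function f(x) = Wx + b shared by all words, sense vectors that are already available (in GGP they come from an auto-encoding model, which is omitted here), composition by averaging, and squared Euclidean distance as the dissimilarity.

```python
from typing import Dict, List

Vector = List[float]


def apply_global_map(W: List[List[float]], b: Vector, x: Vector) -> Vector:
    """Hypothetical global post-processing function f(x) = W x + b, shared by all words."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def compose_senses(senses: List[Vector]) -> Vector:
    """Compose a word's sense vectors by averaging them (an assumption, not GGP's stated choice)."""
    n = len(senses)
    dim = len(senses[0])
    return [sum(s[d] for s in senses) / n for d in range(dim)]


def squared_distance(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((a - c) ** 2 for a, c in zip(u, v))


def composition_loss(
    W: List[List[float]],
    b: Vector,
    embeddings: Dict[str, Vector],
    senses: Dict[str, List[Vector]],
) -> float:
    """Sum over glossary words of ||f(x_w) - compose(senses_w)||^2.

    Words with no sense vectors (i.e. not covered by the glossary) contribute nothing.
    """
    total = 0.0
    for word, x in embeddings.items():
        word_senses = senses.get(word)
        if not word_senses:
            continue  # skip words absent from the glossary
        post_processed = apply_global_map(W, b, x)
        total += squared_distance(post_processed, compose_senses(word_senses))
    return total


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zero_bias = [0.0, 0.0]
    emb = {"bank": [1.0, 0.0], "cat": [0.0, 1.0]}
    sense_vecs = {"bank": [[1.0, 1.0], [1.0, -1.0]], "cat": [[0.0, 3.0]]}
    print(composition_loss(identity, zero_bias, emb, sense_vecs))  # 4.0
```

In the toy example, the averaged senses of "bank" match its post-processed vector exactly, so only "cat" contributes (a squared distance of 4.0). In a trainable version, W, b, and the sense vectors would be learned jointly by minimizing a term like this, presumably alongside the auto-encoder's reconstruction objective.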