Using Context-to-Vector with Graph Retrofitting to Improve Word Embeddings

Jiangbin Zheng, Yile Wang, Ge Wang, Jun Xia, Yufei Huang, Guojiang Zhao, Yue Zhang, Stan Li


Abstract
Although contextualized embeddings generated from large-scale pre-trained models perform well in many tasks, traditional static embeddings (e.g., Skip-gram, Word2Vec) still play an important role in low-resource and lightweight settings due to their low computational cost, ease of deployment, and stability. In this paper, we aim to improve word embeddings by 1) incorporating more contextual information from existing pre-trained models into the Skip-gram framework, which we call Context-to-Vec; 2) proposing a post-processing retrofitting method for static embeddings independent of training by employing priori synonym knowledge and weighted vector distribution. Through extrinsic and intrinsic tasks, our methods are well proven to outperform the baselines by a large margin.
Anthology ID:
2022.acl-long.561
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
8154–8163
Language:
URL:
https://aclanthology.org/2022.acl-long.561
DOI:
10.18653/v1/2022.acl-long.561
Bibkey:
Cite (ACL):
Jiangbin Zheng, Yile Wang, Ge Wang, Jun Xia, Yufei Huang, Guojiang Zhao, Yue Zhang, and Stan Li. 2022. Using Context-to-Vector with Graph Retrofitting to Improve Word Embeddings. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8154–8163, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Using Context-to-Vector with Graph Retrofitting to Improve Word Embeddings (Zheng et al., ACL 2022)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.acl-long.561.pdf
Code
 binbinjiang/context2vector