Abstract
Sememes are the minimum semantic units of word meanings, and the meaning of each word sense is typically composed of several sememes. Since sememes are not explicit for each word, they are manually annotated to form linguistic common-sense knowledge bases. In this paper, we show that word sememe information can improve word representation learning (WRL), which maps words into a low-dimensional semantic space and serves as a fundamental step for many NLP tasks. The key idea is to utilize word sememes to accurately capture the exact meaning of a word within a specific context. More specifically, we follow the framework of Skip-gram and present three sememe-encoded models to learn representations of sememes, senses, and words, in which we apply an attention scheme to detect word senses in various contexts. We conduct experiments on two tasks, word similarity and word analogy, and our models significantly outperform baselines. The results indicate that WRL can benefit from sememes via the attention scheme, and also confirm that our models are capable of correctly modeling sememe information.
- Anthology ID:
- P17-1187
- Volume:
- Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2017
- Address:
- Vancouver, Canada
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2049–2058
- URL:
- https://aclanthology.org/P17-1187
- DOI:
- 10.18653/v1/P17-1187
- Cite (ACL):
- Yilin Niu, Ruobing Xie, Zhiyuan Liu, and Maosong Sun. 2017. Improved Word Representation Learning with Sememes. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2049–2058, Vancouver, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Improved Word Representation Learning with Sememes (Niu et al., ACL 2017)
- PDF:
- https://preview.aclanthology.org/starsem-semeval-split/P17-1187.pdf
- Code:
- thunlp/SE-WRL
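
The attention scheme described in the abstract can be illustrated with a minimal sketch. The NumPy example below is not taken from the thunlp/SE-WRL release; all names (e.g. `attended_word_vector`) and specific choices (averaging sememe embeddings to form a sense embedding, dot-product scoring against an averaged context vector) are illustrative assumptions consistent with the abstract's description: sense vectors are composed from sememe vectors, attention over senses is driven by the context, and the resulting word vector plugs into the Skip-gram objective.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attended_word_vector(sense_sememes, sememe_emb, context_vec):
    """Compose a context-specific word vector via sememe attention (sketch).

    sense_sememes: list of lists; sense_sememes[k] holds the sememe ids
                   of the word's k-th sense (from a sememe KB such as HowNet).
    sememe_emb:    (num_sememes, dim) embedding matrix for sememes.
    context_vec:   (dim,) average of the embeddings of surrounding words.
    """
    # Assumption: each sense embedding is the average of its sememes' embeddings.
    sense_vecs = np.stack([sememe_emb[ids].mean(axis=0)
                           for ids in sense_sememes])
    # Attention: senses that agree with the context receive higher weight.
    att = softmax(sense_vecs @ context_vec)
    # The context-specific word vector is the attention-weighted sum of
    # sense vectors; it would then replace the single word vector in the
    # usual Skip-gram objective.
    return att @ sense_vecs

# Toy usage: a word with two senses drawn from three sememes, dim 4.
rng = np.random.default_rng(0)
sememe_emb = rng.normal(size=(3, 4))
context_vec = rng.normal(size=4)
w = attended_word_vector([[0, 1], [2]], sememe_emb, context_vec)
print(w.shape)  # (4,)
```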