Glyph Enhanced Chinese Character Pre-Training for Lexical Sememe Prediction

Boer Lyu, Lu Chen, Kai Yu


Abstract
Sememes are defined as the atomic units that describe the semantic meaning of concepts. Because manually annotating sememes is difficult and annotations are inconsistent between experts, the lexical sememe prediction task has been proposed. However, previous methods rely heavily on word or character embeddings and ignore fine-grained information. In this paper, we propose a novel pre-training method designed to better incorporate the internal information of Chinese characters. The resulting Glyph enhanced Chinese Character representation (GCC) is used to assist sememe prediction. We evaluate our model on HowNet, a well-known sememe knowledge base. The experimental results show that our method outperforms existing models that do not use external information.
Anthology ID:
2021.findings-emnlp.386
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
4549–4555
URL:
https://aclanthology.org/2021.findings-emnlp.386
DOI:
10.18653/v1/2021.findings-emnlp.386
Cite (ACL):
Boer Lyu, Lu Chen, and Kai Yu. 2021. Glyph Enhanced Chinese Character Pre-Training for Lexical Sememe Prediction. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4549–4555, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Glyph Enhanced Chinese Character Pre-Training for Lexical Sememe Prediction (Lyu et al., Findings 2021)
PDF:
https://preview.aclanthology.org/emnlp22-frontmatter/2021.findings-emnlp.386.pdf
Video:
https://preview.aclanthology.org/emnlp22-frontmatter/2021.findings-emnlp.386.mp4
Code:
lbe0613/gcc