Leveraging Word-Formation Knowledge for Chinese Word Sense Disambiguation

Hua Zheng, Lei Li, Damai Dai, Deli Chen, Tianyu Liu, Xu Sun, Yang Liu


Abstract
In paratactic languages like Chinese, word meanings are constructed through specific word-formations, which can help disambiguate word senses. However, such knowledge has rarely been explored in previous word sense disambiguation (WSD) methods. In this paper, we propose to leverage word-formation knowledge to enhance Chinese WSD. We first construct a large-scale Chinese lexical sample WSD dataset with word-formations. Then, we propose a model, FormBERT, that explicitly incorporates word-formations into sense disambiguation. To further enhance generalizability, we design a word-formation predictor module for cases where word-formation annotations are unavailable. Experimental results show that our method brings substantial performance improvements over strong baselines.
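The abstract describes FormBERT only at a high level. The Python sketch below illustrates the general idea under loudly stated assumptions; it is not the authors' implementation (the linked repository holds that). Everything here is hypothetical: the word-formation label set, additive fusion of a formation embedding with the target-word vector, dot-product sense scoring, and a linear "predictor" whose softmax output weights the formation embeddings when no gold label is available. Toy 2-d vectors stand in for contextual encoder outputs.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]

# Hypothetical word-formation labels, for illustration only;
# the paper's actual inventory may differ.
FORMATIONS = ["modifier-head", "verb-object", "subject-predicate", "coordinate"]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def formation_vector(
    word_vec: Vector,
    formation: Optional[str],
    formation_emb: Dict[str, Vector],
    predictor: Dict[str, Vector],
) -> Vector:
    """Return a word-formation vector for the target word.

    With a gold label, look up its embedding. Without one, a hypothetical
    linear predictor scores each formation from the word vector, and the
    embeddings are averaged using the softmax probabilities as weights.
    """
    if formation is not None:
        return formation_emb[formation]
    probs = softmax([dot(predictor[f], word_vec) for f in FORMATIONS])
    vec = [0.0] * len(word_vec)
    for p, f in zip(probs, FORMATIONS):
        vec = [v + p * e for v, e in zip(vec, formation_emb[f])]
    return vec


def disambiguate(
    word_vec: Vector,
    sense_vecs: Dict[str, Vector],
    formation: Optional[str],
    formation_emb: Dict[str, Vector],
    predictor: Dict[str, Vector],
) -> str:
    """Pick the sense whose vector best matches the fused representation."""
    form_vec = formation_vector(word_vec, formation, formation_emb, predictor)
    # Assumed fusion: element-wise addition of word and formation vectors.
    fused = [w + f for w, f in zip(word_vec, form_vec)]
    scores = {sense: dot(fused, v) for sense, v in sense_vecs.items()}
    return max(scores, key=scores.get)


# Toy data (made up): two senses and four formation embeddings.
SENSES = {"sense_a": [1.0, 0.0], "sense_b": [0.0, 1.0]}
EMB = {
    "modifier-head": [1.0, 0.0],
    "verb-object": [0.0, 1.0],
    "subject-predicate": [0.0, 0.0],
    "coordinate": [0.0, 0.0],
}
ZERO_PREDICTOR = {f: [0.0, 0.0] for f in FORMATIONS}
WORD = [0.5, 0.4]

if __name__ == "__main__":
    print(disambiguate(WORD, SENSES, "verb-object", EMB, ZERO_PREDICTOR))    # sense_b
    print(disambiguate(WORD, SENSES, "modifier-head", EMB, ZERO_PREDICTOR))  # sense_a
    print(disambiguate(WORD, SENSES, None, EMB, ZERO_PREDICTOR))             # sense_a
```

With the same context vector, switching the gold formation label flips the chosen sense, which is the intuition the abstract describes. With the all-zero predictor, the fallback reduces to a uniform average of the formation embeddings; a trained predictor would instead concentrate weight on the likely formation.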
Anthology ID: 2021.findings-emnlp.78
Volume: Findings of the Association for Computational Linguistics: EMNLP 2021
Month: November
Year: 2021
Address: Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: Findings
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 918–923
URL: https://aclanthology.org/2021.findings-emnlp.78
DOI: 10.18653/v1/2021.findings-emnlp.78
Cite (ACL): Hua Zheng, Lei Li, Damai Dai, Deli Chen, Tianyu Liu, Xu Sun, and Yang Liu. 2021. Leveraging Word-Formation Knowledge for Chinese Word Sense Disambiguation. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 918–923, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Leveraging Word-Formation Knowledge for Chinese Word Sense Disambiguation (Zheng et al., Findings 2021)
PDF: https://preview.aclanthology.org/nschneid-patch-4/2021.findings-emnlp.78.pdf
Video: https://preview.aclanthology.org/nschneid-patch-4/2021.findings-emnlp.78.mp4
Code: tobiaslee/formbert