Try to Substitute: An Unsupervised Chinese Word Sense Disambiguation Method Based on HowNet
Bairu Hou, Fanchao Qi, Yuan Zang, Xurui Zhang, Zhiyuan Liu, Maosong Sun
Abstract
Word sense disambiguation (WSD) is a fundamental natural language processing task. Unsupervised knowledge-based WSD only relies on a lexical knowledge base as the sense inventory and has wider practical use than supervised WSD that requires a mass of sense-annotated data. HowNet is the most widely used lexical knowledge base in Chinese WSD. Because of its uniqueness, however, most of existing unsupervised WSD methods cannot work for HowNet-based WSD, and the tailor-made methods have not obtained satisfying results. In this paper, we propose a new unsupervised method for HowNet-based Chinese WSD, which exploits the masked language model task of pre-trained language models. In experiments, considering existing evaluation dataset is small and out-of-date, we build a new and larger HowNet-based WSD dataset. Experimental results demonstrate that our model achieves significantly better performance than all the baseline methods. All the code and data of this paper are available at https://github.com/thunlp/SememeWSD.- Anthology ID:
- 2020.coling-main.155
- Volume:
- Proceedings of the 28th International Conference on Computational Linguistics
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Editors:
- Donia Scott, Nuria Bel, Chengqing Zong
- Venue:
- COLING
- SIG:
- Publisher:
- International Committee on Computational Linguistics
- Note:
- Pages:
- 1752–1757
- Language:
- URL:
- https://aclanthology.org/2020.coling-main.155
- DOI:
- 10.18653/v1/2020.coling-main.155
- Cite (ACL):
- Bairu Hou, Fanchao Qi, Yuan Zang, Xurui Zhang, Zhiyuan Liu, and Maosong Sun. 2020. Try to Substitute: An Unsupervised Chinese Word Sense Disambiguation Method Based on HowNet. In Proceedings of the 28th International Conference on Computational Linguistics, pages 1752–1757, Barcelona, Spain (Online). International Committee on Computational Linguistics.
- Cite (Informal):
- Try to Substitute: An Unsupervised Chinese Word Sense Disambiguation Method Based on HowNet (Hou et al., COLING 2020)
- PDF:
- https://preview.aclanthology.org/emnlp-22-attachments/2020.coling-main.155.pdf
- Code
- thunlp/sememewsd