Abstract
Word-level information is important in natural language processing (NLP), especially for Chinese, owing to its high linguistic complexity. Chinese word segmentation (CWS) is an essential step for downstream Chinese NLP tasks. Existing methods already achieve competitive performance for CWS on large-scale annotated corpora. However, their accuracy drops dramatically on unsegmented text that contains many out-of-vocabulary (OOV) words. In addition, many different segmentation criteria exist to meet the different requirements of downstream NLP tasks, and maintaining a separate model for each criterion causes the total number of parameters to grow explosively. To this end, we propose a joint multiple criteria model that shares all parameters, integrating the different segmentation criteria into one model. We also use a transfer learning method to improve performance on OOV words. We evaluate the proposed method in comprehensive experiments on multiple benchmark datasets (e.g., Bakeoff 2005, Bakeoff 2008 and SIGHAN 2010). Our method achieves state-of-the-art performance on all datasets. Importantly, it also shows competitive practicality and generalization ability for the CWS task.
- Anthology ID:
- 2020.emnlp-main.318
- Volume:
- Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Editors:
- Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 3873–3882
- URL:
- https://aclanthology.org/2020.emnlp-main.318
- DOI:
- 10.18653/v1/2020.emnlp-main.318
- Cite (ACL):
- Kaiyu Huang, Degen Huang, Zhuang Liu, and Fengran Mo. 2020. A Joint Multiple Criteria Model in Transfer Learning for Cross-domain Chinese Word Segmentation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3873–3882, Online. Association for Computational Linguistics.
- Cite (Informal):
- A Joint Multiple Criteria Model in Transfer Learning for Cross-domain Chinese Word Segmentation (Huang et al., EMNLP 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2020.emnlp-main.318.pdf