Abstract
Boundary features are widely used in traditional Chinese Word Segmentation (CWS) methods because they can exploit unlabeled data to improve Out-of-Vocabulary (OOV) word recognition. Although various neural network methods for CWS have achieved performance competitive with state-of-the-art systems, these methods are constrained by the domain and size of the training corpus and do not work well in domain adaptation. In this paper, we propose a novel BLSTM-based neural network model that incorporates a global recurrent structure designed for modeling boundary features dynamically. Experiments show that the proposed structure can effectively boost the performance of Chinese Word Segmentation, especially OOV-Recall, which benefits domain adaptation. We achieved state-of-the-art results on 6 domains of CNKI articles, and results competitive with the best reported on the 4 domains of SIGHAN Bakeoff 2010 data.
- Anthology ID:
- I17-1019
- Volume:
- Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
- Month:
- November
- Year:
- 2017
- Address:
- Taipei, Taiwan
- Editors:
- Greg Kondrak, Taro Watanabe
- Venue:
- IJCNLP
- Publisher:
- Asian Federation of Natural Language Processing
- Pages:
- 184–193
- URL:
- https://aclanthology.org/I17-1019
- Cite (ACL):
- Shen Huang, Xu Sun, and Houfeng Wang. 2017. Addressing Domain Adaptation for Chinese Word Segmentation with Global Recurrent Structure. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 184–193, Taipei, Taiwan. Asian Federation of Natural Language Processing.
- Cite (Informal):
- Addressing Domain Adaptation for Chinese Word Segmentation with Global Recurrent Structure (Huang et al., IJCNLP 2017)
- PDF:
- https://preview.aclanthology.org/naacl-24-ws-corrections/I17-1019.pdf