Addressing Domain Adaptation for Chinese Word Segmentation with Global Recurrent Structure

Shen Huang, Xu Sun, Houfeng Wang


Abstract
Boundary features are widely used in traditional Chinese Word Segmentation (CWS) methods as they can utilize unlabeled data to help improve the Out-of-Vocabulary (OOV) word recognition performance. Although various neural network methods for CWS have achieved performance competitive with state-of-the-art systems, these methods, constrained by the domain and size of the training corpus, do not work well in domain adaptation. In this paper, we propose a novel BLSTM-based neural network model which incorporates a global recurrent structure designed for modeling boundary features dynamically. Experiments show that the proposed structure can effectively boost the performance of Chinese Word Segmentation, especially OOV-Recall, which brings benefits to domain adaptation. We achieved state-of-the-art results on 6 domains of CNKI articles, and results competitive with the best reported on the 4 domains of SIGHAN Bakeoff 2010 data.
Anthology ID:
I17-1019
Volume:
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
November
Year:
2017
Address:
Taipei, Taiwan
Editors:
Greg Kondrak, Taro Watanabe
Venue:
IJCNLP
Publisher:
Asian Federation of Natural Language Processing
Pages:
184–193
URL:
https://aclanthology.org/I17-1019
Cite (ACL):
Shen Huang, Xu Sun, and Houfeng Wang. 2017. Addressing Domain Adaptation for Chinese Word Segmentation with Global Recurrent Structure. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 184–193, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Cite (Informal):
Addressing Domain Adaptation for Chinese Word Segmentation with Global Recurrent Structure (Huang et al., IJCNLP 2017)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/I17-1019.pdf