Neural Regularized Domain Adaptation for Chinese Word Segmentation

Zuyi Bao, Si Li, Weiran Xu, Sheng Gao


Abstract
For Chinese word segmentation, large-scale annotated corpora mainly cover newswire, and only a small amount of annotated data is available in other domains such as patents and literature. Given the limited amount of annotated target-domain data, it is a challenge for segmenters to learn domain-specific information while avoiding over-fitting. In this paper, we propose a neural regularized domain adaptation method for Chinese word segmentation. Teacher networks trained on the source domain are employed to regularize the training of the student network by preserving general knowledge. In the experiments, our neural regularized domain adaptation method achieves better performance than previous methods.
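The abstract does not state the exact form of the regularizer, so the following is only a minimal sketch of one common way a teacher network can constrain a student: a supervised loss on the target-domain label plus a KL-divergence penalty that pulls the student's tag distribution toward the teachers'. The tag set (B, M, E, S), the KL penalty, the averaging over teachers, and the names `regularized_loss` and `weight` are all assumptions made for illustration and may differ from the paper's actual formulation.

```python
import math


def softmax(scores):
    """Convert raw tag scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, gold_index):
    """Supervised loss on the annotated target-domain tag."""
    return -math.log(probs[gold_index])


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same tag set."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def regularized_loss(student_scores, teacher_scores_list, gold_index, weight=0.5):
    """Hypothetical objective: supervised loss + weight * mean KL(teacher || student).

    student_scores:      scores over the tags (assumed order: B, M, E, S) for one character
    teacher_scores_list: one score list per source-domain teacher (assumed frozen)
    weight:              assumed hyperparameter trading off the two terms
    """
    student_probs = softmax(student_scores)
    supervised = cross_entropy(student_probs, gold_index)
    penalties = [kl_divergence(softmax(t), student_probs) for t in teacher_scores_list]
    regularizer = sum(penalties) / len(penalties)
    return supervised + weight * regularizer


# Example: one character whose gold tag is B (index 0), with a single teacher.
student = [2.0, 0.5, 0.1, -1.0]
teacher = [1.5, 0.8, 0.0, -0.5]
loss = regularized_loss(student, [teacher], gold_index=0)
```

In this sketch the penalty vanishes when the student agrees with every teacher and grows as the student drifts away from them, which is one way to read "preserving the general knowledge" while still fitting the target-domain labels.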
Anthology ID:
W17-6002
Volume:
Proceedings of the 9th SIGHAN Workshop on Chinese Language Processing
Month:
December
Year:
2017
Address:
Taiwan
Venue:
SIGHAN
SIG:
SIGHAN
Publisher:
Association for Computational Linguistics
Pages:
11–20
URL:
https://aclanthology.org/W17-6002
Cite (ACL):
Zuyi Bao, Si Li, Weiran Xu, and Sheng Gao. 2017. Neural Regularized Domain Adaptation for Chinese Word Segmentation. In Proceedings of the 9th SIGHAN Workshop on Chinese Language Processing, pages 11–20, Taiwan. Association for Computational Linguistics.
Cite (Informal):
Neural Regularized Domain Adaptation for Chinese Word Segmentation (Bao et al., SIGHAN 2017)
PDF:
https://preview.aclanthology.org/ingestion-script-update/W17-6002.pdf