Abstract
For Chinese word segmentation, large-scale annotated corpora mainly focus on newswire, and only a small amount of annotated data is available in other domains such as patents and literature. Given the limited amount of annotated target-domain data, it is challenging for segmenters to learn domain-specific information while avoiding overfitting. In this paper, we propose a neural regularized domain adaptation method for Chinese word segmentation. Teacher networks trained in the source domain are employed to regularize the training process of the student network by preserving general knowledge. In our experiments, the neural regularized domain adaptation method achieves better performance than previous methods.
- Anthology ID:
- W17-6002
- Volume:
- Proceedings of the 9th SIGHAN Workshop on Chinese Language Processing
- Month:
- December
- Year:
- 2017
- Address:
- Taiwan
- Venue:
- SIGHAN
- SIG:
- SIGHAN
- Publisher:
- Association for Computational Linguistics
- Pages:
- 11–20
- URL:
- https://aclanthology.org/W17-6002
- Cite (ACL):
- Zuyi Bao, Si Li, Weiran Xu, and Sheng Gao. 2017. Neural Regularized Domain Adaptation for Chinese Word Segmentation. In Proceedings of the 9th SIGHAN Workshop on Chinese Language Processing, pages 11–20, Taiwan. Association for Computational Linguistics.
- Cite (Informal):
- Neural Regularized Domain Adaptation for Chinese Word Segmentation (Bao et al., SIGHAN 2017)
- PDF:
- https://preview.aclanthology.org/auto-file-uploads/W17-6002.pdf