Abstract
Chinese word segmentation (CWS) models trained on open-source corpora suffer a dramatic performance drop when applied to domain text, especially in domains with many special terms and diverse writing styles, such as the biomedical domain. However, building a domain-specific CWS model requires extremely high annotation cost. In this paper, we propose an approach that exploits domain-invariant knowledge, transferring it from high-resource to low-resource domains. Extensive experiments show that our model achieves consistently higher accuracy than single-task CWS and other transfer learning baselines, especially when there is a large disparity between the source and target domains.
- Anthology ID:
- C18-1307
- Volume:
- Proceedings of the 27th International Conference on Computational Linguistics
- Month:
- August
- Year:
- 2018
- Address:
- Santa Fe, New Mexico, USA
- Editors:
- Emily M. Bender, Leon Derczynski, Pierre Isabelle
- Venue:
- COLING
- Publisher:
- Association for Computational Linguistics
- Pages:
- 3619–3630
- URL:
- https://aclanthology.org/C18-1307
- Cite (ACL):
- Junjie Xing, Kenny Zhu, and Shaodian Zhang. 2018. Adaptive Multi-Task Transfer Learning for Chinese Word Segmentation in Medical Text. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3619–3630, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
- Cite (Informal):
- Adaptive Multi-Task Transfer Learning for Chinese Word Segmentation in Medical Text (Xing et al., COLING 2018)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-3/C18-1307.pdf
- Code:
- adapt-sjtu/AMTTL