Consistent Word Segmentation, Part-of-Speech Tagging and Dependency Labelling Annotation for Chinese Language
Mo Shen, Wingmui Li, HyunJeong Choe, Chenhui Chu, Daisuke Kawahara, Sadao Kurohashi
Abstract
In this paper, we propose a new annotation approach to Chinese word segmentation, part-of-speech (POS) tagging and dependency labelling that aims to overcome the two major issues in traditional morphology-based annotation: inconsistency and data sparsity. We re-annotate the Penn Chinese Treebank 5.0 (CTB5) and demonstrate the advantages of this approach compared to the original CTB5 annotation through word segmentation, POS tagging and machine translation experiments.
- Anthology ID:
- C16-1029
- Volume:
- Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
- Month:
- December
- Year:
- 2016
- Address:
- Osaka, Japan
- Editors:
- Yuji Matsumoto, Rashmi Prasad
- Venue:
- COLING
- Publisher:
- The COLING 2016 Organizing Committee
- Pages:
- 298–308
- URL:
- https://aclanthology.org/C16-1029
- Cite (ACL):
- Mo Shen, Wingmui Li, HyunJeong Choe, Chenhui Chu, Daisuke Kawahara, and Sadao Kurohashi. 2016. Consistent Word Segmentation, Part-of-Speech Tagging and Dependency Labelling Annotation for Chinese Language. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 298–308, Osaka, Japan. The COLING 2016 Organizing Committee.
- Cite (Informal):
- Consistent Word Segmentation, Part-of-Speech Tagging and Dependency Labelling Annotation for Chinese Language (Shen et al., COLING 2016)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-3/C16-1029.pdf