Consistent Word Segmentation, Part-of-Speech Tagging and Dependency Labelling Annotation for Chinese Language

Mo Shen, Wingmui Li, HyunJeong Choe, Chenhui Chu, Daisuke Kawahara, Sadao Kurohashi


Abstract
In this paper, we propose a new annotation approach to Chinese word segmentation, part-of-speech (POS) tagging and dependency labelling that aims to overcome two major issues in traditional morphology-based annotation: inconsistency and data sparsity. We re-annotate the Penn Chinese Treebank 5.0 (CTB5) and demonstrate the advantages of this approach over the original CTB5 annotation through word segmentation, POS tagging and machine translation experiments.
Anthology ID:
C16-1029
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Yuji Matsumoto, Rashmi Prasad
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
298–308
URL:
https://aclanthology.org/C16-1029
Cite (ACL):
Mo Shen, Wingmui Li, HyunJeong Choe, Chenhui Chu, Daisuke Kawahara, and Sadao Kurohashi. 2016. Consistent Word Segmentation, Part-of-Speech Tagging and Dependency Labelling Annotation for Chinese Language. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 298–308, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Consistent Word Segmentation, Part-of-Speech Tagging and Dependency Labelling Annotation for Chinese Language (Shen et al., COLING 2016)
PDF:
https://preview.aclanthology.org/nschneid-patch-3/C16-1029.pdf