Skeleton Parsing in Chinese: Annotation Scheme and Guidelines

May Lai-Yin Wong


Abstract
This paper presents my manual skeleton parsing on a sample text of approximately 100,000 word tokens (or about 2,500 sentences) taken from the PFR Chinese Corpus with a clearly defined parsing scheme of 17 constituent labels. The manually-parsed sample skeleton treebank is one of the very few extant Chinese treebanks. While Chinese part-of-speech tagging and word segmentation have been the subject of concerted research for many years, the syntactic annotation of Chinese corpora is a comparatively new field. The difficulties that I encountered in the production of this treebank demonstrate some of the peculiarities of Chinese syntax. A noteworthy syntactic property is that some serial verb constructions tend to be used as if they were compound verbs. The two transitive verbs in series, unlike common transitive verbs, do not take an object separately within the construction; rather, the serial construction as a whole is able to take the same direct object and the perfective aspect marker le. The skeleton-parsed sample treebank is evaluated against Eyes & Leech (1993)’s criteria and proves to be accurate, uniform and linguistically valid.
Anthology ID:
L06-1022
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Editors:
Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/50_pdf.pdf
DOI:
Bibkey:
Cite (ACL):
May Lai-Yin Wong. 2006. Skeleton Parsing in Chinese: Annotation Scheme and Guidelines. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):
Skeleton Parsing in Chinese: Annotation Scheme and Guidelines (Wong, LREC 2006)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/50_pdf.pdf