Abstract
In this paper, we describe our hybrid parsing model on the Mandarin Chinese processing. In particular, we work on the Tsinghua Chinese Treebank (TCT), whose annotation has both constitutes and the head information of each constitute. The model we design combines the mainstream constitute parsing and dependency parsing. We present in detail 1) how to (partially) encode the head information into the constitute parsing, 2) how to encode constitute information into the dependency parsing, and 3) how to restore the head information using the dependency structure. For each of them, we take different strategies to deal with different cases. In an open shared task evaluation, we achieve an f1-score of 85.23% for the constitute parsing, 82.35% with partial head information, and 74.27% with complete head information. The error analysis shows the challenge of restoring multiple-headed constitutes and also some potentials to use the dependency structure to guide the constitute parsing, which will be our future work to explore.- Anthology ID:
- L10-1583
- Volume:
- Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
- Month:
- May
- Year:
- 2010
- Address:
- Valletta, Malta
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association (ELRA)
- Note:
- Pages:
- Language:
- URL:
- http://www.lrec-conf.org/proceedings/lrec2010/pdf/844_Paper.pdf
- DOI:
- Cite (ACL):
- Rui Wang and Yi Zhang. 2010. Hybrid Constituent and Dependency Parsing with Tsinghua Chinese Treebank. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
- Cite (Informal):
- Hybrid Constituent and Dependency Parsing with Tsinghua Chinese Treebank (Wang & Zhang, LREC 2010)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2010/pdf/844_Paper.pdf