Abstract
We construct a case-based English-to-Chinese semantic constituent parallel Treebank for Statistical Machine Translation (SMT) by labelling each node of the Deep Syntactic Tree (DST) with our refined semantic cases. Since subtree span-crossing is harmful in tree-based SMT, we adopt the DST to alleviate this problem. We also tailor an existing case set so that it represents bilingual shallow semantic relations more precisely. This Treebank is part of a semantic corpus building project, which aims to build a bilingual corpus annotated with syntax, semantic cases and word senses. The data in our Treebank come from the news domain of the Datum corpus. We selected 4,000 sentence pairs to cover as many different lexical items and part-of-speech (POS) n-gram patterns as possible. This paper presents the construction of this case Treebank. We have also tested how well adopting the DST structure alleviates subtree span-crossing. Our preliminary analysis shows that transforming the parse tree into the DST significantly increases the compatibility between Chinese and English trees. Furthermore, the inter-annotator agreement rate is acceptable (90% on English nodes, 75% on Chinese nodes).
- Anthology ID:
- L16-1466
- Volume:
- Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
- Month:
- May
- Year:
- 2016
- Address:
- Portorož, Slovenia
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 2918–2924
- URL:
- https://aclanthology.org/L16-1466
- Cite (ACL):
- Huaxing Shi, Tiejun Zhao, and Keh-Yih Su. 2016. Building A Case-based Semantic English-Chinese Parallel Treebank. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2918–2924, Portorož, Slovenia. European Language Resources Association (ELRA).
- Cite (Informal):
- Building A Case-based Semantic English-Chinese Parallel Treebank (Shi et al., LREC 2016)
- PDF:
- https://preview.aclanthology.org/emnlp22-frontmatter/L16-1466.pdf
- Data:
- FrameNet
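The abstract relies on two measurements: subtree span-crossing between aligned English and Chinese trees, and the agreement rate between annotators on node labels. The following is a minimal sketch of one common way to compute such quantities. It is not necessarily the metric used in the paper. It assumes that each tree is given as a list of constituent spans (inclusive word indices) and that the word alignment is a set of `(source, target)` index pairs. A source node counts as "crossing" when its projected target span partially overlaps some target constituent without either span containing the other. Agreement is plain observed agreement (share of identical labels), not a chance-corrected score such as kappa. All function names, the span representation and the case labels in the usage note are hypothetical.

```python
from __future__ import annotations

# Hypothetical representation: a constituent is an inclusive (start, end) word span.
Span = tuple[int, int]


def project(span: Span, alignment: set[tuple[int, int]]) -> Span | None:
    """Project a source span onto the target side through alignment links (s, t)."""
    targets = [t for s, t in alignment if span[0] <= s <= span[1]]
    if not targets:
        return None  # the whole subtree is unaligned, so it cannot cross anything
    return (min(targets), max(targets))


def crosses(a: Span, b: Span) -> bool:
    """True if spans a and b overlap but neither one contains the other."""
    overlap = a[0] <= b[1] and b[0] <= a[1]
    a_in_b = b[0] <= a[0] and a[1] <= b[1]
    b_in_a = a[0] <= b[0] and b[1] <= a[1]
    return overlap and not a_in_b and not b_in_a


def crossing_nodes(
    src_spans: list[Span],
    tgt_spans: list[Span],
    alignment: set[tuple[int, int]],
) -> list[Span]:
    """Return the source constituents whose projection crosses a target constituent."""
    result = []
    for span in src_spans:
        proj = project(span, alignment)
        if proj is not None and any(crosses(proj, t) for t in tgt_spans):
            result.append(span)
    return result


def agreement_rate(labels_a: list[str], labels_b: list[str]) -> float:
    """Observed agreement: fraction of nodes given the same label by both annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    same = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return same / len(labels_a)
```

As a toy usage example, take four source and four target words with the alignment `{(0, 0), (1, 2), (2, 1), (3, 3)}` and the same constituents `[(0, 1), (2, 3), (0, 3)]` on both sides. Here `crossing_nodes` reports `(0, 1)` and `(2, 3)`, because the inner word order is swapped on the target side. For the labels `["Agent", "Theme", "Goal", "Agent"]` and `["Agent", "Theme", "Source", "Agent"]`, `agreement_rate` returns `0.75`. Under this sketch, a tree transformation that reduces span-crossing would shrink the list returned by `crossing_nodes` over a corpus.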