Abstract
Discourse analysis is necessary for many Natural Language Processing (NLP) tasks. Because Spanish and Chinese are two of the most widely spoken languages in the world, discourse analysis across the two languages is important for NLP research. This paper presents the first open Spanish-Chinese parallel corpus annotated with discourse information, with Rhetorical Structure Theory (RST) as its theoretical framework. We have evaluated and harmonized each part of the annotation to obtain a corpus with high annotation quality. The corpus is publicly available.
- Anthology ID:
- W18-4917
- Volume:
- Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
- Month:
- August
- Year:
- 2018
- Address:
- Santa Fe, New Mexico, USA
- Editors:
- Agata Savary, Carlos Ramisch, Jena D. Hwang, Nathan Schneider, Melanie Andresen, Sameer Pradhan, Miriam R. L. Petruck
- Venues:
- LAW | MWE
- SIGs:
- SIGLEX | SIGANN
- Publisher:
- Association for Computational Linguistics
- Pages:
- 156–166
- URL:
- https://aclanthology.org/W18-4917
- Cite (ACL):
- Shuyuan Cao, Iria da Cunha, and Mikel Iruskieta. 2018. The RST Spanish-Chinese Treebank. In Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), pages 156–166, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
- Cite (Informal):
- The RST Spanish-Chinese Treebank (Cao et al., LAW-MWE 2018)
- PDF:
- https://preview.aclanthology.org/landing_page/W18-4917.pdf