Shuyuan Cao


How does discourse affect Spanish-Chinese Translation? A case study based on a Spanish-Chinese parallel corpus
Shuyuan Cao
Proceedings of the First Workshop on Computational Approaches to Discourse

With their large populations of speakers, Spanish and Chinese occupy important positions in linguistic studies. Because the two languages belong to different language families, translation between Spanish and Chinese is complicated. A comparative study of the language pair can reveal the discourse differences between Spanish and Chinese and can benefit Spanish-Chinese translation. In this work, based on a Spanish-Chinese parallel corpus annotated with discourse information, we compare the annotation results for the two languages and analyze how discourse affects Spanish-Chinese translation. The results of our study can help human translators who work with this language pair.


Using Discourse Information for Education with a Spanish-Chinese Parallel Corpus
Shuyuan Cao | Harritxu Gete
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

The RST Spanish-Chinese Treebank
Shuyuan Cao | Iria da Cunha | Mikel Iruskieta
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

Discourse analysis is necessary for different tasks in Natural Language Processing (NLP). Because Spanish and Chinese are two of the most spoken languages in the world, discourse analysis of this language pair is important for NLP research. This paper presents the first open Spanish-Chinese parallel corpus annotated with discourse information, with a theoretical framework based on Rhetorical Structure Theory (RST). We have evaluated and harmonized each part of the annotation to obtain a corpus with high annotation quality. The corpus is already available to the public.


Discourse Segmentation for Building a RST Chinese Treebank
Shuyuan Cao | Nianwen Xue | Iria da Cunha | Mikel Iruskieta | Chuan Wang
Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms


A Corpus-based Approach for Spanish-Chinese Language Learning
Shuyuan Cao | Iria da Cunha | Mikel Iruskieta
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)

Because of the large populations that speak Spanish and Chinese, these languages occupy an important position in language learning studies. Although some automatic translation systems support the learning of both languages, there is still room to create resources that help language learners. As a quick and effective resource that can provide a large amount of language information, corpus-based learning is becoming increasingly popular. In this paper we enrich a Spanish-Chinese parallel corpus automatically with part-of-speech (POS) information and manually with discourse segmentation, following Rhetorical Structure Theory (RST) (Mann and Thompson, 1988). Two search tools allow Spanish-Chinese language learners to carry out different queries based on tokens and lemmas. The parallel corpus and the research tools are available to the academic community. We provide some examples to illustrate how learners can use the corpus to learn Spanish and Chinese.