The Use of Parallel and Comparable Data for Analysis of Abstract Anaphora in German and English

Stefanie Dipper, Melanie Seiss, Heike Zinsmeister


Abstract
Parallel corpora (original texts aligned with their translations) are a widely used resource in computational linguistics. Translation studies have shown that translated texts often differ systematically from comparable original texts. Translators tend to be faithful to the structures of the original texts, resulting in a "shining through" of the original language's preferences in the translated text. Translators also tend to make their translations as comprehensible as possible, with the effect that translated texts can be more explicit than their source texts. Motivated by the need to use a parallel resource for cross-linguistic feature induction in abstract anaphora resolution, this paper investigates properties of English and German texts in the Europarl corpus, taking into account both general features, such as sentence length, and task-dependent features, such as the distribution of demonstrative noun phrases. The investigation is based on the entire Europarl corpus as well as on a small, manually annotated subset of it. The results indicate that English translated texts are sufficiently "authentic" to be used as training data for anaphora resolution; the results for German texts, however, are less conclusive.
Anthology ID:
L12-1038
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
138–145
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/172_Paper.pdf
Cite (ACL):
Stefanie Dipper, Melanie Seiss, and Heike Zinsmeister. 2012. The Use of Parallel and Comparable Data for Analysis of Abstract Anaphora in German and English. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 138–145, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
The Use of Parallel and Comparable Data for Analysis of Abstract Anaphora in German and English (Dipper et al., LREC 2012)
PDF:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/172_Paper.pdf