Combining the output of two coreference resolution systems for two source languages to improve annotation projection

Yulia Grishina


Abstract
Although parallel coreference corpora can substantially support the development of SMT systems, no large-scale parallel datasets are available, owing to the complexity of the annotation task and the variability of annotation schemes. In this study, we use annotation projection to combine the output of two coreference resolution systems for two different source languages (English and German) in order to create an annotated corpus for a third language (Russian). We show that this technique outperforms projecting annotations from a single source language, and we provide an in-depth analysis of the projected annotations to assess the prospects of our approach.
Anthology ID:
W17-4809
Volume:
Proceedings of the Third Workshop on Discourse in Machine Translation
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Venue:
DiscoMT
Publisher:
Association for Computational Linguistics
Pages:
67–72
URL:
https://aclanthology.org/W17-4809
DOI:
10.18653/v1/W17-4809
Cite (ACL):
Yulia Grishina. 2017. Combining the output of two coreference resolution systems for two source languages to improve annotation projection. In Proceedings of the Third Workshop on Discourse in Machine Translation, pages 67–72, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Combining the output of two coreference resolution systems for two source languages to improve annotation projection (Grishina, DiscoMT 2017)
PDF:
https://preview.aclanthology.org/auto-file-uploads/W17-4809.pdf