Abstract
Although parallel coreference corpora can to a high degree support the development of SMT systems, there are no large-scale parallel datasets available due to the complexity of the annotation task and the variability in annotation schemes. In this study, we exploit an annotation projection method to combine the output of two coreference resolution systems for two different source languages (English, German) in order to create an annotated corpus for a third language (Russian). We show that our technique is superior to projecting annotations from a single source language, and we provide an in-depth analysis of the projected annotations in order to assess the perspectives of our approach.
- Anthology ID:
- W17-4809
- Volume:
- Proceedings of the Third Workshop on Discourse in Machine Translation
- Month:
- September
- Year:
- 2017
- Address:
- Copenhagen, Denmark
- Venue:
- DiscoMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 67–72
- URL:
- https://aclanthology.org/W17-4809
- DOI:
- 10.18653/v1/W17-4809
- Cite (ACL):
- Yulia Grishina. 2017. Combining the output of two coreference resolution systems for two source languages to improve annotation projection. In Proceedings of the Third Workshop on Discourse in Machine Translation, pages 67–72, Copenhagen, Denmark. Association for Computational Linguistics.
- Cite (Informal):
- Combining the output of two coreference resolution systems for two source languages to improve annotation projection (Grishina, DiscoMT 2017)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/W17-4809.pdf