Analysing Coreference in Transformer Outputs
Ekaterina Lapshinova-Koltunski, Cristina España-Bonet, Josef van Genabith
Abstract
We analyse coreference phenomena in three neural machine translation systems trained under different data settings, with or without access to explicit intra- and cross-sentential anaphoric information. We compare system performance on two different genres: news and TED talks. To do this, we manually annotate the (possibly incorrect) coreference chains in the MT outputs and evaluate the coreference chain translations. We define an error typology that aims to go further than pronoun translation adequacy and includes types such as incorrect word selection or missing words. The features of coreference chains in automatic translations are also compared to those of the source texts and human translations. The analysis shows stronger potential translationese effects in machine-translated outputs than in human translations.
- Anthology ID:
- D19-6501
- Volume:
- Proceedings of the Fourth Workshop on Discourse in Machine Translation (DiscoMT 2019)
- Month:
- November
- Year:
- 2019
- Address:
- Hong Kong, China
- Editors:
- Andrei Popescu-Belis, Sharid Loáiciga, Christian Hardmeier, Deyi Xiong
- Venue:
- DiscoMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1–12
- URL:
- https://aclanthology.org/D19-6501
- DOI:
- 10.18653/v1/D19-6501
- Cite (ACL):
- Ekaterina Lapshinova-Koltunski, Cristina España-Bonet, and Josef van Genabith. 2019. Analysing Coreference in Transformer Outputs. In Proceedings of the Fourth Workshop on Discourse in Machine Translation (DiscoMT 2019), pages 1–12, Hong Kong, China. Association for Computational Linguistics.
- Cite (Informal):
- Analysing Coreference in Transformer Outputs (Lapshinova-Koltunski et al., DiscoMT 2019)
- PDF:
- https://preview.aclanthology.org/ingest-bitext-workshop/D19-6501.pdf
- Data
- ParCorFull
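The abstract compares features of coreference chains across source texts, human translations and MT outputs. The sketch below shows one simple way such chain-level counts could be computed. It is a minimal sketch under stated assumptions and is not the authors' annotation pipeline or feature set. It assumes that each annotated text is a list of chains and that each chain is a list of mention strings. The names `chain_stats`, `ChainStats` and the toy `source`/`mt` annotations are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Hypothetical representation (assumption, not from the paper):
# a chain is a list of mention strings; a text is a list of chains.
Chain = list[str]


@dataclass
class ChainStats:
    num_chains: int           # how many coreference chains the text contains
    num_mentions: int         # total mentions across all chains
    mean_chain_length: float  # average number of mentions per chain


def chain_stats(chains: list[Chain]) -> ChainStats:
    """Compute simple illustrative chain-level features for one annotated text."""
    lengths = [len(chain) for chain in chains]
    total = sum(lengths)
    return ChainStats(
        num_chains=len(chains),
        num_mentions=total,
        # Plain division keeps the result a float; empty input yields 0.0.
        mean_chain_length=total / len(lengths) if lengths else 0.0,
    )


# Hypothetical toy annotations for one source text and one MT output.
source = [["the minister", "she", "her"], ["the bill", "it"]]
mt = [["the minister", "he"], ["the bill", "it"], ["the law", "it"]]

for name, chains in [("source", source), ("mt", mt)]:
    print(name, chain_stats(chains))
```

In this toy example, the MT output splits one referent across two chains. That raises the chain count and lowers the mean chain length relative to the source. Such a shift is the kind of divergence a corpus-level comparison of chain features can surface.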