Abstract
In this paper, we investigate how different aspects of discourse context affect the performance of recent neural MT systems. We describe two popular datasets covering news and movie subtitles and provide a thorough analysis of the distribution of various document-level features in their domains. Furthermore, we train a set of context-aware MT models on both datasets and propose a comparative evaluation scheme that contrasts coherent context with artificially scrambled documents and absent context, arguing that the impact of discourse-aware MT models will become visible in this way. Our results show that the models are indeed affected by the manipulation of the test data, providing a different view on document-level translation quality than absolute sentence-level scores.
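The comparative evaluation scheme described above contrasts coherent context, artificially scrambled context, and absent context for concatenation-based models. Below is a minimal, hypothetical Python sketch of how such test inputs could be prepared; the function name, the `<BRK>` separator token, and the single-sentence context window are illustrative assumptions, not the authors' exact setup.

```python
import random

def build_inputs(doc, context_size=1, condition="coherent", sep=" <BRK> ", seed=0):
    """Create concatenation-style source inputs for one document (a list of sentences).

    condition:
      "coherent"  -- prepend the true preceding sentence(s) from the document
      "scrambled" -- prepend randomly chosen other sentences from the same document
      "none"      -- plain sentence-level input without any context
    """
    rng = random.Random(seed)
    inputs = []
    for i, sent in enumerate(doc):
        if condition == "none" or i == 0:
            # No context available (or requested): fall back to the sentence itself.
            inputs.append(sent)
            continue
        if condition == "coherent":
            ctx = doc[max(0, i - context_size):i]
        else:  # "scrambled"
            others = doc[:i] + doc[i + 1:]
            ctx = rng.sample(others, min(context_size, len(others)))
        inputs.append(sep.join(ctx + [sent]))
    return inputs

# Example: the same toy document under the three evaluation conditions.
doc = ["She picked up the key.", "Then she opened the door.", "It creaked loudly."]
for cond in ("coherent", "scrambled", "none"):
    print(cond, build_inputs(doc, condition=cond))
```

Comparing a model's scores across these three conditions, rather than reporting a single sentence-level score, is the kind of contrast the abstract argues makes the effect of discourse-aware modelling visible.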
- Anthology ID: D19-6506
- Volume: Proceedings of the Fourth Workshop on Discourse in Machine Translation (DiscoMT 2019)
- Month: November
- Year: 2019
- Address: Hong Kong, China
- Editors: Andrei Popescu-Belis, Sharid Loáiciga, Christian Hardmeier, Deyi Xiong
- Venue: DiscoMT
- Publisher: Association for Computational Linguistics
- Pages: 51–61
- URL: https://aclanthology.org/D19-6506
- DOI: 10.18653/v1/D19-6506
- Cite (ACL): Yves Scherrer, Jörg Tiedemann, and Sharid Loáiciga. 2019. Analysing concatenation approaches to document-level NMT in two different domains. In Proceedings of the Fourth Workshop on Discourse in Machine Translation (DiscoMT 2019), pages 51–61, Hong Kong, China. Association for Computational Linguistics.
- Cite (Informal): Analysing concatenation approaches to document-level NMT in two different domains (Scherrer et al., DiscoMT 2019)
- PDF: https://preview.aclanthology.org/fix-dup-bibkey/D19-6506.pdf