Abstract
Context-aware neural machine translation, a paradigm that involves leveraging information beyond sentence-level context to resolve inter-sentential discourse dependencies and improve document-level translation quality, has given rise to a number of recent techniques. However, despite well-reasoned intuitions, most context-aware translation models show only modest improvements over sentence-level systems. In this work, we investigate and present several core challenges that impede progress within the field, relating to discourse phenomena, context usage, model architectures, and document-level evaluation. To address these problems, we propose a more realistic setting for document-level translation, called paragraph-to-paragraph (PARA2PARA) translation, and collect a new dataset of Chinese-English novels to promote future research.
- Anthology ID:
- 2023.emnlp-main.943
- Volume:
- Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 15246–15263
- URL:
- https://aclanthology.org/2023.emnlp-main.943
- DOI:
- 10.18653/v1/2023.emnlp-main.943
- Cite (ACL):
- Linghao Jin, Jacqueline He, Jonathan May, and Xuezhe Ma. 2023. Challenges in Context-Aware Neural Machine Translation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 15246–15263, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Challenges in Context-Aware Neural Machine Translation (Jin et al., EMNLP 2023)
- PDF:
- https://preview.aclanthology.org/landing_page/2023.emnlp-main.943.pdf