Abstract
Standard context-aware neural machine translation (NMT) typically relies on parallel document-level data, exploiting both source and target contexts. Concatenation-based approaches in particular, still a strong baseline for document-level NMT, prepend source and/or target context sentences to the sentences to be translated, with model variants that exploit equal amounts of source and target data on each side achieving state-of-the-art results. In this work, we investigate whether target data should be further promoted within standard concatenation-based approaches, as most document-level phenomena rely on information that is present on the target language side. We evaluate novel concatenation-based variants where the target context is prepended to the source language, either in isolation or in combination with the source context. Experimental results in English-Russian and Basque-Spanish show that including target context in the source leads to large improvements on target language phenomena. On source-dependent phenomena, using only target language context in the source achieves parity with state-of-the-art concatenation approaches, or slightly underperforms, whereas combining source and target context on the source side leads to significant gains across the board.
- Anthology ID:
- 2024.eamt-1.6
- Volume:
- Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)
- Month:
- June
- Year:
- 2024
- Address:
- Sheffield, UK
- Editors:
- Carolina Scarton, Charlotte Prescott, Chris Bayliss, Chris Oakley, Joanna Wright, Stuart Wrigley, Xingyi Song, Edward Gow-Smith, Rachel Bawden, Víctor M Sánchez-Cartagena, Patrick Cadwell, Ekaterina Lapshinova-Koltunski, Vera Cabarrão, Konstantinos Chatzitheodorou, Mary Nurminen, Diptesh Kanojia, Helena Moniz
- Venue:
- EAMT
- Publisher:
- European Association for Machine Translation (EAMT)
- Pages:
- 9–23
- URL:
- https://aclanthology.org/2024.eamt-1.6
- Cite (ACL):
- Harritxu Gete and Thierry Etchegoyhen. 2024. Promoting Target Data in Context-aware Neural Machine Translation. In Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), pages 9–23, Sheffield, UK. European Association for Machine Translation (EAMT).
- Cite (Informal):
- Promoting Target Data in Context-aware Neural Machine Translation (Gete & Etchegoyhen, EAMT 2024)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2024.eamt-1.6.pdf
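
The abstract describes concatenation-based variants in which context sentences are prepended to the source-side input, using source context, target context, or both. The following is a minimal illustrative sketch of how such inputs might be assembled, not the authors' implementation. The separator token `<sep>`, the function name `build_source_input`, the variant labels, and the ordering of source and target context in the combined variant are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import List

# Hypothetical separator token; the abstract does not name the one actually used.
SEP = "<sep>"


def build_source_input(
    current_src: str,
    src_context: List[str],
    tgt_context: List[str],
    variant: str,
) -> str:
    """Prepend context sentences to the source sentence to be translated.

    variant:
      "src"     - source-language context only
      "tgt"     - target-language context only (previous translations)
      "src+tgt" - both contexts on the source side
    """
    if variant == "src":
        context = list(src_context)
    elif variant == "tgt":
        context = list(tgt_context)
    elif variant == "src+tgt":
        # Placing source context before target context is an assumption.
        context = list(src_context) + list(tgt_context)
    else:
        raise ValueError(f"unknown variant: {variant}")
    # Join context sentences and the current sentence with the separator.
    return f" {SEP} ".join(context + [current_src])


if __name__ == "__main__":
    src_ctx = ["I saw a table."]
    tgt_ctx = ["Vi una mesa."]
    for v in ("src", "tgt", "src+tgt"):
        print(v, "->", build_source_input("She bought it.", src_ctx, tgt_ctx, v))
```

In the "tgt" variant, the target-side context would come from reference translations during training and, presumably, from the model's own earlier output at inference time; that detail is also an assumption here.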