Mitigating Translationese with GPT-4: Strategies and Performance
Maria Kunilovskaya, Koel Dutta Chowdhury, Heike Przybyl, Cristina España-Bonet, Josef Genabith
Abstract
Translations differ in systematic ways from texts originally authored in the same language. These differences, collectively known as translationese, can pose challenges in cross-lingual natural language processing: models trained or tested on translated input might struggle when presented with non-translated language. Translationese mitigation can alleviate this problem. This study investigates the generative capacities of GPT-4 to reduce translationese in human-translated texts. The task is framed as a rewriting process aimed at producing modified translations that are indistinguishable from original text in the target language. Our focus is on prompt engineering that tests the utility of linguistic knowledge as part of the instruction for GPT-4. Through a series of prompt design experiments, we show that GPT-4-generated revisions are more similar to originals in the target language when the prompts incorporate specific linguistic instructions instead of relying solely on the model’s internal knowledge. Furthermore, we release the segment-aligned bidirectional German-English data built from the Europarl corpus that underpins this study.
- Anthology ID:
- 2024.eamt-1.35
- Volume:
- Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)
- Month:
- June
- Year:
- 2024
- Address:
- Sheffield, UK
- Editors:
- Carolina Scarton, Charlotte Prescott, Chris Bayliss, Chris Oakley, Joanna Wright, Stuart Wrigley, Xingyi Song, Edward Gow-Smith, Rachel Bawden, Víctor M Sánchez-Cartagena, Patrick Cadwell, Ekaterina Lapshinova-Koltunski, Vera Cabarrão, Konstantinos Chatzitheodorou, Mary Nurminen, Diptesh Kanojia, Helena Moniz
- Venue:
- EAMT
- Publisher:
- European Association for Machine Translation (EAMT)
- Pages:
- 411–430
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2024.eamt-1.35/
- Cite (ACL):
- Maria Kunilovskaya, Koel Dutta Chowdhury, Heike Przybyl, Cristina España-Bonet, and Josef Genabith. 2024. Mitigating Translationese with GPT-4: Strategies and Performance. In Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), pages 411–430, Sheffield, UK. European Association for Machine Translation (EAMT).
- Cite (Informal):
- Mitigating Translationese with GPT-4: Strategies and Performance (Kunilovskaya et al., EAMT 2024)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2024.eamt-1.35.pdf