Prompting Large Language Models with Human Error Markings for Self-Correcting Machine Translation

Nathaniel Berger; Stefan Riezler; Miriam Exel; Matthias Huck

Abstract

While large language models (LLMs) pre-trained on massive amounts of unpaired language data have reached the state of the art in machine translation (MT) of general-domain texts, post-editing (PE) is still required to correct errors and to enhance term translation quality in specialized domains. In this paper, we present a pilot study of enhancing translation memories (TM) produced by PE (source segments, machine translations, and reference translations, henceforth called PE-TM) for the needs of correct and consistent term translation in technical domains. We investigate a lightweight two-step scenario in which, at inference time, a human translator marks errors in the first translation step, and in a second step a few similar examples are extracted from the PE-TM to prompt an LLM. Our experiment shows that the additional effort of augmenting translations with human error markings guides the LLM to focus on correcting the marked errors, yielding consistent improvements over automatic PE (APE) and MT from scratch.
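
The two-step scenario in the abstract can be pictured with a rough sketch. Everything below is an illustrative assumption rather than the authors' implementation: the `<bad>…</bad>` marking format, the character-level fuzzy matching used for retrieval, the prompt wording, the idea of deriving markings for PE-TM examples by diffing MT against the reference, and all function names and sample data are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical PE-TM entries: source segment, machine translation, reference.
PE_TM = [
    {"source": "Press the stop button", "mt": "Drücken Sie den Stoppknopf",
     "reference": "Drücken Sie die Stopptaste"},
    {"source": "Open the file menu", "mt": "Öffnen Sie das Dateimenü",
     "reference": "Öffnen Sie das Menü Datei"},
]

def mark_errors(mt: str, reference: str) -> str:
    """Wrap MT tokens that differ from the reference in <bad> tags.

    Assumption: markings for stored examples are approximated by a token
    diff; in the scenario above, the query's markings come from a human.
    """
    mt_tok, ref_tok = mt.split(), reference.split()
    out = []
    matcher = SequenceMatcher(None, mt_tok, ref_tok)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if i1 == i2:  # pure insertion: no MT tokens to mark
            continue
        span = " ".join(mt_tok[i1:i2])
        out.append(span if tag == "equal" else f"<bad>{span}</bad>")
    return " ".join(out)

def retrieve_examples(source: str, pe_tm: list, k: int = 2) -> list:
    """Return the k PE-TM entries whose source is most similar to `source`."""
    def score(entry):
        return SequenceMatcher(None, source, entry["source"]).ratio()
    return sorted(pe_tm, key=score, reverse=True)[:k]

def build_prompt(source: str, marked_mt: str, examples: list) -> str:
    """Assemble a few-shot prompt that ends where the LLM should continue."""
    lines = ["Correct the errors marked with <bad> tags in the translation.", ""]
    for ex in examples:
        lines += [
            f"Source: {ex['source']}",
            f"Marked translation: {mark_errors(ex['mt'], ex['reference'])}",
            f"Corrected translation: {ex['reference']}",
            "",
        ]
    lines += [
        f"Source: {source}",
        f"Marked translation: {marked_mt}",
        "Corrected translation:",
    ]
    return "\n".join(lines)

# Step 1: a translator marks errors in the first-pass MT (made-up example).
query_source = "Press the start button"
query_marked = "Drücken Sie <bad>den Startknopf</bad>"
# Step 2: retrieve similar examples and build the prompt for the LLM.
prompt = build_prompt(query_source, query_marked,
                      retrieve_examples(query_source, PE_TM, k=1))
```

The resulting `prompt` string would then be sent to whichever LLM is in use; that call is left out because the abstract does not specify a model or API.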

Anthology ID: 2024.eamt-1.54
Volume: Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)
Month: June
Year: 2024
Address: Sheffield, UK
Editors: Carolina Scarton, Charlotte Prescott, Chris Bayliss, Chris Oakley, Joanna Wright, Stuart Wrigley, Xingyi Song, Edward Gow-Smith, Rachel Bawden, Víctor M Sánchez-Cartagena, Patrick Cadwell, Ekaterina Lapshinova-Koltunski, Vera Cabarrão, Konstantinos Chatzitheodorou, Mary Nurminen, Diptesh Kanojia, Helena Moniz
Venue: EAMT
Publisher: European Association for Machine Translation (EAMT)
Pages: 636–646
URL: https://preview.aclanthology.org/build-pipeline-with-new-library/2024.eamt-1.54/
Cite (ACL): Nathaniel Berger, Stefan Riezler, Miriam Exel, and Matthias Huck. 2024. Prompting Large Language Models with Human Error Markings for Self-Correcting Machine Translation. In Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), pages 636–646, Sheffield, UK. European Association for Machine Translation (EAMT).
Cite (Informal): Prompting Large Language Models with Human Error Markings for Self-Correcting Machine Translation (Berger et al., EAMT 2024)
PDF: https://preview.aclanthology.org/build-pipeline-with-new-library/2024.eamt-1.54.pdf