Abstract
Fine-tuning Large Language Models (FT-LLMs) with parallel data has emerged as a promising paradigm in recent machine translation research. In this paper, we explore the effectiveness of FT-LLMs and compare them to traditional encoder-decoder Neural Machine Translation (NMT) systems in the English-to-Chinese direction of the WMT24 General MT shared task. We implement several techniques, including Quality Estimation (QE) data filtering, supervised fine-tuning, and post-editing that integrates NMT systems with LLMs. We demonstrate that fine-tuning LLaMA2 on a high-quality but relatively small bitext dataset (100K pairs) yields COMET results comparable to those of much smaller encoder-decoder NMT systems trained on over 22 million bitexts, although it substantially underperforms on surface-level metrics such as BLEU and ChrF. We further control data quality using a COMET-based quality estimation method. Our experiments show that 1) filtering out examples with low COMET scores substantially improves encoder-decoder systems, but 2) no clear gains are observed for LLMs when the fine-tuning set is further refined. Finally, we show that combining NMT systems with LLMs via post-editing generally yields the best performance on the WMT24 official test set.
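As a rough illustration of the COMET-based QE filtering described above, the sketch below scores source-translation pairs with a reference-free COMET model and keeps only pairs above a score threshold. It assumes the open-source `unbabel-comet` package and the `Unbabel/wmt22-cometkiwi-da` checkpoint; the threshold and the helper name `filter_bitext` are illustrative choices, not details taken from the paper.

```python
# Minimal sketch of QE-based bitext filtering with a reference-free COMET model.
# Assumes `pip install unbabel-comet` and access to the CometKiwi checkpoint.
from comet import download_model, load_from_checkpoint


def filter_bitext(pairs, threshold=0.80, batch_size=32, gpus=1):
    """Keep only (source, target) pairs whose QE score is at least `threshold`."""
    model_path = download_model("Unbabel/wmt22-cometkiwi-da")  # reference-free QE model
    model = load_from_checkpoint(model_path)

    # CometKiwi scores src/mt pairs without a reference translation.
    data = [{"src": src, "mt": tgt} for src, tgt in pairs]
    scores = model.predict(data, batch_size=batch_size, gpus=gpus).scores

    return [pair for pair, score in zip(pairs, scores) if score >= threshold]


# Hypothetical usage on a toy English-Chinese pair:
# kept = filter_bitext([("Hello world.", "你好，世界。")], threshold=0.80)
```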
- Anthology ID:
- 2024.wmt-1.11
- Volume:
- Proceedings of the Ninth Conference on Machine Translation
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
- Venue:
- WMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 176–184
- URL:
- https://aclanthology.org/2024.wmt-1.11
- DOI:
- 10.18653/v1/2024.wmt-1.11
- Cite (ACL):
- Shaomu Tan, David Stap, Seth Aycock, Christof Monz, and Di Wu. 2024. UvA-MT’s Participation in the WMT24 General Translation Shared Task. In Proceedings of the Ninth Conference on Machine Translation, pages 176–184, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- UvA-MT’s Participation in the WMT24 General Translation Shared Task (Tan et al., WMT 2024)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2024.wmt-1.11.pdf