2025
WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects
Daniel Deutsch | Eleftheria Briakou | Isaac Rayburn Caswell | Mara Finkelstein | Rebecca Galor | Juraj Juraska | Geza Kovacs | Alison Lui | Ricardo Rei | Jason Riesa | Shruti Rijhwani | Parker Riley | Elizabeth Salesky | Firas Trabelsi | Stephanie Winkler | Biao Zhang | Markus Freitag
Findings of the Association for Computational Linguistics: ACL 2025
As large language models (LLMs) become increasingly capable in languages other than English, it is important to collect benchmark datasets to evaluate their multilingual performance, including on tasks like machine translation (MT). In this work, we extend the WMT24 dataset to cover 55 languages by collecting new human-written references and post-edits for 46 new languages/dialects, in addition to post-edits of the references in 8 of the 9 languages in the original WMT24 dataset. We benchmark a variety of MT providers and LLMs on the collected dataset using automatic metrics and find that LLMs are the best-performing MT systems in all 55 languages. However, we caution against using our results to draw strong conclusions about MT quality without a human-based evaluation, due to the limitations of automatic evaluation metrics; we leave such an evaluation for future work.
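To make the benchmarking setup concrete, scoring candidate translations against human references with an automatic metric can look like the minimal sketch below. It uses the sacrebleu library purely for illustration; this is an assumed stand-in, not the specific metric suite used in the paper, and the example sentences are invented placeholders rather than WMT24++ data.

import sacrebleu

# System outputs and human references; each reference stream is a list of
# strings aligned with the hypotheses (placeholder data, not WMT24++).
hypotheses = ["The cat sits on the mat."]
references = [["The cat is sitting on the mat."]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}  chrF: {chrf.score:.1f}")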
2024
Beyond Human-Only: Evaluating Human-Machine Collaboration for Collecting High-Quality Translation Data
Zhongtao Liu | Parker Riley | Daniel Deutsch | Alison Lui | Mengmeng Niu | Apurva Shah | Markus Freitag
Proceedings of the Ninth Conference on Machine Translation
Collecting high-quality translations is crucial for the development and evaluation of machine translation systems. However, traditional human-only approaches are costly and slow. This study presents a comprehensive investigation of 11 approaches to acquiring translation data, including human-only, machine-only, and hybrid approaches. Our findings demonstrate that human-machine collaboration can match or even exceed the quality of human-only translations while being more cost-efficient. Error analysis reveals the complementary strengths of human and machine contributions, highlighting the effectiveness of collaborative methods. Cost analysis further demonstrates the economic benefits of human-machine collaboration, with some approaches achieving top-tier quality at around 60% of the cost of traditional methods. We release a publicly available dataset containing nearly 18,000 segments of varying translation quality with corresponding human ratings to facilitate future research.
2019
Neural Machine Translation of Text from Non-Native Speakers
Antonios Anastasopoulos | Alison Lui | Toan Q. Nguyen | David Chiang
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Neural Machine Translation (NMT) systems are known to degrade when confronted with noisy data, especially when the system is trained only on clean data. In this paper, we show that augmenting training data with sentences containing artificially-introduced grammatical errors can make the system more robust to such errors. In combination with an automatic grammar error correction system, we can recover 1.0 BLEU out of 2.4 BLEU lost due to grammatical errors. We also present a set of Spanish translations of the JFLEG grammar error correction corpus, which allows for testing NMT robustness to real grammatical errors.
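As a rough illustration of the augmentation idea described above, the sketch below injects a simple synthetic grammatical error (random article deletion) into the source side of a parallel corpus while keeping the clean target, so a model trained on the result sees noisy input paired with correct output. The specific noise operation and the toy data are illustrative assumptions, not the error model or corpus used in the paper.

import random

ARTICLES = {"a", "an", "the"}

def add_synthetic_errors(sentence: str, p: float = 0.3) -> str:
    """Randomly drop each article from a source sentence with probability p."""
    tokens = sentence.split()
    noisy = [t for t in tokens if not (t.lower() in ARTICLES and random.random() < p)]
    return " ".join(noisy) if noisy else sentence

# Augment the source side of a (toy) parallel corpus; targets stay clean.
pairs = [("the cat sat on the mat", "le chat s'est assis sur le tapis")]
augmented = [(add_synthetic_errors(src), tgt) for src, tgt in pairs]
print(augmented)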