ReMedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling

Shaomu Tan, Christof Monz


Abstract
A key challenge in MT evaluation is the inherent noise and inconsistency of human ratings. Regression-based neural metrics struggle with this noise, while prompting LLMs shows promise at system-level evaluation but performs poorly at the segment level. In this work, we propose ReMedy, a novel MT metric framework that reformulates translation evaluation as a reward modeling task. Instead of regressing on imperfect human ratings directly, ReMedy learns relative translation quality using pairwise preference data, resulting in more reliable evaluation. In extensive experiments across the WMT22-24 shared tasks (39 language pairs, 111 MT systems), ReMedy achieves state-of-the-art performance at both segment- and system-level evaluation. Specifically, ReMedy-9B surpasses larger WMT winners and massive closed LLMs such as MetricX-13B, XCOMET-Ensemble, GEMBA-GPT-4, PaLM-540B, and finetuned PaLM2. Further analyses demonstrate that ReMedy delivers superior capability in detecting translation errors and evaluating low-quality translations.
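The abstract frames evaluation as reward modeling over pairwise preferences rather than regression on raw ratings. As a rough illustration of that idea, the sketch below shows a generic Bradley-Terry-style pairwise loss, in which the preferred translation of a source segment should receive a higher scalar reward than the dispreferred one. This is a minimal, hypothetical example assuming a PyTorch setup; the class name `PairwisePreferenceLoss` and the toy values are illustrative and not taken from the paper, and ReMedy's actual objective and model details are described in the paper itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairwisePreferenceLoss(nn.Module):
    """Bradley-Terry-style pairwise loss: push the reward of the preferred
    translation above the reward of the dispreferred one."""

    def forward(self, reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
        # -log sigmoid(r_chosen - r_rejected), averaged over the batch
        return -F.logsigmoid(reward_chosen - reward_rejected).mean()


# Toy usage: scalar rewards for preferred vs. dispreferred translations
# of the same source segments (values are illustrative only).
loss_fn = PairwisePreferenceLoss()
r_better = torch.tensor([1.2, 0.3, 0.9])
r_worse = torch.tensor([0.4, -0.1, 1.1])
print(loss_fn(r_better, r_worse).item())
```

A pairwise objective of this kind only needs relative judgments ("A is better than B"), which is one way noisy or inconsistent absolute ratings can be sidestepped.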
Anthology ID:
2025.emnlp-main.217
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4370–4387
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.217/
Cite (ACL):
Shaomu Tan and Christof Monz. 2025. ReMedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 4370–4387, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
ReMedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling (Tan & Monz, EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.217.pdf
Checklist:
2025.emnlp-main.217.checklist.pdf