Remedy-R: Generative Reasoning for Machine Translation Evaluation without Error Annotations

Shaomu Tan, Ryosuke Mitani, Ritvik Choudhary, Qiyu Wu, Toshiyuki Sekiya, Christof Monz


Abstract
Over the years, scalar MT metrics have advanced rapidly on benchmarks. Yet they remain black boxes, offering little insight into their decisions and sometimes degrading under out-of-distribution (OOD) inputs. We introduce Remedy-R, a reasoning-driven generative MT metric trained with reinforcement learning from pairwise translation preferences, without requiring error-span annotations or distillation from closed LLMs. Unlike scalar MT metrics that only output translation quality scores, Remedy-R produces step-by-step analyses of accuracy, fluency, and completeness, enabling more interpretable assessments. With only 60K pairwise training samples across two language pairs, Remedy-R remains competitive with top scalar metrics and GPT-4-based judges on WMT22–24 metric benchmarks, generalizes to other languages, and shows strong robustness on OOD stress tests. Moreover, Remedy-R generates self-reflective feedback that can be reused for translation refinement. We validate the faithfulness of such feedback with GPT-4 and show that a simple evaluate–revise pipeline leveraging Remedy-R’s analyses consistently improves translation quality across diverse models without any task-specific tuning.
Anthology ID:
2026.findings-acl.364
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7374–7398
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.364/
Cite (ACL):
Shaomu Tan, Ryosuke Mitani, Ritvik Choudhary, Qiyu Wu, Toshiyuki Sekiya, and Christof Monz. 2026. Remedy-R: Generative Reasoning for Machine Translation Evaluation without Error Annotations. In Findings of the Association for Computational Linguistics: ACL 2026, pages 7374–7398, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Remedy-R: Generative Reasoning for Machine Translation Evaluation without Error Annotations (Tan et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.364.pdf
Checklist:
 2026.findings-acl.364.checklist.pdf