Remedy-R: Generative Reasoning for Machine Translation Evaluation without Error Annotations
Shaomu Tan, Ryosuke Mitani, Ritvik Choudhary, Qiyu Wu, Toshiyuki Sekiya, Christof Monz
Abstract
Over the years, scalar MT metrics have advanced rapidly on benchmarks. Yet they remain black boxes, offering little insight into their decisions and sometimes degrading under out-of-distribution (OOD) inputs. We introduce Remedy-R, a reasoning-driven generative MT metric trained with reinforcement learning from pairwise translation preferences, without requiring error-span annotations or distillation from closed LLMs. Unlike scalar MT metrics that only output translation quality scores, Remedy-R produces step-by-step analyses of accuracy, fluency, and completeness, enabling more interpretable assessments. With only 60K pairwise training samples across two language pairs, Remedy-R remains competitive with top scalar metrics and GPT-4-based judges on WMT22–24 metric benchmarks, generalizes to other languages, and shows strong robustness on OOD stress tests. Moreover, Remedy-R generates self-reflective feedback that can be reused for translation refinement. We validate the faithfulness of such feedback with GPT-4 and show that a simple evaluate–revise pipeline leveraging Remedy-R’s analyses consistently improves translation quality across diverse models without any task-specific tuning.
- Anthology ID: 2026.findings-acl.364
- Volume: Findings of the Association for Computational Linguistics: ACL 2026
- Month: July
- Year: 2026
- Address: San Diego, California, United States
- Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 7374–7398
- URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.364/
- Cite (ACL): Shaomu Tan, Ryosuke Mitani, Ritvik Choudhary, Qiyu Wu, Toshiyuki Sekiya, and Christof Monz. 2026. Remedy-R: Generative Reasoning for Machine Translation Evaluation without Error Annotations. In Findings of the Association for Computational Linguistics: ACL 2026, pages 7374–7398, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal): Remedy-R: Generative Reasoning for Machine Translation Evaluation without Error Annotations (Tan et al., Findings 2026)
- PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.364.pdf