Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings

Miguel Moura Ramos, Tomás Almeida, Daniel Vareta, Filipe Azevedo, Sweta Agrawal, Patrick Fernandes, André F. T. Martins


Abstract
Reinforcement learning (RL) has proven to be an effective and robust method for training neural machine translation systems, especially when paired with powerful reward models that accurately assess translation quality. However, most research has focused on RL methods that use sentence-level feedback, which yields inefficient learning signals because of the reward sparsity problem: the model receives a single score for the entire sentence. To address this, we propose a novel approach that incorporates fine-grained, token-level quality assessments, along with error severity levels, into RL training. Specifically, we use xCOMET, a state-of-the-art quality estimation system, as our token-level reward model. We conduct experiments on small and large translation datasets with standard encoder-decoder and large language model-based machine translation systems, comparing the impact of sentence-level versus fine-grained reward signals on translation quality. Our results show that training with token-level rewards improves translation quality across language pairs over baselines according to both automatic and human evaluation. Furthermore, token-level reward optimization improves training stability, as evidenced by a steady increase in mean rewards over training epochs.
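To make the contrast between sentence-level and token-level feedback concrete, the following is a minimal sketch of how error spans with severity labels could be turned into per-token rewards. It is not the authors' implementation and does not call xCOMET: the function names, the character-offset span format, and the penalty values in `SEVERITY_PENALTY` are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical severity-to-penalty mapping; these values are illustrative
# assumptions, not the mapping used in the paper.
SEVERITY_PENALTY: Dict[str, float] = {"minor": -1.0, "major": -5.0, "critical": -10.0}

Span = Tuple[int, int]            # (start, end) character offsets, end exclusive
ErrorSpan = Tuple[int, int, str]  # (start, end, severity label)


def token_level_rewards(
    tokens: List[Span],
    errors: List[ErrorSpan],
    correct_reward: float = 0.0,
) -> List[float]:
    """Assign each token the penalty of the most severe error span it overlaps."""
    rewards = []
    for tok_start, tok_end in tokens:
        # Penalties of every error span that overlaps this token.
        penalties = [
            SEVERITY_PENALTY[severity]
            for err_start, err_end, severity in errors
            if err_start < tok_end and tok_start < err_end
        ]
        # Penalties are negative, so min() picks the most severe one.
        rewards.append(min(penalties) if penalties else correct_reward)
    return rewards


def sentence_level_rewards(num_tokens: int, score: float) -> List[float]:
    """Sparse baseline: a single score placed on the final token."""
    if num_tokens == 0:
        return []
    return [0.0] * (num_tokens - 1) + [score]


# Hypothesis "The cat sat on mat", tokenized by whitespace into character spans.
tokens = [(0, 3), (4, 7), (8, 11), (12, 14), (15, 18)]
# Hypothetical error annotations: "cat" is a major error, "on mat" a minor one.
errors = [(4, 7, "major"), (12, 18, "minor")]

print(token_level_rewards(tokens, errors))   # [0.0, -5.0, 0.0, -1.0, -1.0]
print(sentence_level_rewards(len(tokens), -6.0))  # [0.0, 0.0, 0.0, 0.0, -6.0]
```

In the sentence-level case, every token shares one delayed signal, whereas the token-level variant tells the policy which positions were penalized and how heavily.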
Anthology ID: 2026.tacl-1.33
Volume: Transactions of the Association for Computational Linguistics, Volume 14
Year: 2026
Address: Cambridge, MA
Venue: TACL
Publisher: MIT Press
Pages: 733–754
URL: https://preview.aclanthology.org/ingest-latest-mitpress-cl-tacl/2026.tacl-1.33/
DOI: 10.1162/tacl.a.646
Cite (ACL): Miguel Moura Ramos, Tomás Almeida, Daniel Vareta, Filipe Azevedo, Sweta Agrawal, Patrick Fernandes, and André F. T. Martins. 2026. Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings. Transactions of the Association for Computational Linguistics, 14:733–754.
Cite (Informal): Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings (Ramos et al., TACL 2026)
PDF: https://preview.aclanthology.org/ingest-latest-mitpress-cl-tacl/2026.tacl-1.33.pdf