SSR-Zero: Simple Self-Rewarding Reinforcement Learning for Machine Translation
Wenjie Yang, Mao Zheng, Mingyang Song, Zheng Li, Sitong Wang
Abstract
Large language models (LLMs) have recently demonstrated remarkable capabilities in machine translation (MT). However, most advanced MT-specific LLMs rely heavily on external supervision during training, such as human-annotated reference data or trained reward models (RMs), which are expensive to obtain and difficult to scale. To address this limitation, we propose **Simple Self-Rewarding (SSR)**, a reinforcement learning (RL) framework for MT that is reference-free and relies solely on self-judging rewards. Using only 13K monolingual examples and Qwen2.5-7B as the backbone, SSR-Zero-7B outperforms existing MT-specific LLMs as well as larger general LLMs such as Qwen2.5-32B-Instruct on English ↔ Chinese translation benchmarks including WMT23, WMT24, and FLORES200. It further demonstrates strong generalization to low-resource language pairs. In addition, when augmented with external supervision from COMET, our strongest model, SSR-X-Zero-7B, surpasses all existing open-source models under 72B parameters and performs competitively with leading closed-source systems in English ↔ Chinese translation. Our analysis highlights the effectiveness and generalizability of the self-rewarding mechanism relative to external LLM-as-a-judge approaches and demonstrates its complementary benefits when combined with trained RMs. We will publicly release our code, data, and models.
- Anthology ID: 2026.findings-acl.300
- Volume: Findings of the Association for Computational Linguistics: ACL 2026
- Month: July
- Year: 2026
- Address: San Diego, California, United States
- Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 6039–6052
- URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.300/
- Cite (ACL): Wenjie Yang, Mao Zheng, Mingyang Song, Zheng Li, and Sitong Wang. 2026. SSR-Zero: Simple Self-Rewarding Reinforcement Learning for Machine Translation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 6039–6052, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal): SSR-Zero: Simple Self-Rewarding Reinforcement Learning for Machine Translation (Yang et al., Findings 2026)
- PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.300.pdf