Enhancing Machine Translation with Self-Supervised Preference Data

Haoxiang Sun, Ruize Gao, Pei Zhang, Baosong Yang, Rui Wang


Abstract
Model alignment methods like Direct Preference Optimization and Contrastive Preference Optimization have enhanced machine translation performance by leveraging preference data to enable models to reject suboptimal outputs. During preference data construction, previous approaches primarily rely on humans, strong models like GPT-4, or model self-sampling. In this study, we first explain the shortcomings of this practice. Then, we propose Self-Supervised Preference Optimization (SSPO), a novel framework that efficiently constructs translation preference data for iterative DPO training. Applying SSPO to 14B-parameter large language models (LLMs) achieves comparable or better performance than GPT-4o on FLORES and multi-domain test datasets. We release an augmented MQM dataset at https://github.com/sunny-sjtu/MQM-aug.
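The abstract centers on DPO training over translation preference pairs. As background, a minimal sketch of the standard DPO loss (from the original DPO formulation, not the paper's SSPO construction) for a single chosen/rejected translation pair, assuming per-sequence log-probabilities from the policy and a frozen reference model:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair:
    -log sigmoid(beta * ((pi_w - pi_l) - (ref_w - ref_l))),
    where w/l are the chosen/rejected translations and the
    log-ratios are taken against a frozen reference model."""
    logits = (policy_chosen_logp - policy_rejected_logp) \
             - (ref_chosen_logp - ref_rejected_logp)
    return -math.log(1.0 / (1.0 + math.exp(-beta * logits)))

# When the policy matches the reference, the margin is zero
# and the loss sits at log 2; it falls as the policy learns
# to prefer the chosen translation more than the reference does.
print(dpo_loss(-1.0, -3.0, -2.0, -2.5))
```

Lowering this loss over many such pairs pushes the model to assign relatively higher probability to preferred translations; SSPO's contribution, per the abstract, is how the pairs themselves are constructed without human or GPT-4 labeling.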
Anthology ID:
2025.acl-long.1165
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
23916–23934
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1165/
Cite (ACL):
Haoxiang Sun, Ruize Gao, Pei Zhang, Baosong Yang, and Rui Wang. 2025. Enhancing Machine Translation with Self-Supervised Preference Data. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 23916–23934, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Enhancing Machine Translation with Self-Supervised Preference Data (Sun et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1165.pdf