Evaluating Language Translation Models by Playing Telephone

Syeda Jannatus Saba, Steven Skiena


Abstract
Our ability to efficiently and accurately evaluate the quality of machine translation systems has been outrun by the effectiveness of current language models, which limits the potential for further improving these models on more challenging tasks like long-form and literary translation. We propose an unsupervised method to generate training data for translation evaluation across different document lengths and application domains through repeated rounds of translation between source and target languages. We assess evaluation systems trained on texts mechanically generated using both model rotation and language translation approaches, demonstrating improved performance over a popular translation evaluation system (xCOMET) on two different tasks: (i) scoring the quality of a given translation against a human reference and (ii) selecting which of two translations is generationally closer to an original source document.
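For intuition, the following is a minimal Python sketch of the "telephone" data-generation idea, assuming a generic translate(text, src, tgt) function as a hypothetical stand-in for any machine translation system; the paper's actual model-rotation and language-rotation schemes, and how the resulting texts are used to train the evaluator, are detailed in the PDF linked below.

from typing import Callable, List, Tuple

def telephone_rounds(
    text: str,
    translate: Callable[[str, str, str], str],  # hypothetical: (text, src_lang, tgt_lang) -> translation
    src: str = "en",
    tgt: str = "de",
    rounds: int = 4,
) -> List[Tuple[int, str]]:
    """Return (generation, text) pairs. Generation 0 is the original
    source text; generation k is the result of k round trips
    src -> tgt -> src. Later generations tend to drift further from
    the original, yielding graded quality labels without human annotation."""
    generations: List[Tuple[int, str]] = [(0, text)]
    current = text
    for k in range(1, rounds + 1):
        intermediate = translate(current, src, tgt)  # src -> tgt
        current = translate(intermediate, tgt, src)  # tgt -> src
        generations.append((k, current))
    return generations

Pairs of texts drawn from different generations can then serve as training examples for task (ii), deciding which of two candidates is generationally closer to the original source document.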
Anthology ID:
2025.emnlp-main.524
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rosé, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
10332–10347
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.524/
Cite (ACL):
Syeda Jannatus Saba and Steven Skiena. 2025. Evaluating Language Translation Models by Playing Telephone. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 10332–10347, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Evaluating Language Translation Models by Playing Telephone (Saba & Skiena, EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.524.pdf
Checklist:
2025.emnlp-main.524.checklist.pdf