Can Large Language Models Translate Unseen Languages in Underrepresented Scripts?
Dianqing Lin, Aruukhan, Hongxu Hou, Shuo Sun, Wei Chen, Yichen Yang, Guo Dong Shi
Abstract
Large language models (LLMs) have demonstrated impressive performance in machine translation, but they still struggle with unseen low-resource languages, especially those written in underrepresented scripts. To investigate whether LLMs can translate such languages with the help of linguistic resources, we introduce Lotus, a benchmark designed to evaluate translation for Mongolian (in traditional script) and Yi. Our study shows that while linguistic resources can improve translation quality as measured by automatic metrics, LLMs remain limited in their ability to handle these languages effectively. We hope our work provides insights for the low-resource NLP community and fosters further progress in machine translation for low-resource languages written in underrepresented scripts. Our code and data are available.
- Anthology ID:
- 2025.emnlp-main.1179
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 23148–23161
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1179/
- Cite (ACL):
- Dianqing Lin, Aruukhan, Hongxu Hou, Shuo Sun, Wei Chen, Yichen Yang, and Guo Dong Shi. 2025. Can Large Language Models Translate Unseen Languages in Underrepresented Scripts?. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 23148–23161, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- Can Large Language Models Translate Unseen Languages in Underrepresented Scripts? (Lin et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1179.pdf
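The abstract describes prompting LLMs with linguistic resources and scoring the outputs with automatic metrics. The sketch below is a minimal, hypothetical illustration of that kind of pipeline, not the authors' released code: the `llm` callable, the dictionary format, the grammar-note string, and the choice of chrF++ are all assumptions made for illustration; the paper itself specifies which resources and metrics Lotus actually uses.

```python
# A minimal sketch (not the paper's released code) of resource-augmented
# translation for an unseen language: pack dictionary entries and grammar
# notes into the prompt, translate with an LLM, and score with chrF++.
from sacrebleu.metrics import CHRF

def build_prompt(src: str, dictionary: dict[str, str], grammar_notes: str) -> str:
    """Assemble a prompt that includes matching dictionary entries and grammar notes."""
    hits = [f"{word} -> {gloss}" for word, gloss in dictionary.items() if word in src]
    return (
        "Translate the following English sentence into Traditional Mongolian.\n"
        f"Grammar notes:\n{grammar_notes}\n"
        "Dictionary entries:\n" + "\n".join(hits) + "\n"
        f"Sentence: {src}\nTranslation:"
    )

def evaluate(llm, sources: list[str], references: list[str],
             dictionary: dict[str, str], grammar_notes: str) -> float:
    """Translate each source with a resource-augmented prompt; return corpus chrF++."""
    hypotheses = [llm(build_prompt(s, dictionary, grammar_notes)) for s in sources]
    # chrF++ (word_order=2) operates on character n-grams, which is a common
    # choice for underrepresented scripts where word tokenizers are unreliable.
    return CHRF(word_order=2).corpus_score(hypotheses, [references]).score
```

Here `llm` stands in for any model call that maps a prompt string to a translation string; a character-level metric is used because whitespace tokenization is unreliable for scripts like Traditional Mongolian and Yi.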