MiLiC-Eval: Benchmarking Multilingual LLMs for China’s Minority Languages

Chen Zhang, Mingxu Tao, Zhiyuan Liao, Yansong Feng

Abstract
Large language models (LLMs) excel in high-resource languages but struggle with low-resource languages (LRLs), particularly those spoken by minority communities in China, such as Tibetan, Uyghur, Kazakh, and Mongolian. To systematically track progress on these languages, we introduce MiLiC-Eval, a benchmark designed for minority languages in China, featuring 24K instances across 9 tasks. MiLiC-Eval focuses on underrepresented writing systems, and its parallel design across tasks and languages enables a faithful, fine-grained assessment of linguistic and problem-solving skills. Our evaluation reveals that open-source LLMs perform poorly on syntax-intensive tasks and multi-script languages. We further demonstrate how MiLiC-Eval can help advance LRL research in handling diverse writing systems and understanding the process of language adaptation.
Anthology ID: 2025.findings-acl.578
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 11086–11102
URL: https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.578/
DOI: 10.18653/v1/2025.findings-acl.578
Cite (ACL): Chen Zhang, Mingxu Tao, Zhiyuan Liao, and Yansong Feng. 2025. MiLiC-Eval: Benchmarking Multilingual LLMs for China's Minority Languages. In Findings of the Association for Computational Linguistics: ACL 2025, pages 11086–11102, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): MiLiC-Eval: Benchmarking Multilingual LLMs for China's Minority Languages (Zhang et al., Findings 2025)
PDF: https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.578.pdf