TLUE: A Tibetan Language Understanding Evaluation Benchmark

Fan Gao, Cheng Huang, Yutong Liu, Nyima Tashi, Xiangxiang Wang, Thupten Tsering, Ban Ma-bao, Renzeng Duojie, Gadeng Luosang, Rinchen Dongrub, Dorje Tashi, Xiao Feng Cd, Yongbin Yu, Hao Wang


Abstract
Large language models (LLMs) have made tremendous progress in recent years, but low-resource languages such as Tibetan remain significantly underrepresented in their evaluation. Despite being spoken by over seven million people, Tibetan has largely been neglected in the development and assessment of LLMs. To address this gap, we present TLUE, a Tibetan Language Understanding Evaluation Benchmark and the first large-scale benchmark for measuring the proficiency of large language models in Tibetan. TLUE comprises two major components: a comprehensive multi-task understanding benchmark spanning 5 domains and 67 subdomains, and a safety benchmark covering 7 subdomains. We evaluate a diverse set of state-of-the-art LLMs and find that most perform below the random baseline, highlighting the considerable challenges they face in Tibetan language processing. TLUE provides a crucial foundation for advancing future research in Tibetan language understanding and underscores the importance of greater inclusivity in the development of large language models.
Anthology ID:
2025.emnlp-main.1777
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
35059–35085
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1777/
Cite (ACL):
Fan Gao, Cheng Huang, Yutong Liu, Nyima Tashi, Xiangxiang Wang, Thupten Tsering, Ban Ma-bao, Renzeng Duojie, Gadeng Luosang, Rinchen Dongrub, Dorje Tashi, Xiao Feng Cd, Yongbin Yu, and Hao Wang. 2025. TLUE: A Tibetan Language Understanding Evaluation Benchmark. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 35059–35085, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
TLUE: A Tibetan Language Understanding Evaluation Benchmark (Gao et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1777.pdf
Checklist:
 2025.emnlp-main.1777.checklist.pdf