Yongbin Yu


2025

TLUE: A Tibetan Language Understanding Evaluation Benchmark
Fan Gao | Cheng Huang | Yutong Liu | Nyima Tashi | Xiangxiang Wang | Thupten Tsering | Ban Ma-bao | Renzeng Duojie | Gadeng Luosang | Rinchen Dongrub | Dorje Tashi | Xiao Feng Cd | Yongbin Yu | Hao Wang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) have made tremendous progress in recent years, but low-resource languages, such as Tibetan, remain significantly underrepresented in their evaluation. Despite being spoken by over seven million people, Tibetan has largely been neglected in the development and assessment of LLMs. To address this gap, we present the Tibetan Language Understanding Evaluation Benchmark (TLUE), the first large-scale benchmark for measuring the proficiency of LLMs in the Tibetan language. TLUE comprises two major components: a comprehensive multi-task understanding benchmark spanning 5 domains and 67 subdomains, and a safety benchmark encompassing 7 subdomains. We then evaluate a diverse set of state-of-the-art LLMs. Experimental results demonstrate that most LLMs perform below the random baseline, highlighting the considerable challenges they face in Tibetan language processing. TLUE provides a crucial foundation for advancing future research in Tibetan language understanding and underscores the importance of promoting greater inclusivity in the development of LLMs.