Nyima Tashi
2025
TLUE: A Tibetan Language Understanding Evaluation Benchmark
Fan Gao | Cheng Huang | Yutong Liu | Nyima Tashi | Xiangxiang Wang | Thupten Tsering | Ban Ma-bao | Renzeng Duojie | Gadeng Luosang | Rinchen Dongrub | Dorje Tashi | Xiao Feng Cd | Yongbin Yu | Hao Wang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) have made tremendous progress in recent years, but low-resource languages such as Tibetan remain significantly underrepresented in their evaluation. Despite being spoken by over seven million people, Tibetan has largely been neglected in the development and assessment of LLMs. To address this gap, we present TLUE, a Tibetan Language Understanding Evaluation Benchmark and the first large-scale benchmark for measuring the proficiency of LLMs in the Tibetan language. TLUE comprises two major components: a comprehensive multi-task understanding benchmark spanning 5 domains and 67 subdomains, and a safety benchmark encompassing 7 subdomains. Finally, we evaluate a diverse set of state-of-the-art LLMs. Experimental results demonstrate that most LLMs perform below the random baseline, highlighting the considerable challenges they face in Tibetan language processing. TLUE provides a crucial foundation for advancing future research in Tibetan language understanding and underscores the importance of greater inclusivity in the development of LLMs.
Human-in-the-Loop Generation of Adversarial Texts: A Case Study on Tibetan Script
Xi Cao | Yuan Sun | Jiajun Li | Quzong Gesang | Nuo Qun | Nyima Tashi
Proceedings of The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: System Demonstrations
DNN-based language models excel across various NLP tasks but remain highly vulnerable to textual adversarial attacks. While adversarial text generation is crucial for NLP security, explainability, evaluation, and data augmentation, related work remains overwhelmingly English-centric, leaving the problem of constructing high-quality and sustainable adversarial robustness benchmarks for lower-resourced languages both difficult and understudied. First, customizing attack methods for lower-resourced languages is complicated by linguistic differences and limited resources. Second, automated attacks are prone to generating invalid or ambiguous adversarial texts. Last but not least, language models continuously evolve and may become immune to parts of previously generated adversarial texts. To address these challenges, we introduce HITL-GAT, an interactive system based on a general approach to human-in-the-loop generation of adversarial texts. Additionally, we demonstrate the utility of HITL-GAT through a case study on Tibetan script, employing three customized adversarial text generation methods and establishing the first adversarial robustness benchmark for Tibetan, providing a valuable reference for other lower-resourced languages.