Dorji Wangchuk


2025

DharmaBench: Evaluating Language Models on Buddhist Texts in Sanskrit and Tibetan
Kai Golan Hashiloni | Shay Cohen | Asaf Shina | Jingyi Yang | Orr Meir Zwebner | Nicola Bajetta | Guy Bilitski | Rebecca Sundén | Guy Maduel | Ryan Conlon | Ari Barzilai | Daniel Mass | Shanshan Jia | Aviv Naaman | Sonam Choden | Sonam Jamtsho | Yadi Qu | Harunaga Isaacson | Dorji Wangchuk | Shai Fine | Orna Almogi | Kfir Bar
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

We assess the capabilities of large language models on tasks involving Buddhist texts written in Sanskrit and Classical Tibetan, two typologically distinct, low-resource historical languages. To this end, we introduce DharmaBench, a benchmark suite comprising 13 classification and detection tasks grounded in Buddhist textual traditions: six in Sanskrit and seven in Tibetan, with four tasks shared across both languages. The tasks were curated from scratch and tailored to the linguistic and cultural characteristics of each language. We evaluate a range of models, from proprietary systems such as GPT-4o to smaller, domain-specific open-weight models, and analyze their performance across tasks and languages. All datasets and code are publicly released under the CC-BY-4 License and the Apache-2.0 License, respectively, to support research on historical language processing and the development of culturally inclusive NLP systems.