Orr Zwebner
2026
Automatic Segmentation of Classical Tibetan Texts into Autochthonous and Allochthonous Regions
Guy Bilitski | Lev Shechter | Sonam Jamtsho | Nir Marciano | Nicola Bajetta | Rebecca Sunden | Omri Drori | Kai Golan Hashiloni | Orr Zwebner | Asaf Shina | Orna Almogi | Dorji Wangchuk | Kfir Bar
Proceedings of the Fifteenth Language Resources and Evaluation Conference
We introduce a new computational framework for segmenting Classical Tibetan texts into autochthonous (AUTO) and allochthonous (ALLO) regions, distinguishing indigenous Tibetan compositions from translated materials, primarily from Sanskrit sources. To support this task, we release the first annotated Tibetan corpus for ALLO/AUTO segmentation and evaluate several multilingual encoders, including mBERT and XLM-R, fine-tuned for sequence labeling. Our best model achieves strong alignment with expert annotations, showing that multilingual representations can effectively capture philological boundaries in low-resource settings. This work contributes new resources and methods for computational philology and sheds light on the linguistic markers that trace the intercultural transmission of Buddhist thought in Tibet.