Kai Golan Hashiloni
2026
Automatic Segmentation of Classical Tibetan Texts into Autochthonous and Allochthonous Regions
Guy Bilitski | Lev Shechter | Sonam Jamtsho | Nir Marciano | Nicola Bajetta | Rebecca Sunden | Omri Drori | Kai Golan Hashiloni | Orr Zwebner | Asaf Shina | Orna Almogi | Dorji Wangchuk | Kfir Bar
Proceedings of the Fifteenth Language Resources and Evaluation Conference
We introduce a new computational framework for segmenting Classical Tibetan texts into autochthonous and allochthonous regions, distinguishing between indigenous Tibetan compositions and translated materials, primarily from Sanskrit sources. To support this task, we release the first annotated Tibetan corpus for ALLO/AUTO segmentation and evaluate several multilingual encoders, including mBERT and XLM-R, fine-tuned for sequence labeling. Our best model achieves strong alignment with expert annotations, showing that multilingual representations can effectively capture philological boundaries in low-resource settings. This work contributes new resources and methods for computational philology and sheds light on the linguistic markers that trace the intercultural transmission of Buddhist thought in Tibet.
2025
Not Just a Piece of Cake: Cross-Lingual Fine-Tuning for Idiom Identification
Ofri Hefetz | Kai Golan Hashiloni | Alon Mannor | Kfir Bar
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
We investigate cross-lingual fine-tuning for idiomatic expression identification, addressing the limited availability of annotated data in many languages. We evaluate encoder and generative decoder models to examine their ability to generalize idiom identification across languages. Additionally, we conduct an explainability study using linear probing and LogitLens to analyze how idiomatic meaning is represented across model layers. Results show consistent cross-lingual transfer, with English emerging as a strong source language. All code and models are released to support future research.
DharmaBench: Evaluating Language Models on Buddhist Texts in Sanskrit and Tibetan
Kai Golan Hashiloni | Shay Cohen | Asaf Shina | Jingyi Yang | Orr Meir Zwebner | Nicola Bajetta | Guy Bilitski | Rebecca Sundén | Guy Maduel | Ryan Conlon | Ari Barzilai | Daniel Mass | Shanshan Jia | Aviv Naaman | Sonam Choden | Sonam Jamtsho | Yadi Qu | Harunaga Isaacson | Dorji Wangchuk | Shai Fine | Orna Almogi | Kfir Bar
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
We assess the capabilities of large language models on tasks involving Buddhist texts written in Sanskrit and Classical Tibetan—two typologically distinct, low-resource historical languages. To this end, we introduce DharmaBench, a benchmark suite comprising 13 classification and detection tasks grounded in Buddhist textual traditions: six in Sanskrit and seven in Tibetan, with four shared across both. The tasks are curated from scratch, tailored to the linguistic and cultural characteristics of each language. We evaluate a range of models, from proprietary systems like GPT-4o to smaller, domain-specific open-weight models, analyzing their performance across tasks and languages. All datasets and code are publicly released, under the CC-BY-4 License and the Apache-2.0 License respectively, to support research on historical language processing and the development of culturally inclusive NLP systems.
Easy as PIE? Identifying Multi-Word Expressions with LLMs
Kai Golan Hashiloni | Ofri Hefetz | Kfir Bar
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
We investigate the identification of idiomatic expressions—a semantically non-compositional subclass of multiword expressions (MWEs)—in running text using large language models (LLMs) without any fine-tuning. Instead, we adopt a prompt-based approach and evaluate a range of prompting strategies, including zero-shot, few-shot, and chain-of-thought variants, across multiple languages, datasets, and model types. Our experiments show that, with well-crafted prompts, LLMs can perform competitively with supervised models trained on annotated data. These findings highlight the potential of prompt-based LLMs as a flexible and effective alternative for idiomatic expression identification.
Co-authors
- Kfir Bar 4
- Orna Almogi 2
- Nicola Bajetta 2
- Guy Bilitski 2
- Ofri Hefetz 2
- Sonam Jamtsho 2
- Asaf Shina 2
- Rebecca Sundén 2
- Dorji Wangchuk 2
- Ari Barzilai 1
- Sonam Choden 1
- Shay B. Cohen 1
- Ryan Conlon 1
- Omri Drori 1
- Shai Fine 1
- Harunaga Isaacson 1
- Shanshan Jia 1
- Guy Maduel 1
- Alon Mannor 1
- Nir Marciano 1
- Daniel Mass 1
- Aviv Naaman 1
- Yadi Qu 1
- Lev Shechter 1
- Jingyi Yang 1
- Orr Zwebner 1
- Orr Meir Zwebner 1