Moshe Lavee
2025
Systematic Textual Availability of Manuscripts
Hadar Miller | Samuel Londner | Tsvi Kuflik | Daria Vasyutinsky Shapira | Nachum Dershowitz | Moshe Lavee
Proceedings of the 5th Conference on Language, Data and Knowledge
The digital era has made millions of manuscript images in Hebrew available to all. However, despite major advancements in handwritten text recognition over the past decade, an efficient pipeline for large-scale and accurate conversion of these manuscripts into useful machine-readable form is still sorely lacking. We propose a pipeline that significantly improves recognition models for automatic transcription of Hebrew manuscripts. Transfer learning is used to fine-tune pretrained models. For post-recognition correction, the pipeline leverages text reuse, a common phenomenon in medieval manuscripts, and state-of-the-art large language models for medieval Hebrew. The framework successfully handles noisy transcriptions and consistently suggests alternate, better readings. Initial results show that word-level accuracy increased by 10% for new readings proposed by text-reuse detection. Moreover, character-level accuracy improved by 18% when models were fine-tuned on the first few pages of each manuscript.
2022
Style Classification of Rabbinic Literature for Detection of Lost Midrash Tanhuma Material
Solomon Tannor | Nachum Dershowitz | Moshe Lavee
Proceedings of the 2nd International Workshop on Natural Language Processing for Digital Humanities
Midrash collections are complex rabbinic works that consist of text in multiple languages and that evolved through long processes of unstable oral and written transmission. Determining the origin of a given passage in such a compilation is not always straightforward and is often disputed by scholars, yet it is essential for understanding the passage and its relationship to other texts in the rabbinic corpus. To help solve this problem, we propose a system for classification of rabbinic literature based on its style, leveraging recently released pretrained Transformer models for Hebrew. Additionally, we demonstrate how our method can be applied to uncover lost material from the Midrash Tanhuma.