Abstract
Midrash collections are complex rabbinic works that consist of text in multiple languages and that evolved through long processes of unstable oral and written transmission. Determining the origin of a given passage in such a compilation is not always straightforward and is often disputed by scholars, yet it is essential for scholars’ understanding of the passage and its relationship to other texts in the rabbinic corpus. To help solve this problem, we propose a system for classification of rabbinic literature based on its style, leveraging recently released pretrained Transformer models for Hebrew. Additionally, we demonstrate how our method can be applied to uncover lost material from the Midrash Tanhuma.
- Anthology ID:
- 2022.nlp4dh-1.6
- Volume:
- Proceedings of the 2nd International Workshop on Natural Language Processing for Digital Humanities
- Month:
- November
- Year:
- 2022
- Address:
- Taipei, Taiwan
- Venue:
- NLP4DH
- Publisher:
- Association for Computational Linguistics
- Pages:
- 42–46
- URL:
- https://aclanthology.org/2022.nlp4dh-1.6
- Cite (ACL):
- Solomon Tannor, Nachum Dershowitz, and Moshe Lavee. 2022. Style Classification of Rabbinic Literature for Detection of Lost Midrash Tanhuma Material. In Proceedings of the 2nd International Workshop on Natural Language Processing for Digital Humanities, pages 42–46, Taipei, Taiwan. Association for Computational Linguistics.
- Cite (Informal):
- Style Classification of Rabbinic Literature for Detection of Lost Midrash Tanhuma Material (Tannor et al., NLP4DH 2022)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2022.nlp4dh-1.6.pdf