Integrating Semantic and Statistical Features for Authorial Clustering of Qumran Scrolls

Yonatan Lourie, Jonathan Ben-Dov, Roded Sharan


Abstract
We present a novel framework for authorial classification and clustering of the Qumran Dead Sea Scrolls (DSS). Our approach combines modern Hebrew BERT embeddings with traditional natural language processing features in a graph neural network (GNN) architecture. Our results outperform baseline methods on both the Dead Sea Scrolls and a validation dataset of the Hebrew Bible. In particular, we leverage our model to provide significant insights into long-standing debates, including the classification of sectarian and non-sectarian texts and the division of the Hodayot collection of hymns.
Anthology ID:
2025.alp-1.2
Volume:
Proceedings of the Second Workshop on Ancient Language Processing
Month:
May
Year:
2025
Address:
The Albuquerque Convention Center, Laguna
Editors:
Adam Anderson, Shai Gordin, Bin Li, Yudong Liu, Marco C. Passarotti, Rachele Sprugnoli
Venues:
ALP | WS
Publisher:
Association for Computational Linguistics
Pages:
12–21
URL:
https://preview.aclanthology.org/corrections-2025-06/2025.alp-1.2/
DOI:
10.18653/v1/2025.alp-1.2
Cite (ACL):
Yonatan Lourie, Jonathan Ben-Dov, and Roded Sharan. 2025. Integrating Semantic and Statistical Features for Authorial Clustering of Qumran Scrolls. In Proceedings of the Second Workshop on Ancient Language Processing, pages 12–21, The Albuquerque Convention Center, Laguna. Association for Computational Linguistics.
Cite (Informal):
Integrating Semantic and Statistical Features for Authorial Clustering of Qumran Scrolls (Lourie et al., ALP 2025)
PDF:
https://preview.aclanthology.org/corrections-2025-06/2025.alp-1.2.pdf