Martijn Naaijer
2025
Towards an Integrated Methodology of Dating Biblical Texts: The Case of the Book of Jeremiah
Martijn Naaijer
|
Aren Wilson-Wright
Proceedings of the Second Workshop on Ancient Language Processing
In this paper we describe our research project on dating the language of the Book of Jeremiah using a combination of traditional biblical scholarship and machine learning. Jeremiah is a book with a long history of composing and editing, and the historical background of many of the sections in the book are unclear. Moreover, redaction criticism and historical linguistics are mostly separate fields within the discipline of Biblical Studies. With our approach we want to integrate these areas of research and make new strides in uncovering the compositional history of Book of Jeremiah.
2023
A Transformer-based parser for Syriac morphology
Martijn Naaijer
|
Constantijn Sikkel
|
Mathias Coeckelbergs
|
Jisk Attema
|
Willem Th. Van Peursen
Proceedings of the Ancient Language Processing Workshop
In this project we train a Transformer-based model from scratch, with the goal of parsing the morphology of Ancient Syriac texts as accurately as possible. Syriac is still a low resource language, only a relatively small training set was available. Therefore, the training set was expanded by adding Biblical Hebrew data to it. Five different experiments were done: the model was trained on Syriac data only, it was trained with mixed Syriac and (un)vocalized Hebrew data, and it was pretrained on (un)vocalized Hebrew data and then finetuned on Syriac data. The models trained on Hebrew and Syriac data consistently outperform the models trained on Syriac data only. This shows, that the differences between Syriac and Hebrew are small enough that it is worth adding Hebrew data to train the model for parsing Syriac morphology. Training models on different languages is an important trend in NLP, we show that this works well for relatively small datasets of Syriac and Hebrew.