Evaluating Existing Lemmatisers on Unedited Byzantine Greek Poetry

Colin Swaelens, Ilse De Vos, Els Lefever


Abstract
This paper reports the results of a comparative evaluation carried out with a view to developing a new lemmatizer for unedited Byzantine Greek texts. In the experiment, four existing lemmatizers, all pre-trained on Ancient Greek texts, were evaluated on how well they handle texts that stem from the Middle Ages and display quite a few peculiarities. The aim of this study is to gain insight into the pitfalls of existing lemmatisation approaches as well as the specific challenges of our Byzantine Greek corpus, in order to develop a lemmatizer that can cope with its peculiarities. The results of the experiment show an accuracy drop of 20 percentage points on our corpus, which is further investigated in a qualitative error analysis.
Anthology ID:
2023.alp-1.13
Volume:
Proceedings of the Ancient Language Processing Workshop
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Adam Anderson, Shai Gordin, Bin Li, Yudong Liu, Marco C. Passarotti
Venues:
ALP | WS
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
111–116
URL:
https://aclanthology.org/2023.alp-1.13
Cite (ACL):
Colin Swaelens, Ilse De Vos, and Els Lefever. 2023. Evaluating Existing Lemmatisers on Unedited Byzantine Greek Poetry. In Proceedings of the Ancient Language Processing Workshop, pages 111–116, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Evaluating Existing Lemmatisers on Unedited Byzantine Greek Poetry (Swaelens et al., ALP-WS 2023)
PDF:
https://preview.aclanthology.org/landing_page/2023.alp-1.13.pdf