Evelien de Graaf
2023
Training and Evaluation of Named Entity Recognition Models for Classical Latin
Marijke Beersmans | Evelien de Graaf | Tim Van de Cruys | Margherita Fantoli
Proceedings of the Ancient Language Processing Workshop
We evaluate the performance of various models on the task of named entity recognition (NER) for classical Latin. Using an existing dataset, we train two transformer-based LatinBERT models and one shallow conditional random field (CRF) model. Performance is assessed using both standard metrics and a detailed manual error analysis, and is compared with the results of previously released Latin NER tools. Both analyses show that the BERT models achieve a higher F1 score than the other models. Furthermore, we annotate new, unseen data for further evaluation of the models, and we discuss the impact of annotation choices on the results.
2022
AGILe: The First Lemmatizer for Ancient Greek Inscriptions
Evelien de Graaf | Silvia Stopponi | Jasper K. Bos | Saskia Peels-Matthey | Malvina Nissim
Proceedings of the Thirteenth Language Resources and Evaluation Conference
To facilitate corpus searches by classicists and to reduce data sparsity when training models, we focus on the automatic lemmatization of ancient Greek inscriptions, which have received less attention in this respect than literary texts. We show that existing lemmatizers for ancient Greek, trained on literary data, perform poorly on epigraphic data because of major linguistic differences between the two types of text. We therefore train the first inscription-specific lemmatizer, which achieves above 80% accuracy, and make both the models and the lemmatized data available to the community. We also provide a detailed error analysis of the peculiarities of inscriptions, which further underlines the importance of a lemmatizer dedicated to inscriptions.