Abstract
This contribution presents a novel approach to the development and evaluation of transformer-based models for Named Entity Recognition and Classification in Ancient Greek texts. We trained two models on annotated datasets, consolidating potentially ambiguous entity types under a harmonized set of classes. We then tested their performance on out-of-domain texts, reproducing a real-world use case. Both models performed very well under these conditions, with the multilingual model slightly outperforming the monolingual one. In the conclusion, we emphasize current limitations due to the scarcity of high-quality annotated corpora and the lack of cohesive annotation strategies for ancient languages.
- Anthology ID:
- 2024.lt4hala-1.11
- Volume:
- Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Rachele Sprugnoli, Marco Passarotti
- Venues:
- LT4HALA | WS
- Publisher:
- ELRA and ICCL
- Pages:
- 89–97
- URL:
- https://preview.aclanthology.org/remove-affiliations/2024.lt4hala-1.11/
- Cite (ACL):
- Chiara Palladino and Tariq Yousef. 2024. Development of Robust NER Models and Named Entity Tagsets for Ancient Greek. In Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024, pages 89–97, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Development of Robust NER Models and Named Entity Tagsets for Ancient Greek (Palladino & Yousef, LT4HALA 2024)
- PDF:
- https://preview.aclanthology.org/remove-affiliations/2024.lt4hala-1.11.pdf