OdyCy – A general-purpose NLP pipeline for Ancient Greek

Jan Kostkan, Márton Kardos, Jacob Palle Bliddal Mortensen, Kristoffer Laigaard Nielbo


Abstract
This paper presents a general-purpose NLP pipeline that achieves state-of-the-art performance on the Ancient Greek Perseus UD Treebank for several tasks (POS tagging, morphological analysis, and dependency parsing) and close to state-of-the-art performance on the PROIEL UD Treebank. Our aim is to provide a reproducible, open-source language processing pipeline for Ancient Greek that can handle input texts of varying quality. We compare the performance of our model against comparable tools and then evaluate its lemmatization errors.
Anthology ID:
2023.latechclfl-1.14
Volume:
Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Stefania Degaetano-Ortlieb, Anna Kazantseva, Nils Reiter, Stan Szpakowicz
Venue:
LaTeCHCLfL
Publisher:
Association for Computational Linguistics
Pages:
128–134
URL:
https://aclanthology.org/2023.latechclfl-1.14
DOI:
10.18653/v1/2023.latechclfl-1.14
Cite (ACL):
Jan Kostkan, Márton Kardos, Jacob Palle Bliddal Mortensen, and Kristoffer Laigaard Nielbo. 2023. OdyCy – A general-purpose NLP pipeline for Ancient Greek. In Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pages 128–134, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
OdyCy – A general-purpose NLP pipeline for Ancient Greek (Kostkan et al., LaTeCHCLfL 2023)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2023.latechclfl-1.14.pdf
Video:
https://preview.aclanthology.org/emnlp-22-attachments/2023.latechclfl-1.14.mp4