Challenging Error Correction in Recognised Byzantine Greek
John Pavlopoulos, Vasiliki Kougia, Esteban Garces Arias, Paraskevi Platanou, Stepan Shabalin, Konstantina Liagkou, Emmanouil Papadatos, Holger Essler, Jean-Baptiste Camps, Franz Fischer
Abstract
Automatic correction of errors in Handwritten Text Recognition (HTR) output remains an open problem. In this study, we introduce a shared task addressing this challenge, which attracted 271 submissions but yielded only a handful of promising approaches. This paper presents the datasets, the most effective methods, and an experimental analysis of error correction for HTRed manuscripts and papyri in Byzantine Greek, the language that followed Classical and preceded Modern Greek. Using recognised and transcribed data spanning seven centuries, we compare the two best-performing methods, one based on a neural encoder-decoder architecture and the other on engineered linguistic rules. We show that both can reduce the recognition error rate, by up to 2.5 points at the character level and up to 15 points at the word level, and we elucidate their respective strengths and weaknesses.
- Anthology ID:
- 2024.ml4al-1.1
- Volume:
- Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)
- Month:
- August
- Year:
- 2024
- Address:
- Hybrid in Bangkok, Thailand and online
- Editors:
- John Pavlopoulos, Thea Sommerschield, Yannis Assael, Shai Gordin, Kyunghyun Cho, Marco Passarotti, Rachele Sprugnoli, Yudong Liu, Bin Li, Adam Anderson
- Venues:
- ML4AL | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1–12
- URL:
- https://preview.aclanthology.org/landing_page/2024.ml4al-1.1/
- DOI:
- 10.18653/v1/2024.ml4al-1.1
- Cite (ACL):
- John Pavlopoulos, Vasiliki Kougia, Esteban Garces Arias, Paraskevi Platanou, Stepan Shabalin, Konstantina Liagkou, Emmanouil Papadatos, Holger Essler, Jean-Baptiste Camps, and Franz Fischer. 2024. Challenging Error Correction in Recognised Byzantine Greek. In Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024), pages 1–12, Hybrid in Bangkok, Thailand and online. Association for Computational Linguistics.
- Cite (Informal):
- Challenging Error Correction in Recognised Byzantine Greek (Pavlopoulos et al., ML4AL 2024)
- PDF:
- https://preview.aclanthology.org/landing_page/2024.ml4al-1.1.pdf
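The abstract reports improvements in "points" at the character and word level. As a rough illustration of how such figures are commonly computed, the sketch below defines character and word error rates from Levenshtein edit distance. It is a minimal sketch under stated assumptions, not the paper's evaluation code: the function names (`levenshtein`, `error_rate`, `cer`, `wer`) and the sample strings are hypothetical, and the exact metric definition used in the shared task is not given on this page.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of tokens)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(hypothesis, reference):
    """Edit distance normalised by reference length, as a percentage."""
    if len(reference) == 0:
        raise ValueError("reference must not be empty")
    return 100.0 * levenshtein(hypothesis, reference) / len(reference)


def cer(hypothesis, reference):
    """Character error rate: edits counted over characters."""
    return error_rate(hypothesis, reference)


def wer(hypothesis, reference):
    """Word error rate: edits counted over whitespace-separated tokens."""
    return error_rate(hypothesis.split(), reference.split())


# Hypothetical example strings (not taken from the paper's datasets).
reference = "ο λογος ην"   # human transcription
recognised = "ο λογοσ ην"  # HTR output with one wrong character
corrected = "ο λογος ην"   # output of some error-correction system

# Improvement in percentage points after correction.
cer_gain = cer(recognised, reference) - cer(corrected, reference)
wer_gain = wer(recognised, reference) - wer(corrected, reference)
```

Under this definition, a reduction "in points" is the difference between the error rate of the raw HTR output and that of the corrected output, both measured against the human transcription. For polytonic Greek, both strings would likely need the same Unicode normalisation before comparison so that equivalent accented characters are not counted as errors.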