Abstract
The paper presents a submission to the EvaLatin 2022 shared task. Our system places first for lemmatization, part-of-speech and morphological tagging in both closed and open modalities. The results for cross-genre and cross-time sub-tasks show that the system handles the diachronic and diastratic variation of Latin. The architecture employs state-of-the-art transformer models. For part-of-speech and morphological tagging, we use XLM-RoBERTa large, while for lemmatization a ByT5 small model was employed. The paper features a thorough discussion of part-of-speech and lemmatization errors which shows how the system performance may be improved for Classical, Medieval and Neo-Latin texts.

- Anthology ID: 2022.lt4hala-1.31
- Volume: Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages
- Month: June
- Year: 2022
- Address: Marseille, France
- Editors: Rachele Sprugnoli, Marco Passarotti
- Venue: LT4HALA
- Publisher: European Language Resources Association
- Pages: 193–197
- URL: https://aclanthology.org/2022.lt4hala-1.31
- Cite (ACL): Krzysztof Wróbel and Krzysztof Nowak. 2022. Transformer-based Part-of-Speech Tagging and Lemmatization for Latin. In Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages, pages 193–197, Marseille, France. European Language Resources Association.
- Cite (Informal): Transformer-based Part-of-Speech Tagging and Lemmatization for Latin (Wróbel & Nowak, LT4HALA 2022)
- PDF: https://preview.aclanthology.org/improve-issue-templates/2022.lt4hala-1.31.pdf