Named Entity Recognition for the Irish Language

Jane Adkins, Hugo Collins, Joachim Wagner, Abigail Walsh, Brian Davis


Abstract
The Irish language has been deemed ‘definitely endangered’ (Moseley, 2012) and has been classified as having ‘weak or no support’ (Lynn, 2023) regarding digital resources, in spite of its status as the first official and national language of the Republic of Ireland. This research develops the first named entity recognition (NER) tool for the Irish language; NER is one of the essential tasks identified by the Digital Plan for Irish (Ní Chasaide et al., 2022). In this study, we produce a small gold-standard NER-annotated corpus and compare both monolingual and multilingual BERT models fine-tuned on this task. We experiment with different model architectures and low-resource language approaches to enrich our dataset. We test our models on a mix of single- and multi-word named entities, as well as on a dedicated multi-word named entity test set. Our proposed gaBERT model, combined with random data augmentation and a conditional random fields (CRF) layer, demonstrates significant performance improvements over baseline models, alternative architectures, and multilingual models, achieving an F1 score of 76.52. This study contributes to advancing Irish language technologies and supporting Irish language digital resources, providing a basis for Irish NER and for the identification of other multiword expression (MWE) types.
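The abstract reports an F1 score over a mix of single- and multi-word named entities. As an illustration only, the sketch below shows how entity-level F1 is commonly computed, assuming a BIO tagging scheme and exact span matching in the style of CoNLL evaluation; it is not necessarily the authors' evaluation procedure. The helper names `bio_to_spans` and `entity_f1`, the PER/LOC labels, and the example sentence are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical span representation: (start index, exclusive end index, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one sentence's BIO tags (e.g. B-PER, I-PER, O) into entity spans.

    An I- tag that does not continue an open entity of the same type is
    treated as the start of a new entity (a lenient, commonly used convention).
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == etype
        if start is not None and not continues:
            spans.add((start, i, etype))  # close the currently open entity
            start, etype = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, etype = i, label  # open a new entity at this token
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged entity-level F1 (scaled to 0-100).

    A predicted entity counts as correct only if its boundaries and type
    exactly match a gold entity; partial overlaps earn no credit.
    """
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 100 * 2 * precision * recall / (precision + recall)


# Toy example: "Tá Michael D. Higgins ina chónaí i mBaile Átha Cliath"
tokens = ["Tá", "Michael", "D.", "Higgins", "ina", "chónaí",
          "i", "mBaile", "Átha", "Cliath"]
gold = [["O", "B-PER", "I-PER", "I-PER", "O", "O", "O", "B-LOC", "I-LOC", "I-LOC"]]
# The prediction truncates the multi-word person name by one token.
pred = [["O", "B-PER", "I-PER", "O", "O", "O", "O", "B-LOC", "I-LOC", "I-LOC"]]

print(entity_f1(gold, pred))  # → 50.0
```

Because the predicted person span stops one token early, it receives no credit at all, which illustrates why exact-match scoring is stricter on multi-word entities, with more boundaries to get right, than on single-token ones.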
Anthology ID:
2025.mwe-1.9
Volume:
Proceedings of the 21st Workshop on Multiword Expressions (MWE 2025)
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico, U.S.A.
Editors:
Atul Kr. Ojha, Voula Giouli, Verginica Barbu Mititelu, Mathieu Constant, Gražina Korvel, A. Seza Doğruöz, Alexandre Rademaker
Venues:
MWE | WS
Publisher:
Association for Computational Linguistics
Pages:
82–96
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.mwe-1.9/
Cite (ACL):
Jane Adkins, Hugo Collins, Joachim Wagner, Abigail Walsh, and Brian Davis. 2025. Named Entity Recognition for the Irish Language. In Proceedings of the 21st Workshop on Multiword Expressions (MWE 2025), pages 82–96, Albuquerque, New Mexico, U.S.A. Association for Computational Linguistics.
Cite (Informal):
Named Entity Recognition for the Irish Language (Adkins et al., MWE 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.mwe-1.9.pdf