eSTÓR: Curating Irish Datasets for Machine Translation
Abigail Walsh, Órla Ní Loinsigh, Jane Adkins, Ornait O’Connell, Mark Andrade, Teresa Clifford, Federico Gaspari, Jane Dunne, Brian Davis
Abstract
Minority languages such as Irish are massively under-resourced, particularly in terms of high-quality domain-relevant data, limiting the capabilities of machine translation (MT) engines, even those integrating large language models (LLMs). The eSTÓR project, described in this paper, focuses on the collection and curation of high-quality Irish text data for diverse domains.
- Anthology ID:
- 2025.mtsummit-2.28
- Volume:
- Proceedings of Machine Translation Summit XX: Volume 2
- Month:
- June
- Year:
- 2025
- Address:
- Geneva, Switzerland
- Editors:
- Pierrette Bouillon, Johanna Gerlach, Sabrina Girletti, Lise Volkart, Raphael Rubino, Rico Sennrich, Samuel Läubli, Martin Volk, Miquel Esplà-Gomis, Vincent Vandeghinste, Helena Moniz, Sara Szoc
- Venue:
- MTSummit
- Publisher:
- European Association for Machine Translation
- Pages:
- 115–116
- URL:
- https://preview.aclanthology.org/nschneid-patch-1/2025.mtsummit-2.28/
- Cite (ACL):
- Abigail Walsh, Órla Ní Loinsigh, Jane Adkins, Ornait O’Connell, Mark Andrade, Teresa Clifford, Federico Gaspari, Jane Dunne, and Brian Davis. 2025. eSTÓR: Curating Irish Datasets for Machine Translation. In Proceedings of Machine Translation Summit XX: Volume 2, pages 115–116, Geneva, Switzerland. European Association for Machine Translation.
- Cite (Informal):
- eSTÓR: Curating Irish Datasets for Machine Translation (Walsh et al., MTSummit 2025)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2025.mtsummit-2.28.pdf