Abstract
This paper introduces two new OCR models for the Irish language, a BART-based OCR post-correction model, and the core dataset on which they were trained: a monthly bilingual Irish-English newspaper named An Gaodhal that was produced from 1881 to 1898 by an Irishman living in Brooklyn, New York.
- Anthology ID:
- 2024.lt4hala-1.9
- Volume:
- Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Rachele Sprugnoli, Marco Passarotti
- Venues:
- LT4HALA | WS
- Publisher:
- ELRA and ICCL
- Pages:
- 65–78
- URL:
- https://aclanthology.org/2024.lt4hala-1.9
- Cite (ACL):
- Oksana Dereza, Deirdre Ní Chonghaile, and Nicholas Wolf. 2024. “To Have the ‘Million’ Readers Yet”: Building a Digitally Enhanced Edition of the Bilingual Irish-English Newspaper An Gaodhal (1881-1898). In Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024, pages 65–78, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- “To Have the ‘Million’ Readers Yet”: Building a Digitally Enhanced Edition of the Bilingual Irish-English Newspaper An Gaodhal (1881-1898) (Dereza et al., LT4HALA-WS 2024)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2024.lt4hala-1.9.pdf