“To Have the ‘Million’ Readers Yet”: Building a Digitally Enhanced Edition of the Bilingual Irish-English Newspaper An Gaodhal (1881-1898)

Oksana Dereza, Deirdre Ní Chonghaile, Nicholas Wolf


Abstract
This paper introduces two new OCR models for the Irish language, a BART-based OCR post-correction model, and the core dataset on which they were trained: a monthly bilingual Irish-English newspaper named An Gaodhal that was produced from 1881 to 1898 by an Irishman living in Brooklyn, New York.
Anthology ID:
2024.lt4hala-1.9
Volume:
Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Rachele Sprugnoli, Marco Passarotti
Venues:
LT4HALA | WS
Publisher:
ELRA and ICCL
Pages:
65–78
URL:
https://aclanthology.org/2024.lt4hala-1.9
Cite (ACL):
Oksana Dereza, Deirdre Ní Chonghaile, and Nicholas Wolf. 2024. “To Have the ‘Million’ Readers Yet”: Building a Digitally Enhanced Edition of the Bilingual Irish-English Newspaper an Gaodhal (1881-1898). In Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024, pages 65–78, Torino, Italia. ELRA and ICCL.
Cite (Informal):
“To Have the ‘Million’ Readers Yet”: Building a Digitally Enhanced Edition of the Bilingual Irish-English Newspaper an Gaodhal (1881-1898) (Dereza et al., LT4HALA-WS 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2024.lt4hala-1.9.pdf