Too Young to NER: Improving Entity Recognition on Dutch Historical Documents

Vera Provatorova, Marieke van Erp, Evangelos Kanoulas


Abstract
Named entity recognition (NER) on historical texts is beneficial for the field of digital humanities, as it makes it easy to search digitised archives for the names of people, places and other entities. While historical NER in different languages has been gaining popularity in recent years, Dutch historical NER remains an underexplored topic. Using a recently released historical dataset from the Dutch Language Institute, we train three BERT-based models and analyse their errors to identify the main challenges. All three models outperform a contemporary multilingual baseline by a large margin on historical test data.
Anthology ID:
2024.lt4hala-1.4
Volume:
Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Rachele Sprugnoli, Marco Passarotti
Venues:
LT4HALA | WS
Publisher:
ELRA and ICCL
Pages:
30–35
URL:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2024.lt4hala-1.4/
Cite (ACL):
Vera Provatorova, Marieke van Erp, and Evangelos Kanoulas. 2024. Too Young to NER: Improving Entity Recognition on Dutch Historical Documents. In Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024, pages 30–35, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Too Young to NER: Improving Entity Recognition on Dutch Historical Documents (Provatorova et al., LT4HALA 2024)
PDF:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2024.lt4hala-1.4.pdf