Studying Language Variation Considering the Re-Usability of Modern Theories, Tools and Resources for Annotating Explicit and Implicit Events in Centuries Old Text

Stella Verkijk, Pia Sommerauer, Piek Vossen


Abstract
This paper discusses the re-usibility of existing approaches, tools and automatic techniques for the annotation and detection of events in a challenging variant of centuries old Dutch written in the archives of the Dutch East India Company. We describe our annotation process and provide a thorough analysis of different versions of manually annotated data and the first automatic results from two fine-tuned Language Models. Through the analysis of this complete process, the paper studies two things: to what extent we can use NLP theories and tasks formulated for modern English to formulate an annotation task for Early Modern Dutch and to what extent we can use NLP models and tools built for modern Dutch (and other languages) on Early Modern Dutch. We believe these analyses give us insight into how to deal with the large variation language showcases in describing events, and how this variation may differ accross domains. We release the annotation guidelines, annotated data, and code.
Anthology ID:
2024.vardial-1.15
Volume:
Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Yves Scherrer, Tommi Jauhiainen, Nikola Ljubešić, Marcos Zampieri, Preslav Nakov, Jörg Tiedemann
Venues:
VarDial | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
174–187
Language:
URL:
https://aclanthology.org/2024.vardial-1.15
DOI:
Bibkey:
Cite (ACL):
Stella Verkijk, Pia Sommerauer, and Piek Vossen. 2024. Studying Language Variation Considering the Re-Usability of Modern Theories, Tools and Resources for Annotating Explicit and Implicit Events in Centuries Old Text. In Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024), pages 174–187, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Studying Language Variation Considering the Re-Usability of Modern Theories, Tools and Resources for Annotating Explicit and Implicit Events in Centuries Old Text (Verkijk et al., VarDial-WS 2024)
Copy Citation:
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.vardial-1.15.pdf
Supplementary material:
 2024.vardial-1.15.SupplementaryMaterial.txt