Thora Hagen
2025
Lexical Semantic Change Annotation with Large Language Models
Thora Hagen
Proceedings of the 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2025)
This paper explores the application of state-of-the-art large language models (LLMs) to the task of lexical semantic change annotation (LSCA) using the historical German DURel dataset. We evaluate five LLMs and investigate whether retrieval-augmented generation (RAG) with historical encyclopedic knowledge enhances results. Our findings show that the Llama3.3 model achieves performance comparable to GPT-4o despite significant differences in parameter count, while RAG marginally improves predictions for smaller models but hampers performance for larger ones. Further analysis suggests that our additional context benefits nouns more than verbs and adjectives, demonstrating the nuances of integrating external knowledge for semantic tasks.
2020
Twenty-two Historical Encyclopedias Encoded in TEI: a New Resource for the Digital Humanities
Thora Hagen | Erik Ketzan | Fotis Jannidis | Andreas Witt
Proceedings of the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
This paper accompanies the corpus publication of EncycNet, a novel XML/TEI-annotated corpus of 22 historical German encyclopedias from the early 18th to the early 20th century. We describe the creation and annotation of the corpus, including the rationale for its development, a suggested methodology for TEI annotation, possible use cases, and future work. While many well-developed annotation standards for lexical resources exist, none can adequately model the encyclopedias at hand, and we therefore suggest how the TEI Lex-0 standard may be modified with additional guidelines for the annotation of historical encyclopedias. As the digitization and annotation of historical encyclopedias are settling on TEI as the de facto standard, our methodology may inform similar projects.