Felix Thielen
2026
AmDi - Ambiguous Words Diachronic Dataset
Felix Thielen | Kai Kugler
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Lexical Semantic Change Detection and Word Sense Disambiguation are two fundamental tasks in computational linguistics, and both commonly rely on large annotated datasets. Most available datasets, however, cover only one of the two areas: they are either diachronic corpora used for Semantic Change Detection or synchronic datasets used for Word Sense Disambiguation. To address this gap, the AmDi dataset is introduced as a German-language resource that supports a more fine-grained diachronic analysis of word meanings. It also enables the investigation of embeddings generated with corresponding models and provides a foundation for Word Sense Disambiguation tasks.