AmDi - Ambiguous Words Diachronic Dataset

Felix Thielen, Kai Kugler


Abstract
Two fundamental tasks in computational linguistics are Lexical Semantic Change Detection and Word Sense Disambiguation. Both commonly rely on large annotated datasets. However, most available datasets cover only one of the two areas: diachronic corpora are used for Lexical Semantic Change Detection, while synchronic datasets serve Word Sense Disambiguation. To address this gap, the AmDi dataset is introduced as a German-language resource that supports a finer-grained diachronic analysis of word meanings, enables the investigation of embeddings generated by corresponding models, and provides a foundation for Word Sense Disambiguation tasks.
Anthology ID:
2026.lrec-main.934
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
11926–11941
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.934/
Cite (ACL):
Felix Thielen and Kai Kugler. 2026. AmDi - Ambiguous Words Diachronic Dataset. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 11926–11941, Palma de Mallorca, Spain. ELRA Language Resource Association.
Cite (Informal):
AmDi - Ambiguous Words Diachronic Dataset (Thielen & Kugler, LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.934.pdf