Marcel Milich

2023

ZELDA: A Comprehensive Benchmark for Supervised Entity Disambiguation
Marcel Milich | Alan Akbik
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Entity disambiguation (ED) is the task of disambiguating named entity mentions in text to unique entries in a knowledge base. Due to its industrial relevance, as well as current progress in leveraging pre-trained language models, a multitude of ED approaches have been proposed in recent years. However, we observe a severe lack of uniformity across experimental setups in current ED work, rendering a direct comparison of approaches based solely on reported numbers impossible: current approaches differ widely in the data set used to train, the size of the covered entity vocabulary, and the usage of additional signals such as candidate lists. To address this issue, we present ZELDA, a novel entity disambiguation benchmark that includes a unified training data set, entity vocabulary, candidate lists, as well as challenging evaluation splits covering 8 different domains. We illustrate its design and construction, and present experiments in which we train and compare current state-of-the-art approaches on our benchmark. To encourage greater direct comparability in the entity disambiguation domain, we make our benchmark publicly available to the research community.

Co-authors

Alan Akbik (1)

Venues

EACL (1)