Systematic Study of Long Tail Phenomena in Entity Linking

Filip Ilievski, Piek Vossen, Stefan Schlobach


Abstract
State-of-the-art entity linkers achieve high accuracy scores with probabilistic methods. However, these scores should be considered in relation to the properties of the datasets on which they are evaluated. Until now, there has been no systematic investigation of the properties of entity linking datasets and their impact on system performance. In this paper, we report on a series of hypotheses regarding long-tail phenomena in entity linking datasets, their interaction, and their impact on system performance. Our systematic study of these hypotheses shows that evaluation datasets mainly capture head entities and only incidentally cover data from the tail, thus encouraging systems to overfit to popular/frequent and non-ambiguous cases. We find the most difficult cases of entity linking among the infrequent candidates of ambiguous forms. With our findings, we hope to inspire future designs of both entity linking systems and evaluation datasets. To support this goal, we provide a list of recommended actions for better inclusion of tail cases.
Anthology ID:
C18-1056
Volume:
Proceedings of the 27th International Conference on Computational Linguistics
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
664–674
URL:
https://aclanthology.org/C18-1056
Cite (ACL):
Filip Ilievski, Piek Vossen, and Stefan Schlobach. 2018. Systematic Study of Long Tail Phenomena in Entity Linking. In Proceedings of the 27th International Conference on Computational Linguistics, pages 664–674, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Systematic Study of Long Tail Phenomena in Entity Linking (Ilievski et al., COLING 2018)
PDF:
https://preview.aclanthology.org/naacl24-info/C18-1056.pdf