All Entities are Not Created Equal: Examining the Long Tail for Ultra-Fine Entity Typing

Advait Deshmukh, Ashwin Umadi, Dananjay Srinivas, Maria Leonor Pacheco


Abstract
Due to their capacity to acquire world knowledge from large corpora, pre-trained language models (PLMs) are extensively used in ultra-fine entity typing tasks where the space of labels is extremely large. In this work, we explore the limitations of the knowledge acquired by PLMs by proposing a novel heuristic to approximate the pre-training distribution of entities when the pre-training data is unknown. Then, we systematically demonstrate that entity-typing approaches that rely solely on the parametric knowledge of PLMs struggle significantly with entities at the long tail of the pre-training distribution, and that knowledge-infused approaches can account for some of these shortcomings. Our findings suggest that we need to go beyond PLMs to produce solutions that perform well for infrequent entities.
Anthology ID: 2025.starsem-1.15
Volume: Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025)
Month: November
Year: 2025
Address: Suzhou, China
Editors: Lea Frermann, Mark Stevenson
Venue: *SEM
Publisher: Association for Computational Linguistics
Pages: 189–201
URL: https://preview.aclanthology.org/ingest-emnlp/2025.starsem-1.15/
Cite (ACL): Advait Deshmukh, Ashwin Umadi, Dananjay Srinivas, and Maria Leonor Pacheco. 2025. All Entities are Not Created Equal: Examining the Long Tail for Ultra-Fine Entity Typing. In Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025), pages 189–201, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): All Entities are Not Created Equal: Examining the Long Tail for Ultra-Fine Entity Typing (Deshmukh et al., *SEM 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.starsem-1.15.pdf