TweetTER: A Benchmark for Target Entity Retrieval on Twitter without Knowledge Bases

Kiamehr Rezaee, Jose Camacho-Collados, Mohammad Taher Pilehvar


Abstract
Entity linking is a well-established NLP task that associates entity mentions with entries in a knowledge base. Current models perform competitively on standard text, but challenges persist in noisy domains such as social media. Evaluating entity linking on existing benchmarks typically requires a comprehensive knowledge base, and models are expected to know every entity it contains. In practical scenarios where the goal is to retrieve sentences about one particular entity, however, knowledge of every entity in the knowledge base may not be necessary. To address this gap, we introduce TweetTER (Tweet Target Entity Retrieval), a novel benchmark whose distinguishing feature is that it re-frames entity linking as a binary entity retrieval task. This allows language models to be evaluated without a conventional knowledge base, providing a more practical and versatile framework for assessing their effectiveness at entity retrieval.
Anthology ID:
2024.lrec-main.1468
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
16890–16896
URL:
https://aclanthology.org/2024.lrec-main.1468
Cite (ACL):
Kiamehr Rezaee, Jose Camacho-Collados, and Mohammad Taher Pilehvar. 2024. TweetTER: A Benchmark for Target Entity Retrieval on Twitter without Knowledge Bases. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 16890–16896, Torino, Italia. ELRA and ICCL.
Cite (Informal):
TweetTER: A Benchmark for Target Entity Retrieval on Twitter without Knowledge Bases (Rezaee et al., LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.lrec-main.1468.pdf