Building a Multimodal Entity Linking Dataset From Tweets

Omar Adjali, Romaric Besançon, Olivier Ferret, Hervé Le Borgne, Brigitte Grau


Abstract
The task of entity linking, which aims to associate an entity mention with a unique entity in a knowledge base (KB), is useful for advanced information extraction tasks such as relation extraction or event detection. Most studies that address this problem rely only on textual documents, while an increasing number of sources are multimedia, particularly in social media, where messages are often illustrated with images. In this article, we address the Multimodal Entity Linking (MEL) task, and more particularly the problem of its evaluation. To this end, we propose a novel method to quasi-automatically build annotated datasets for evaluating methods on the MEL task. The method collects text and images to jointly build a corpus of tweets with ambiguous mentions along with a Twitter KB defining the entities. We release a new annotated dataset of Twitter posts associated with images. We study the key characteristics of the proposed dataset and evaluate the performance of several MEL approaches on it.
Anthology ID:
2020.lrec-1.528
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4285–4292
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.528
Cite (ACL):
Omar Adjali, Romaric Besançon, Olivier Ferret, Hervé Le Borgne, and Brigitte Grau. 2020. Building a Multimodal Entity Linking Dataset From Tweets. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4285–4292, Marseille, France. European Language Resources Association.
Cite (Informal):
Building a Multimodal Entity Linking Dataset From Tweets (Adjali et al., LREC 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.lrec-1.528.pdf