MERLIN: A Testbed for Multilingual Multimodal Entity Recognition and Linking

Sathyanarayanan Ramamoorthy, Vishwa Shah, Simran Khanuja, Zaid Sheikh, Shan Jie, Ann Chia, Shearman Chua, Graham Neubig


Abstract
This paper introduces MERLIN, a novel testbed system for the task of Multilingual Multimodal Entity Linking. The created dataset includes BBC news article titles, paired with corresponding images, in five languages: Hindi, Japanese, Indonesian, Vietnamese, and Tamil, featuring over 7,000 named entity mentions linked to 2,500 unique Wikidata entities. We also include several benchmarks using multilingual and multimodal entity linking methods, exploring different language models such as LLaMa-2 and Aya-23. Our findings indicate that incorporating visual data improves the accuracy of entity linking, especially for entities where the textual context is ambiguous or insufficient, and particularly for models that do not have strong multilingual abilities. The dataset and methods are available online.
Anthology ID:
2026.tacl-1.19
Volume:
Transactions of the Association for Computational Linguistics, Volume 14
Year:
2026
Address:
Cambridge, MA
Venue:
TACL
Publisher:
MIT Press
Pages:
399–417
URL:
https://preview.aclanthology.org/ingest-latest-mitpress-cl-tacl/2026.tacl-1.19/
DOI:
10.1162/tacl.a.633
Cite (ACL):
Sathyanarayanan Ramamoorthy, Vishwa Shah, Simran Khanuja, Zaid Sheikh, Shan Jie, Ann Chia, Shearman Chua, and Graham Neubig. 2026. MERLIN: A Testbed for Multilingual Multimodal Entity Recognition and Linking. Transactions of the Association for Computational Linguistics, 14:399–417.
Cite (Informal):
MERLIN: A Testbed for Multilingual Multimodal Entity Recognition and Linking (Ramamoorthy et al., TACL 2026)
PDF:
https://preview.aclanthology.org/ingest-latest-mitpress-cl-tacl/2026.tacl-1.19.pdf