In-context Learning for Few-shot Multimodal Named Entity Recognition

Chenran Cai, Qianlong Wang, Bin Liang, Bing Qin, Min Yang, Kam-Fai Wong, Ruifeng Xu


Abstract
Thanks in part to the availability of copious annotated resources for some entity categories, existing studies have achieved superior performance in multimodal named entity recognition (MNER). However, in real-world scenarios, it is infeasible to enumerate all entity categories in advance. Therefore, in this paper, we formulate a new few-shot multimodal named entity recognition (FewMNER) task, which aims to effectively locate and identify named entities in a text-image pair using only a small number of labeled examples. Further, we explore the merit of in-context learning (ICL) and propose a novel framework for FewMNER that addresses three points: converting the visual modality, selecting useful examples, and designing an effective task demonstration. Specifically, we first employ an image captioning model to convert images into textual descriptions, enabling large language models to absorb information from the visual modality. Then, we rank candidate examples by the sum of their similarity rankings from the text and image modalities and select the k-nearest examples to form the demonstration context. Finally, we use the MNER task definition and the meaning of each entity category as the instruction. Extensive experimental results demonstrate that our framework outperforms baselines under several few-shot settings.
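The abstract describes two concrete steps that lend themselves to a short sketch: ranking candidate demonstrations by the sum of their per-modality similarity rankings, and assembling the demonstration context from captions and labeled examples. Below is a minimal Python sketch of these steps; the function names, the use of cosine similarity over pre-computed, L2-normalized embeddings, and the prompt layout are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def select_examples(query_text_emb, query_img_emb,
                    pool_text_embs, pool_img_embs, k=4):
    """Pick k in-context examples by the sum of the text- and image-modality
    similarity rankings (assumption: cosine similarity over L2-normalized
    embeddings, so a dot product suffices)."""
    text_sim = pool_text_embs @ query_text_emb   # (N,) text similarities
    img_sim = pool_img_embs @ query_img_emb      # (N,) image similarities

    # Rank candidates per modality (rank 0 = most similar).
    text_rank = np.argsort(np.argsort(-text_sim))
    img_rank = np.argsort(np.argsort(-img_sim))

    # Re-rank by the summed ranks and keep the k nearest.
    return np.argsort(text_rank + img_rank)[:k]

def build_prompt(task_instruction, demos, query_text, query_caption):
    """Assemble the ICL prompt: task instruction, k demonstrations
    (each image replaced by its caption), then the query."""
    parts = [task_instruction]
    for d in demos:  # d: dict with 'text', 'caption', 'entities'
        parts.append(f"Text: {d['text']}\nImage caption: {d['caption']}\n"
                     f"Entities: {d['entities']}")
    parts.append(f"Text: {query_text}\nImage caption: {query_caption}\nEntities:")
    return "\n\n".join(parts)
```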
Anthology ID:
2023.findings-emnlp.196
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2969–2979
URL:
https://aclanthology.org/2023.findings-emnlp.196
DOI:
10.18653/v1/2023.findings-emnlp.196
Cite (ACL):
Chenran Cai, Qianlong Wang, Bin Liang, Bing Qin, Min Yang, Kam-Fai Wong, and Ruifeng Xu. 2023. In-context Learning for Few-shot Multimodal Named Entity Recognition. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 2969–2979, Singapore. Association for Computational Linguistics.
Cite (Informal):
In-context Learning for Few-shot Multimodal Named Entity Recognition (Cai et al., Findings 2023)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2023.findings-emnlp.196.pdf