A Graph Interaction Framework on Relevance for Multimodal Named Entity Recognition with Multiple Images

Jiachen Zhao; Shizhou Huang; Xin Lin

Abstract

Posts containing multiple images offer significant research potential for Multimodal Named Entity Recognition. Previous methods determine whether images are related to the named entities in the text through similarity computation, for example with CLIP. However, this approach is ineffective in some cases and does not transfer well across tasks, especially in multi-image scenarios. To address this issue, we propose a graph interaction framework on relevance (GIFR) for Multimodal Named Entity Recognition with multiple images. Humans can readily judge whether an image is relevant to named entities, but this ability is difficult to model. We therefore use reinforcement learning based on human preference to integrate this ability into the model, so that it can determine whether an image-text pair is relevant; we refer to this judgment as relevance. To better leverage relevance, we construct a heterogeneous graph and introduce a graph transformer to enable information interaction. Experiments on benchmark datasets demonstrate that our method achieves state-of-the-art performance.

Anthology ID: 2025.coling-main.82
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 1237–1246
URL: https://preview.aclanthology.org/fix-sig-urls/2025.coling-main.82/
Cite (ACL): Jiachen Zhao, Shizhou Huang, and Xin Lin. 2025. A Graph Interaction Framework on Relevance for Multimodal Named Entity Recognition with Multiple Images. In Proceedings of the 31st International Conference on Computational Linguistics, pages 1237–1246, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): A Graph Interaction Framework on Relevance for Multimodal Named Entity Recognition with Multiple Images (Zhao et al., COLING 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.coling-main.82.pdf