MELOV: Multimodal Entity Linking with Optimized Visual Features in Latent Space
Xuhui Sui, Ying Zhang, Yu Zhao, Kehui Song, Baohang Zhou, Xiaojie Yuan
Abstract
Multimodal entity linking (MEL), which aligns ambiguous mentions within multimodal contexts to referent entities from multimodal knowledge bases, is essential for many natural language processing applications. Previous MEL methods mainly focus on exploring complex multimodal interaction mechanisms to better capture coherence evidence between mentions and entities by mining complementary information. However, in real-world social media scenarios, the vision modality often exhibits low quality, low value, or low relevance to the mention. Integrating such information directly will backfire, leading to a weakened consistency between mentions and their corresponding entities. In this paper, we propose a novel latent space vision feature optimization framework, MELOV, which combines inter-modality and intra-modality optimizations to address these challenges. For the inter-modality optimization, we exploit a variational autoencoder to mine shared information and generate text-based visual features. For the intra-modality optimization, we consider the relationships between mentions and build a graph convolutional network to aggregate the visual features of semantically similar neighbors. Extensive experiments on three benchmark datasets demonstrate the superiority of our proposed framework.
- Anthology ID:
- 2024.findings-acl.46
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2024
- Month:
- August
- Year:
- 2024
- Address:
- Bangkok, Thailand
- Editors:
- Lun-Wei Ku, Andre Martins, Vivek Srikumar
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 816–826
- URL:
- https://aclanthology.org/2024.findings-acl.46
- DOI:
- 10.18653/v1/2024.findings-acl.46
- Cite (ACL):
- Xuhui Sui, Ying Zhang, Yu Zhao, Kehui Song, Baohang Zhou, and Xiaojie Yuan. 2024. MELOV: Multimodal Entity Linking with Optimized Visual Features in Latent Space. In Findings of the Association for Computational Linguistics: ACL 2024, pages 816–826, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal):
- MELOV: Multimodal Entity Linking with Optimized Visual Features in Latent Space (Sui et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/autopr/2024.findings-acl.46.pdf
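
The intra-modality optimization mentioned in the abstract (aggregating the visual features of semantically similar mentions with a graph convolutional network) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the mention graph is built, so the k-nearest-neighbor graph over cosine similarity of text embeddings is an assumption, the helper names `cosine`, `knn_adjacency`, and `gcn_aggregate` are hypothetical, and the learned weight matrix and nonlinearity of a full GCN layer are omitted so that only the neighbor-aggregation step is shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def knn_adjacency(text_emb, k):
    """Symmetric 0/1 adjacency linking each mention to its k most similar mentions.

    Assumption: similarity is measured on text embeddings; the paper may
    construct the mention graph differently.
    """
    n = len(text_emb)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: cosine(text_emb[i], text_emb[j]), reverse=True)
        for j in others[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj


def gcn_aggregate(adj, feats):
    """One propagation step D^{-1/2} (A + I) D^{-1/2} X, without learned weights."""
    n = len(adj)
    # Add self-loops so each mention keeps part of its own visual feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    dim = len(feats[0])
    out = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            w = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
            if w:
                for d in range(dim):
                    row[d] += w * feats[j][d]
        out.append(row)
    return out


# Toy data (made up for illustration): three mentions with 2-d text
# embeddings and 2-d visual features.
text_emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
visual = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]

adj = knn_adjacency(text_emb, k=1)
smoothed_visual = gcn_aggregate(adj, visual)
```

In this toy setup, a mention whose image is uninformative would receive a visual feature that blends in those of its textually closest neighbors, which is the intuition the abstract describes; a trainable version would multiply the aggregated features by a weight matrix and apply a nonlinearity.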