Yongquan Ji
2025
Breaking the Noise Barrier: LLM-Guided Semantic Filtering and Enhancement for Multi-Modal Entity Alignment
Chenglong Lu | Chenxiao Li | Jingwei Cheng | Yongquan Ji | Guoqing Chen | Fu Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Multi-modal entity alignment (MMEA) aims to identify equivalent entities between two multi-modal knowledge graphs (MMKGs). However, the intrinsic noise within modalities, such as inconsistency in the visual modality and redundant attributes, has not been thoroughly investigated. Excessive noise not only weakens semantic representation but also increases the risk of overfitting in attention-based fusion methods. To address this, we propose LGEA, a novel LLM-guided MMEA framework that prioritizes noise reduction before fusion. Specifically, LGEA introduces two key strategies: (1) fine-grained visual filtering to remove irrelevant images at the semantic level, and (2) contextual summarization of attribute information to enhance entity semantics. To our knowledge, this is the first work to apply LLMs to both visual filtering and attribute-level semantic enhancement in MMEA. Experiments on multiple benchmarks, including the noisy FB-YG dataset, show that LGEA sets a new state of the art (SOTA) in robust multi-modal alignment, highlighting noise-aware strategies as a promising direction for future MMEA research.
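A minimal sketch of the kind of LLM-guided visual filtering the abstract describes (not the authors' code): an LLM is asked whether each image description is semantically relevant to an entity, and irrelevant images are dropped before fusion. The wrapper `query_llm` is a hypothetical callable around any chat-completion API.

```python
# Hypothetical sketch of LLM-guided semantic filtering of entity images.
from typing import Callable, List

def filter_entity_images(entity_name: str,
                         image_captions: List[str],
                         query_llm: Callable[[str], str]) -> List[str]:
    """Keep only images whose captions the LLM judges relevant to the entity."""
    kept = []
    for caption in image_captions:
        prompt = (
            f"Entity: {entity_name}\n"
            f"Image description: {caption}\n"
            "Answer 'yes' if the image depicts this entity, otherwise 'no'."
        )
        if query_llm(prompt).strip().lower().startswith("yes"):
            kept.append(caption)
    return kept
```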
Capturing Latent Modal Association For Multimodal Entity Alignment
Yongquan Ji | Jingwei Cheng | Fu Zhang | Chenglong Lu
Findings of the Association for Computational Linguistics: EMNLP 2025
Multimodal entity alignment aims to identify equivalent entities in heterogeneous knowledge graphs by leveraging complementary information from multiple modalities. However, existing methods often overlook the quality of the input modality embeddings during modality interaction (e.g., missing-modality generation, modal information transfer, and modality fusion), which may inadvertently amplify noise propagation while suppressing discriminative feature representations. To address these issues, we propose CLAMEA, a novel model for capturing latent modal associations for multimodal entity alignment. Specifically, we use a self-attention mechanism to enhance salient information while attenuating noise within individual modality embeddings. We design a dynamic modal attention flow fusion module to capture and balance latent intra- and inter-modal associations and to generate fused modality embeddings. Based on both the fused and the available modalities, we adopt a variational autoencoder (VAE) to generate high-quality embeddings for the missing modality. We then use a cross-modal association extraction module to extract latent modal associations from the completed modality embeddings, further enhancing embedding quality. Experimental results on two real-world datasets demonstrate the effectiveness of our approach, which achieves an absolute 3.1% higher Hits@1 score than the SOTA method.
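An illustrative PyTorch sketch (assumed, not the authors' implementation) of two components named in the abstract: self-attention over a single modality's embeddings to emphasize salient features, and a small VAE that generates a missing-modality embedding from a fused representation. All class names and dimensions here are placeholders.

```python
import torch
import torch.nn as nn

class ModalitySelfAttention(nn.Module):
    """Self-attention with a residual connection to denoise one modality."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); the residual keeps the original signal.
        out, _ = self.attn(x, x, x)
        return x + out

class MissingModalityVAE(nn.Module):
    """Generate an embedding for a missing modality from the fused embedding."""
    def __init__(self, dim: int, latent: int = 64):
        super().__init__()
        self.enc = nn.Linear(dim, 2 * latent)  # -> mean and log-variance
        self.dec = nn.Linear(latent, dim)      # latent -> generated modality

    def forward(self, fused: torch.Tensor):
        mu, logvar = self.enc(fused).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.dec(z), mu, logvar

# Toy usage: denoise visual embeddings, then synthesize a missing modality.
visual = torch.randn(8, 1, 128)                    # one visual token per entity
denoised = ModalitySelfAttention(128)(visual)
fused = torch.randn(8, 128)                        # fused embeddings for 8 entities
generated, mu, logvar = MissingModalityVAE(128)(fused)
```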