Chenxiao Li
2025
Probing Relative Interaction and Dynamic Calibration in Multi-modal Entity Alignment
Chenxiao Li | Jingwei Cheng | Qiang Tong | Fu Zhang | Cairui Wang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multi-modal entity alignment aims to identify equivalent entities between two different multi-modal knowledge graphs. Current methods have made significant progress by improving embedding and cross-modal fusion. However, most of them rely on loss functions to capture the relationships between modalities, or adopt a one-shot strategy that directly computes modality weights with attention mechanisms, overlooking the relative interactions between modalities at the entity level and the accuracy of the modality weights, which hinders generalization to diverse entities. To address this challenge, we propose RICEA, a relative interaction and calibration framework for multi-modal entity alignment, which dynamically computes weights based on relative interactions and recalibrates them according to their uncertainties. Within this framework, we propose a novel method, ADC, that uses attention mechanisms to perceive the uncertainty of each modality's weight, rather than directly computing the weight of each modality as in previous works. Across 5 datasets and 23 settings, our framework significantly outperforms other baselines. Our code and data are available at https://github.com/ChenxiaoLi-Joe/RICEA.
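The abstract describes modality weights derived from relative cross-modal interactions and then recalibrated by attention-perceived uncertainty. The snippet below is only a minimal PyTorch sketch of that idea; the module name, the bilinear interaction scorer, and the sigmoid uncertainty head are assumptions for illustration and are not taken from the released RICEA code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UncertaintyCalibratedFusion(nn.Module):
    """Hypothetical sketch: interaction-based weights recalibrated by uncertainty."""
    def __init__(self, dim: int):
        super().__init__()
        # scores pairwise (relative) interactions between modality embeddings
        self.interaction = nn.Bilinear(dim, dim, 1)
        # attention-style head that predicts an uncertainty score per modality
        self.uncertainty = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, 1))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_modalities, dim) per-entity modality embeddings
        B, M, D = feats.shape
        # relative interaction: mean bilinear score of each modality against all modalities
        a = feats.unsqueeze(2).expand(B, M, M, D).reshape(-1, D)
        b = feats.unsqueeze(1).expand(B, M, M, D).reshape(-1, D)
        inter = self.interaction(a, b).view(B, M, M).mean(dim=-1)   # (B, M)
        weights = F.softmax(inter, dim=-1)                          # initial modality weights
        # recalibration: down-weight modalities whose predicted uncertainty is high
        sigma = torch.sigmoid(self.uncertainty(feats)).squeeze(-1)  # (B, M), in (0, 1)
        calibrated = F.softmax(weights * (1.0 - sigma), dim=-1)
        # fuse modalities with the calibrated weights
        return (calibrated.unsqueeze(-1) * feats).sum(dim=1)        # (B, D)

# usage: fuse structural and visual embeddings for a batch of four entities
fusion = UncertaintyCalibratedFusion(dim=128)
fused = fusion(torch.randn(4, 2, 128))   # -> shape (4, 128)
```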
Exploring the Impacts of Feature Fusion Strategy in Multi-modal Entity Alignment
Chenxiao Li | Jingwei Cheng | Qiang Tong | Fu Zhang
Proceedings of the 31st International Conference on Computational Linguistics
Multi-modal entity alignment aims to identify equivalent entities between two different multi-modal knowledge graphs, which consist of structural triples and images associated with entities. Unfortunately, prior works fuse the multi-modal knowledge of all entities via a single fusion strategy, so the impact of the fusion strategy on individual entities is largely ignored. To address this challenge, we propose AMF2SEA, an adaptive multi-modal feature fusion strategy for entity alignment, which dynamically selects the optimal entity-level feature fusion strategy. Additionally, we build a new dataset based on DBP15K that includes a full set of entity images from multiple inconsistent web sources, making it more representative of the real world. Experimental results demonstrate that our model achieves state-of-the-art (SOTA) performance compared to models using the same modalities on DBP15K and its variants with richer image sources and styles. Our code and data are available at https://github.com/ChenxiaoLiJoe/AMFFSEA.
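To illustrate what entity-level selection of a fusion strategy could look like, here is a minimal PyTorch sketch; the `AdaptiveFusion` class, its two candidate strategies (concatenation vs. averaging), and the soft gating used as a differentiable stand-in for per-entity strategy selection are assumptions for illustration, not the released AMF2SEA implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFusion(nn.Module):
    """Hypothetical sketch: choose a fusion strategy per entity via a learned gate."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj_concat = nn.Linear(2 * dim, dim)  # strategy 1: concatenate, then project
        self.gate = nn.Linear(2 * dim, 2)           # entity-level scores over the two strategies

    def forward(self, struct_emb: torch.Tensor, img_emb: torch.Tensor) -> torch.Tensor:
        joint = torch.cat([struct_emb, img_emb], dim=-1)
        # candidate 1: concatenation of structural and image features, projected back to dim
        concat = self.proj_concat(joint)
        # candidate 2: element-wise average of the two modalities
        average = 0.5 * (struct_emb + img_emb)
        # the gate softly selects which strategy to trust for each entity
        scores = F.softmax(self.gate(joint), dim=-1)           # (B, 2)
        candidates = torch.stack([concat, average], dim=1)     # (B, 2, dim)
        return (scores.unsqueeze(-1) * candidates).sum(dim=1)  # (B, dim)

# usage: fuse structural and image embeddings for a batch of four entities
fusion = AdaptiveFusion(dim=128)
fused = fusion(torch.randn(4, 128), torch.randn(4, 128))   # -> shape (4, 128)
```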