Rethinking Multimodal Entity and Relation Extraction from a Translation Point of View

Changmeng Zheng, Junhao Feng, Yi Cai, Xiaoyong Wei, Qing Li


Abstract
We revisit multimodal entity and relation extraction from a translation point of view. Special attention is paid to the misalignment issue in text-image datasets, which may mislead learning. We are motivated by the fact that cross-modal misalignment is similar to the cross-lingual divergence issue in machine translation. The problem can then be transformed, and existing solutions borrowed, by treating a text and its paired image as translations of each other. We implement a multimodal back-translation using diffusion-based generative models to produce pseudo-parallel pairs, and a divergence estimator built by constructing a high-resource corpus as a bridge for low-resource learners. Fine-grained confidence scores are generated to indicate both the types and degrees of alignment, with which better representations are obtained. The method is validated experimentally, outperforming 14 state-of-the-art methods on both entity and relation extraction tasks. The source code is available at https://github.com/thecharm/TMR.
Anthology ID:
2023.acl-long.376
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
6810–6824
URL:
https://aclanthology.org/2023.acl-long.376
DOI:
10.18653/v1/2023.acl-long.376
Cite (ACL):
Changmeng Zheng, Junhao Feng, Yi Cai, Xiaoyong Wei, and Qing Li. 2023. Rethinking Multimodal Entity and Relation Extraction from a Translation Point of View. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6810–6824, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Rethinking Multimodal Entity and Relation Extraction from a Translation Point of View (Zheng et al., ACL 2023)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2023.acl-long.376.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-5/2023.acl-long.376.mp4