Multimodal Cross-lingual Phrase Retrieval

Chuanqi Dong, Wenjie Zhou, Xiangyu Duan, Yuqi Zhang, Min Zhang


Abstract
Cross-lingual phrase retrieval aims to retrieve parallel phrases across languages. Current approaches deal only with the textual modality, and both multimodal data resources and explorations of multimodal cross-lingual phrase retrieval (MXPR) are lacking. In this paper, we create the first MXPR data resource and propose a novel approach for MXPR to explore the effectiveness of multi-modality. The MXPR data resource is built by combining the benchmark dataset for textual cross-lingual phrase retrieval with Wikimedia Commons, a media repository containing a vast amount of text and related images. In the resulting resource, the phrase pairs of the textual benchmark dataset are paired with their related images. Based on this novel data resource, we introduce a strategy to bridge the gap between different modalities through multimodal relation generation with a large multimodal pre-trained model and consistency training. Experiments on the benchmark dataset covering eight language pairs show that our MXPR approach, which deals with multimodal phrases, performs significantly better than purely textual cross-lingual phrase retrieval.
Anthology ID:
2024.lrec-main.1040
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
11917–11927
URL:
https://aclanthology.org/2024.lrec-main.1040
Cite (ACL):
Chuanqi Dong, Wenjie Zhou, Xiangyu Duan, Yuqi Zhang, and Min Zhang. 2024. Multimodal Cross-lingual Phrase Retrieval. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 11917–11927, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Multimodal Cross-lingual Phrase Retrieval (Dong et al., LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/proper-vol2-ingestion/2024.lrec-main.1040.pdf