@inproceedings{bilen-etal-2026-visaffect,
    title     = {{VisAffect} at {MWE}-2026 {AdMIRe} 2: {IMMCAN} Idiom Multimodal Cross-Attention Network},
    author    = {Bilen, Bar{\i}{\c{s}} and
                 Azmoudeh, Ali and
                 Ekenel, Haz{\i}m Kemal and
                 Kose, Hatice},
    editor    = {Ojha, Atul Kr. and
                 Mititelu, Verginica Barbu and
                 Constant, Mathieu and
                 Stoyanova, Ivelina and
                 Do{\u{g}}ru{\"o}z, A. Seza and
                 Rademaker, Alexandre},
    booktitle = {Proceedings of the 22nd Workshop on Multiword Expressions ({MWE} 2026)},
    month     = mar,
    year      = {2026},
    address   = {Rabat, Morocco},
    publisher = {Association for Computational Linguistics},
    url       = {https://preview.aclanthology.org/ingest-eacl/2026.mwe-1.19/},
    pages     = {149--153},
    isbn      = {979-8-89176-363-0},
    abstract  = {We address AdMIRe 2.0, a static image ranking task where a sentence containing a potentially idiomatic expression is paired with five image{--}caption candidates, and the goal is to rank the candidates by semantic compatibility with the intended idiomatic or literal meaning. We propose IMMCAN, which keeps XLM-R and Jina-CLIP-v2 frozen and learns a lightweight two-stage cross-attention fusion, caption{--}image grounding followed by idiom-to-multimodal conditioning, to predict a compatibility score per candidate. We also evaluate caption-only augmentation via back-translation and synonym substitution, and compare regression and rank-class formulations. On AdMIRe 1.0, text-only achieves higher test top-image accuracy than VLM-grounded modeling. In contrast, on AdMIRe 2.0 zero-shot, adding visual patch grounding improves both accuracy and NDCG indicating better cross-lingual ranking transfer.},
}
@comment{Informal Markdown citation pasted from the ACL Anthology page; kept for
reference but inert to BibTeX:
[VisAffect at MWE-2026 AdMIRe 2: IMMCAN Idiom Multimodal Cross-Attention Network](https://preview.aclanthology.org/ingest-eacl/2026.mwe-1.19/) (Bilen et al., MWE 2026)
ACL}