Maira Khatoon


2025

Idiomatic expressions pose difficulties for Natural Language Processing (NLP) because they are noncompositional. In this paper, we propose the Idiom Visual Understanding Dataset (IVUD), a multimodal dataset for idiom understanding using visual and textual representation. For SemEval-2025 Task 1 (AdMIRe), we specifically addressed dataset augmentation using AI-synthesized images and human-directed prompt engineering. We compared the efficacy of vision- and text-based models in ranking images aligned with idiomatic phrases. The results identify the advantages of using multimodal context for enhanced idiom understanding, showcasing how vision-language models perform better than text-only approaches in the detection of idiomaticity.