Modgenix at SemEval-2025 Task 1: Context Aware Vision Language Ranking (CAViLR) for Multimodal Idiomaticity Understanding

Joydeb Mondal, Pramir Sarkar


Abstract
This paper presents CAViLR, a hybrid multimodal approach for SemEval-2025 Task 1. Our method integrates CLIP as a baseline with a Mixture of Experts (MoE) framework that dynamically selects expert models such as Pixtral-12B and Phi-3.5 based on input context. The approach addresses challenges in both image ranking and image sequence prediction, improving the alignment of visual and textual semantics. Experimental results demonstrate that our hybrid model outperforms individual models. Future work will focus on refining expert selection and enhancing disambiguation strategies for complex idiomatic expressions.
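The abstract does not specify how experts are selected, so the following is only a minimal sketch of the general idea of context-based routing with a baseline fallback, not the authors' implementation. Every name here is hypothetical: `overlap_scorer` stands in for a baseline similarity model such as CLIP, `make_keyword_scorer` stands in for an expert model, and images are represented by text captions so the example runs without any model weights.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A scorer maps (context sentence, image caption) to a relevance score.
Scorer = Callable[[str, str], float]


def overlap_scorer(context: str, caption: str) -> float:
    """Hypothetical baseline stand-in: Jaccard word overlap."""
    a, b = set(context.lower().split()), set(caption.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def make_keyword_scorer(keywords: Sequence[str]) -> Scorer:
    """Hypothetical expert stand-in: counts cue words in the caption."""
    def score(context: str, caption: str) -> float:
        words = caption.lower().split()
        return float(sum(words.count(k) for k in keywords))
    return score


def select_expert(context: str, cues: Dict[str, Sequence[str]], default: str) -> str:
    """Hard gate: pick the expert whose cue words occur most in the context.

    Falls back to `default` when no cue word matches (assumed behaviour).
    """
    words = context.lower().split()
    counts = {name: sum(words.count(c) for c in cs) for name, cs in cues.items()}
    if not counts:
        return default
    best = max(counts, key=lambda name: counts[name])
    return best if counts[best] > 0 else default


def rank_images(
    context: str,
    captions: Sequence[str],
    experts: Dict[str, Scorer],
    cues: Dict[str, Sequence[str]],
    default: str = "baseline",
) -> Tuple[str, List[int]]:
    """Route the context to one scorer, then rank captions by descending score."""
    name = select_expert(context, cues, default)
    scorer = experts[name]
    order = sorted(
        range(len(captions)),
        key=lambda i: scorer(context, captions[i]),
        reverse=True,
    )
    return name, order


experts = {
    "baseline": overlap_scorer,
    "figurative": make_keyword_scorer(["trouble"]),
}
cues = {"figurative": ["figuratively"]}
captions = ["a man sitting in hot water", "a man in trouble with his boss"]

idiomatic = rank_images("he was figuratively in hot water", captions, experts, cues)
literal = rank_images("the pasta went in hot water", captions, experts, cues)
```

In this toy setup the idiomatic context is routed to the hypothetical "figurative" expert, while the literal context falls back to the baseline scorer. A real system would replace the stand-in scorers with model calls and would likely use a learned or prompt-based gate rather than keyword counts.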
Anthology ID:
2025.semeval-1.106
Volume:
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Sara Rosenthal, Aiala Rosá, Debanjan Ghosh, Marcos Zampieri
Venues:
SemEval | WS
Publisher:
Association for Computational Linguistics
Pages:
780–784
URL:
https://preview.aclanthology.org/transition-to-people-yaml/2025.semeval-1.106/
Cite (ACL):
Joydeb Mondal and Pramir Sarkar. 2025. Modgenix at SemEval-2025 Task 1: Context Aware Vision Language Ranking (CAViLR) for Multimodal Idiomaticity Understanding. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 780–784, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Modgenix at SemEval-2025 Task 1: Context Aware Vision Language Ranking (CAViLR) for Multimodal Idiomaticity Understanding (Mondal & Sarkar, SemEval 2025)
PDF:
https://preview.aclanthology.org/transition-to-people-yaml/2025.semeval-1.106.pdf