daalft at SemEval-2025 Task 1: Multi-step Zero-shot Multimodal Idiomaticity Ranking

David Alfter


Abstract
This paper presents a multi-step zero-shot system for SemEval-2025 Task 1 on Advancing Multimodal Idiomaticity Representation (AdMIRe). The system employs two state-of-the-art multimodal language models, Claude Sonnet 3.5 and OpenAI GPT-4o, to determine idiomaticity and rank images for relevance in both subtasks. A hybrid approach combining o1-preview for idiomaticity classification and GPT-4o for visual ranking produced the best overall results. The system demonstrates competitive performance on the English extended dataset for Subtask A, but faces challenges in cross-lingual transfer to Portuguese. Comparing Image+Text and Text-Only approaches reveals interesting trends and raises questions about the role of visual information in multimodal idiomaticity detection.
Anthology ID:
2025.semeval-1.19
Volume:
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Sara Rosenthal, Aiala Rosá, Debanjan Ghosh, Marcos Zampieri
Venues:
SemEval | WS
Publisher:
Association for Computational Linguistics
Pages:
127–140
URL:
https://preview.aclanthology.org/corrections-2025-08/2025.semeval-1.19/
Cite (ACL):
David Alfter. 2025. daalft at SemEval-2025 Task 1: Multi-step Zero-shot Multimodal Idiomaticity Ranking. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 127–140, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
daalft at SemEval-2025 Task 1: Multi-step Zero-shot Multimodal Idiomaticity Ranking (Alfter, SemEval 2025)
PDF:
https://preview.aclanthology.org/corrections-2025-08/2025.semeval-1.19.pdf