dutir914 at SemEval-2025 Task 1: An integrated approach for Multimodal Idiomaticity Representations

Yanan Wang, Dailin Li, Yicen Tian, Bo Zhang, Wang Jian, Liang Yang


Abstract
SemEval-2025 Task 1 introduces multimodal datasets for idiomatic expression representation. Subtask A focuses on ranking images based on potentially idiomatic noun compounds in given sentences. Idiom comprehension demands the fusion of visual and auditory elements with contextual semantics, yet existing datasets exhibit phrase-image discordance and culture-specific opacity, impeding cross-modal semantic alignment. To address these challenges, we propose an integrated approach for Subtask A that combines data augmentation and model fine-tuning. First, we construct two idiom datasets by generating visual metaphors for idiomatic expressions and use them to fine-tune the CLIP model. Next, we propose a three-stage multimodal chain-of-thought method, fine-tuning Qwen2.5-VL-7B-Instruct to generate rationales and perform inference, alongside zero-shot experiments with Qwen2.5-VL-72B-Instruct. Finally, we integrate the outputs of the different models through a voting mechanism to enhance the accuracy of multimodal semantic matching. This approach achieves 0.92 accuracy on the Portuguese test set and 0.93 on the English test set, ranking 3rd and 4th, respectively. The implementation code is publicly available at https://github.com/wyn1015/semeval.
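The abstract does not specify how the voting mechanism works, so the following is only a minimal sketch of one common variant: plurality voting over each model's top-ranked image, with ties broken by a fixed model priority. The function name `majority_vote`, the model labels, the image identifiers, and the tie-breaking rule are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def majority_vote(predictions, priority):
    """Return the candidate chosen by the most models.

    predictions: dict mapping a (hypothetical) model label to its top-1 image id.
    priority: list of model labels, most trusted first, used only to break ties.
    """
    counts = Counter(predictions.values())
    best = max(counts.values())
    tied = {candidate for candidate, n in counts.items() if n == best}
    if len(tied) == 1:
        return next(iter(tied))
    # Tie: defer to the highest-priority model whose pick is among the tied candidates.
    for model in priority:
        if predictions.get(model) in tied:
            return predictions[model]
    raise ValueError("no prioritized model voted for a tied candidate")


# Hypothetical top-1 picks from three systems for a single test item.
votes = {
    "clip_finetuned": "img_3",
    "qwen_7b_cot": "img_1",
    "qwen_72b_zero_shot": "img_3",
}
winner = majority_vote(
    votes,
    priority=["qwen_72b_zero_shot", "qwen_7b_cot", "clip_finetuned"],
)
print(winner)
```

In this sketch two of the three hypothetical systems agree, so their shared pick wins without consulting the priority list; the priority order matters only when the vote counts are tied.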
Anthology ID:
2025.semeval-1.159
Volume:
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Sara Rosenthal, Aiala Rosá, Debanjan Ghosh, Marcos Zampieri
Venues:
SemEval | WS
Publisher:
Association for Computational Linguistics
Pages:
1198–1203
URL:
https://preview.aclanthology.org/transition-to-people-yaml/2025.semeval-1.159/
Cite (ACL):
Yanan Wang, Dailin Li, Yicen Tian, Bo Zhang, Wang Jian, and Liang Yang. 2025. dutir914 at SemEval-2025 Task 1: An integrated approach for Multimodal Idiomaticity Representations. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 1198–1203, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
dutir914 at SemEval-2025 Task 1: An integrated approach for Multimodal Idiomaticity Representations (Wang et al., SemEval 2025)
PDF:
https://preview.aclanthology.org/transition-to-people-yaml/2025.semeval-1.159.pdf