PMCoders at SemEval-2023 Task 1: RAltCLIP: Use Relative AltCLIP Features to Rank

Mohammad Javad Pirhadi, Motahhare Mirzaei, Mohammad Reza Mohammadi, Sauleh Eetemadi


Abstract
The Visual Word Sense Disambiguation (VWSD) task aims to find, among 10 candidate images, the one most related to an ambiguous word given limited textual context. In this work, we use AltCLIP features and a standard 3-layer transformer encoder to compare the cosine similarity between the given phrase and each candidate image. We also improve our model’s generalization by using a subset of LAION-5B. The best official baseline achieves a macro-averaged hit rate of 37.20% and an MRR (Mean Reciprocal Rank) of 54.39%. Our best configuration reaches 39.61% and 56.78%, respectively. The code will be made publicly available on GitHub.
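As a rough illustration of the ranking and the two metrics named in the abstract, the following minimal sketch ranks candidate images by cosine similarity to a phrase vector, then computes hit rate and MRR over a set of instances. It is not the authors' implementation: the toy 2-dimensional vectors stand in for AltCLIP features, the transformer encoder is omitted, and the function names `cosine`, `rank_images`, and `hit_rate_and_mrr` are hypothetical. The averaging here is over instances only; the macro-averaging reported in the abstract is not reproduced.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_images(text_vec, image_vecs):
    """Return candidate indices sorted from most to least similar to the text."""
    scores = [cosine(text_vec, img) for img in image_vecs]
    return sorted(range(len(image_vecs)), key=lambda i: scores[i], reverse=True)


def hit_rate_and_mrr(rankings, gold):
    """Hit rate: fraction of instances whose top-ranked image is the gold one.
    MRR: mean of 1 / (rank of the gold image), with ranks starting at 1."""
    n = len(gold)
    hits = sum(1 for r, g in zip(rankings, gold) if r[0] == g)
    reciprocal_ranks = sum(1.0 / (r.index(g) + 1) for r, g in zip(rankings, gold))
    return hits / n, reciprocal_ranks / n


# Toy data (assumed, for illustration only): two instances, three candidates each.
text_a = [1.0, 0.0]
images_a = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
text_b = [0.0, 1.0]
images_b = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]

rankings = [rank_images(text_a, images_a), rank_images(text_b, images_b)]
gold = [0, 1]  # index of the correct image for each instance
hit, mrr = hit_rate_and_mrr(rankings, gold)
print(rankings, hit, mrr)
```

In this toy case the gold image is ranked first for the first instance and second for the second, so the hit rate is 0.5 and the MRR is (1 + 1/2) / 2 = 0.75.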
Anthology ID:
2023.semeval-1.242
Volume:
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Giovanni Da San Martino, Harish Tayyar Madabushi, Ritesh Kumar, Elisa Sartori
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
1751–1755
URL:
https://preview.aclanthology.org/icon-24-ingestion/2023.semeval-1.242/
DOI:
10.18653/v1/2023.semeval-1.242
Cite (ACL):
Mohammad Javad Pirhadi, Motahhare Mirzaei, Mohammad Reza Mohammadi, and Sauleh Eetemadi. 2023. PMCoders at SemEval-2023 Task 1: RAltCLIP: Use Relative AltCLIP Features to Rank. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 1751–1755, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
PMCoders at SemEval-2023 Task 1: RAltCLIP: Use Relative AltCLIP Features to Rank (Pirhadi et al., SemEval 2023)
PDF:
https://preview.aclanthology.org/icon-24-ingestion/2023.semeval-1.242.pdf