Vision Meets Definitions: Unsupervised Visual Word Sense Disambiguation Incorporating Gloss Information

Sunjae Kwon; Rishabh Garodia; Minhwa Lee; Zhichao Yang; Hong Yu

doi:10.18653/v1/2023.acl-long.88

Abstract

Visual Word Sense Disambiguation (VWSD) is the task of finding the image that most accurately depicts the correct sense of a target word in a given context. Previously, image-text matching models often struggled to recognize polysemous words. This paper introduces an unsupervised VWSD approach that uses gloss information from an external lexical knowledge base, in particular sense definitions. Specifically, we propose employing Bayesian inference to incorporate sense definitions when sense information for the answer is not provided. In addition, to ameliorate the out-of-dictionary (OOD) issue, we propose context-aware definition generation with GPT-3. Experimental results show that our Bayesian inference-based approach significantly improved VWSD performance. Moreover, our context-aware definition generation achieved a prominent performance improvement on OOD examples, outperforming the existing definition generation method.
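The idea of using Bayesian inference to bring sense definitions into image ranking can be illustrated with a small sketch. It assumes, hypothetically, that the candidate sense definitions are treated as a latent variable and marginalized out, P(image | context) = Σ_d P(image | d) P(d | context). It also assumes that raw similarity scores are already available between the context and each definition, and between each definition and each candidate image (for example, from an image-text matching model). The function names `softmax` and `rank_images`, the input layout, and the use of softmax normalization are illustrative assumptions, not the paper's exact formulation.

```python
import math
from typing import Sequence


def softmax(scores: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax: turns raw similarity scores into probabilities."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def rank_images(
    context_def_sim: Sequence[float],
    def_image_sim: Sequence[Sequence[float]],
) -> list[float]:
    """Hypothetical sketch: marginalize over candidate sense definitions d.

        P(image | context) = sum_d P(image | d) * P(d | context)

    context_def_sim[k]  : similarity between the context and definition k (assumed input)
    def_image_sim[k][j] : similarity between definition k and candidate image j (assumed input)
    """
    # P(d | context): how well each definition fits the context
    p_def_given_ctx = softmax(context_def_sim)
    n_images = len(def_image_sim[0])
    posterior = [0.0] * n_images
    for p_d, sims in zip(p_def_given_ctx, def_image_sim):
        # P(image | d): how well each image depicts this definition
        p_img_given_def = softmax(sims)
        for j, p_i in enumerate(p_img_given_def):
            posterior[j] += p_d * p_i
    return posterior


# Toy example: two candidate definitions, three candidate images.
ctx_def = [2.0, 0.0]          # definition 0 fits the context better
def_img = [[3.0, 0.0, 0.0],   # definition 0 best matches image 0
           [0.0, 3.0, 0.0]]   # definition 1 best matches image 1
probs = rank_images(ctx_def, def_img)
best = max(range(len(probs)), key=probs.__getitem__)
print(best)  # 0
```

Because the posterior mixes each definition's image scores in proportion to how well that definition fits the context, an image matching a likely sense ranks highest even though no single sense is given as the answer's sense.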

Anthology ID: 2023.acl-long.88
Volume: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 1583–1598
URL: https://aclanthology.org/2023.acl-long.88
DOI: 10.18653/v1/2023.acl-long.88
Cite (ACL): Sunjae Kwon, Rishabh Garodia, Minhwa Lee, Zhichao Yang, and Hong Yu. 2023. Vision Meets Definitions: Unsupervised Visual Word Sense Disambiguation Incorporating Gloss Information. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1583–1598, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Vision Meets Definitions: Unsupervised Visual Word Sense Disambiguation Incorporating Gloss Information (Kwon et al., ACL 2023)
PDF: https://preview.aclanthology.org/nschneid-patch-4/2023.acl-long.88.pdf
Video: https://preview.aclanthology.org/nschneid-patch-4/2023.acl-long.88.mp4
