@inproceedings{alizadeh-di-eugenio-2020-corpus,
title = "A Corpus for Visual Question Answering Annotated with Frame Semantic Information",
author = "Alizadeh, Mehrdad and
Di Eugenio, Barbara",
booktitle = "Proceedings of the 12th Language Resources and Evaluation Conference",
month = may,
year = "2020",
address = "Marseille, France",
publisher = "European Language Resources Association",
url = "https://aclanthology.org/2020.lrec-1.678",
pages = "5524--5531",
abstract = "Visual Question Answering (VQA) has been widely explored as a computer vision problem, however enhancing VQA systems with linguistic information is necessary for tackling the complexity of the task. The language understanding part can play a major role especially for questions asking about events or actions expressed via verbs. We hypothesize that if the question focuses on events described by verbs, then the model should be aware of or trained with verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no VQA dataset exists that includes verb semantic information. We created a new VQA dataset annotated with verb semantic information called imSituVQA. imSituVQA is built by taking advantage of the imSitu dataset annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet.",
language = "English",
ISBN = "979-10-95546-34-4",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="alizadeh-di-eugenio-2020-corpus">
<titleInfo>
<title>A Corpus for Visual Question Answering Annotated with Frame Semantic Information</title>
</titleInfo>
<name type="personal">
<namePart type="given">Mehrdad</namePart>
<namePart type="family">Alizadeh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Barbara</namePart>
<namePart type="family">Di Eugenio</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2020-05</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<language>
<languageTerm type="text">English</languageTerm>
<languageTerm type="code" authority="iso639-2b">eng</languageTerm>
</language>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 12th Language Resources and Evaluation Conference</title>
</titleInfo>
<originInfo>
<publisher>European Language Resources Association</publisher>
<place>
<placeTerm type="text">Marseille, France</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-10-95546-34-4</identifier>
</relatedItem>
<abstract>Visual Question Answering (VQA) has been widely explored as a computer vision problem; however, enhancing VQA systems with linguistic information is necessary for tackling the complexity of the task. The language understanding part can play a major role, especially for questions asking about events or actions expressed via verbs. We hypothesize that if the question focuses on events described by verbs, then the model should be aware of or trained with verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no VQA dataset exists that includes verb semantic information. We created a new VQA dataset annotated with verb semantic information called imSituVQA. imSituVQA is built by taking advantage of the imSitu dataset annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet.</abstract>
<identifier type="citekey">alizadeh-di-eugenio-2020-corpus</identifier>
<location>
<url>https://aclanthology.org/2020.lrec-1.678</url>
</location>
<part>
<date>2020-05</date>
<extent unit="page">
<start>5524</start>
<end>5531</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T A Corpus for Visual Question Answering Annotated with Frame Semantic Information
%A Alizadeh, Mehrdad
%A Di Eugenio, Barbara
%S Proceedings of the 12th Language Resources and Evaluation Conference
%D 2020
%8 may
%I European Language Resources Association
%C Marseille, France
%@ 979-10-95546-34-4
%G English
%F alizadeh-di-eugenio-2020-corpus
%X Visual Question Answering (VQA) has been widely explored as a computer vision problem; however, enhancing VQA systems with linguistic information is necessary for tackling the complexity of the task. The language understanding part can play a major role, especially for questions asking about events or actions expressed via verbs. We hypothesize that if the question focuses on events described by verbs, then the model should be aware of or trained with verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no VQA dataset exists that includes verb semantic information. We created a new VQA dataset annotated with verb semantic information called imSituVQA. imSituVQA is built by taking advantage of the imSitu dataset annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet.
%U https://aclanthology.org/2020.lrec-1.678
%P 5524-5531
Markdown (Informal)
[A Corpus for Visual Question Answering Annotated with Frame Semantic Information](https://aclanthology.org/2020.lrec-1.678) (Alizadeh & Di Eugenio, LREC 2020)
ACL
Mehrdad Alizadeh and Barbara Di Eugenio. 2020. A Corpus for Visual Question Answering Annotated with Frame Semantic Information. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 5524–5531, Marseille, France. European Language Resources Association.
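
The abstract above notes that imSituVQA is built from imSitu images labeled with FrameNet-style frame elements. As a rough illustration only, the Python sketch below shows how question-answer pairs could in principle be derived from a frame-annotated image record via role-specific templates; the record fields, verb, roles, and templates are hypothetical and are not taken from the imSituVQA paper or the imSitu dataset.

# Illustrative only: a simplified, hypothetical imSitu-style record and a
# template-based way to derive question-answer pairs from its frame elements.
# Field names, the verb frame, and the question templates are assumptions made
# for illustration; they do not reproduce the paper's actual method or data.

from typing import Dict, List, Tuple

# Hypothetical frame annotation for one image: the verb plus its frame elements.
example_annotation: Dict[str, object] = {
    "image": "jumping_042.jpg",           # hypothetical file name
    "verb": "jumping",
    "frame_elements": {                    # semantic role -> filler (noun label)
        "agent": "dog",
        "source": "rock",
        "destination": "grass",
    },
}

# Hypothetical question templates, one per semantic role.
ROLE_TEMPLATES: Dict[str, str] = {
    "agent": "Who or what is {verb}?",
    "source": "Where is the {agent} {verb} from?",
    "destination": "Where is the {agent} {verb} to?",
}

def qa_pairs_from_annotation(ann: Dict[str, object]) -> List[Tuple[str, str]]:
    """Turn one frame-annotated image record into (question, answer) pairs."""
    verb = str(ann["verb"])
    elements = dict(ann["frame_elements"])  # role -> filler
    pairs: List[Tuple[str, str]] = []
    for role, filler in elements.items():
        template = ROLE_TEMPLATES.get(role)
        if template is None:
            continue  # skip roles without a template
        question = template.format(verb=verb, agent=elements.get("agent", "entity"))
        pairs.append((question, filler))
    return pairs

if __name__ == "__main__":
    for q, a in qa_pairs_from_annotation(example_annotation):
        print(f"Q: {q}\tA: {a}")

Running the sketch prints one question per annotated role (e.g. "Who or what is jumping?" answered by "dog"), which mirrors the general idea of pairing verb frame elements with images described in the abstract, without claiming to match the dataset's actual templates or schema.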