@inproceedings{cao-etal-2022-whats,
    title     = "What's Different between Visual Question Answering for Machine {\textquotedblleft}Understanding{\textquotedblright} Versus for Accessibility?",
    author    = "Cao, Yang Trista and
      Seelman, Kyle and
      Lee, Kyungjun and
      Daum{\'e} III, Hal",
    editor    = "He, Yulan and
      Ji, Heng and
      Li, Sujian and
      Liu, Yang and
      Chang, Chia-Hui",
    booktitle = "Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month     = nov,
    year      = "2022",
    address   = "Online only",
    publisher = "Association for Computational Linguistics",
    url       = "https://aclanthology.org/2022.aacl-main.75/",
    doi       = "10.18653/v1/2022.aacl-main.75",
    pages     = "1025--1034",
    abstract  = "In visual question answering (VQA), a machine must answer a question given an associated image. Recently, accessibility researchers have explored whether VQA can be deployed in a real-world setting where users with visual impairments learn about their environment by capturing their visual surroundings and asking questions. However, most of the existing benchmarking datasets for VQA focus on machine {\textquotedblleft}understanding{\textquotedblright} and it remains unclear how progress on those datasets corresponds to improvements in this real-world use case. We aim to answer this question by evaluating discrepancies between machine {\textquotedblleft}understanding{\textquotedblright} datasets (VQA-v2) and accessibility datasets (VizWiz) by evaluating a variety of VQA models. Based on our findings, we discuss opportunities and challenges in VQA for accessibility and suggest directions for future work."
}
Markdown (Informal)
[What’s Different between Visual Question Answering for Machine “Understanding” Versus for Accessibility?](https://aclanthology.org/2022.aacl-main.75/) (Cao et al., AACL-IJCNLP 2022)
ACL