Xin Bai




2025

On the Human-level Performance of Visual Question Answering
Chenlian Zhou | Guanyi Chen | Xin Bai | Ming Dong
Proceedings of the 31st International Conference on Computational Linguistics

Visual7W has been widely used to assess multiple-choice visual question-answering (VQA) systems. This paper reports on a replicated human experiment on Visual7W, with the aim of understanding the human-level performance of VQA. The replication was not entirely successful because human participants performed significantly worse when answering “where”, “when”, and “how” questions compared to other question types. An error analysis found that the failure was a consequence of the non-deterministic distractors in Visual7W. GPT-4V was then evaluated on Visual7W and compared to the human-level performance. The results suggest that, when evaluating models’ capacity on Visual7W, higher performance is not necessarily better.