Towards Visual Question Answering on Pathology Images

Xuehai He, Zhuo Cai, Wenlan Wei, Yichen Zhang, Luntian Mou, Eric Xing, Pengtao Xie


Abstract
Pathology imaging is broadly used for identifying the causes and effects of diseases or injuries. Given a pathology image, being able to answer questions about the clinical findings contained in the image is very important for medical decision making. In this paper, we aim to develop a pathological visual question answering framework to analyze pathology images and answer medical questions related to these images. To build such a framework, we create PathVQA, a VQA dataset with 32,795 questions asked from 4,998 pathology images. We also propose a three-level optimization framework which performs self-supervised pretraining and VQA finetuning end-to-end to learn powerful visual and textual representations jointly and automatically identifies and excludes noisy self-supervised examples from pretraining. We perform experiments on our created PathVQA dataset and the results demonstrate the effectiveness of our proposed methods. The datasets and code are available at https://github.com/UCSD-AI4H/PathVQA
Anthology ID:
2021.acl-short.90
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Month:
August
Year:
2021
Address:
Online
Editors:
Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues:
ACL | IJCNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
708–718
Language:
URL:
https://aclanthology.org/2021.acl-short.90
DOI:
10.18653/v1/2021.acl-short.90
Bibkey:
Cite (ACL):
Xuehai He, Zhuo Cai, Wenlan Wei, Yichen Zhang, Luntian Mou, Eric Xing, and Pengtao Xie. 2021. Towards Visual Question Answering on Pathology Images. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 708–718, Online. Association for Computational Linguistics.
Cite (Informal):
Towards Visual Question Answering on Pathology Images (He et al., ACL-IJCNLP 2021)
Copy Citation:
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/2021.acl-short.90.pdf
Video:
 https://preview.aclanthology.org/naacl-24-ws-corrections/2021.acl-short.90.mp4
Data
VQA-RADVisual Question Answering