Medical Visual Textual Entailment for Numerical Understanding of Vision-and-Language Models

Hitomi Yanaka; Yuta Nakamura; Yuki Chida; Tomoya Kurosawa

doi:10.18653/v1/2023.clinicalnlp-1.2

Abstract

Assessing the numerical understanding of vision-and-language models over images and texts is crucial for real-world vision-and-language applications, such as systems for automated medical image analysis. We provide a visual reasoning dataset focusing on numerical understanding in the medical domain. Experiments using our dataset show that current vision-and-language models fail to perform numerical inference in the medical domain. However, data augmentation with only a small amount of our dataset improves model performance while maintaining performance in the general domain.

Anthology ID: 2023.clinicalnlp-1.2
Volume: Proceedings of the 5th Clinical Natural Language Processing Workshop
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Tristan Naumann, Asma Ben Abacha, Steven Bethard, Kirk Roberts, Anna Rumshisky
Venue: ClinicalNLP
Publisher: Association for Computational Linguistics
Pages: 8–18
URL: https://aclanthology.org/2023.clinicalnlp-1.2
DOI: 10.18653/v1/2023.clinicalnlp-1.2
Cite (ACL): Hitomi Yanaka, Yuta Nakamura, Yuki Chida, and Tomoya Kurosawa. 2023. Medical Visual Textual Entailment for Numerical Understanding of Vision-and-Language Models. In Proceedings of the 5th Clinical Natural Language Processing Workshop, pages 8–18, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Medical Visual Textual Entailment for Numerical Understanding of Vision-and-Language Models (Yanaka et al., ClinicalNLP 2023)
PDF: https://preview.aclanthology.org/ingest-acl-2023-videos/2023.clinicalnlp-1.2.pdf
