Abstract
The growing availability of online videos has transformed the way we access information and knowledge. A growing number of individuals now prefer instructional videos because they offer a series of step-by-step procedures to accomplish particular tasks. Instructional videos from the medical domain may provide the best possible visual answers to first aid, medical emergency, and medical education questions. This paper focuses on answering health-related questions asked by health consumers by providing visual answers from medical videos. The scarcity of large-scale datasets in the medical domain is a key challenge that hinders the development of applications that can help the public with their health-related questions. To address this issue, we first proposed a pipelined approach to create two large-scale datasets: HealthVidQA-CRF and HealthVidQA-Prompt. Leveraging the datasets, we developed monomodal and multimodal approaches that can effectively provide visual answers from medical videos to natural language questions. We conducted a comprehensive analysis of the results and outlined the findings, focusing on the impact of the created datasets on model training and the significance of visual features in enhancing the performance of the monomodal and multimodal approaches for the medical visual answer localization task.
- Anthology ID:
- 2024.lrec-main.1425
- Volume:
- Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues:
- LREC | COLING
- Publisher:
- ELRA and ICCL
- Pages:
- 16399–16411
- URL:
- https://aclanthology.org/2024.lrec-main.1425
- Cite (ACL):
- Deepak Gupta, Kush Attal, and Dina Demner-Fushman. 2024. Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 16399–16411, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches (Gupta et al., LREC-COLING 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.lrec-main.1425.pdf