Guergana K Savova


2025

Aspect-Oriented Summarization for Psychiatric Short-Term Readmission Prediction
WonJin Yoon | Boyu Ren | Spencer Thomas | Chanhwi Kim | Guergana K Savova | Mei-Hua Hall | Timothy A. Miller
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Recent progress in large language models (LLMs) has enabled the automated processing of lengthy documents even without supervised training on a task-specific dataset. Yet their zero-shot performance on complex tasks, as opposed to straightforward information extraction tasks, remains suboptimal. One feasible approach for tasks with lengthy, complex input is to first summarize the document and then apply supervised fine-tuning to the summary. However, the summarization process inevitably results in some loss of information. In this study we present a method for processing summaries of long documents aimed at capturing different important aspects of the original document. We hypothesize that LLM summaries generated with different aspect-oriented prompts contain different information signals, and we propose methods to measure these differences. We introduce approaches to effectively integrate signals from these different summaries for supervised training of transformer models. We validate our hypotheses on a high-impact task, 30-day readmission prediction following a psychiatric discharge, using real-world data from four hospitals, and show that our proposed method increases prediction performance for the complex task of predicting patient outcomes.
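
The abstract outlines a pipeline of aspect-oriented LLM summarization followed by supervised fine-tuning on the resulting summaries. Below is a minimal, hypothetical sketch of that idea; the aspect prompts, the summarize_with_llm placeholder, and the concatenation-based fusion are illustrative assumptions, not the paper's actual prompts or integration method.

```python
from typing import List

# Illustrative aspect-oriented prompts (assumed, not the authors' exact wording).
ASPECT_PROMPTS = [
    "Summarize the discharge note, focusing on psychiatric symptoms and their severity.",
    "Summarize the discharge note, focusing on medications and treatment changes.",
    "Summarize the discharge note, focusing on social context and follow-up plans.",
]

def summarize_with_llm(document: str, prompt: str) -> str:
    """Placeholder for a zero-shot LLM summarization call."""
    # Dummy stand-in so the sketch runs; swap in a real LLM client here.
    return document[:512]

def aspect_summaries(document: str) -> List[str]:
    # One summary per aspect-oriented prompt; the hypothesis is that each
    # prompt surfaces a different slice of the information in the note.
    return [summarize_with_llm(document, p) for p in ASPECT_PROMPTS]

def build_training_text(document: str) -> str:
    # One simple integration strategy: concatenate the aspect summaries and
    # fine-tune a transformer classifier (e.g. for 30-day readmission) on the
    # result; other fusion schemes (per-aspect encoders, late fusion) are
    # equally plausible under the abstract's description.
    return " [SEP] ".join(aspect_summaries(document))
```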

WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation
João Matos | Shan Chen | Siena Kathleen V. Placino | Yingya Li | Juan Carlos Climent Pardo | Daphna Idan | Takeshi Tohyama | David Restrepo | Luis Filipe Nakayama | José María Millet Pascual-Leone | Guergana K Savova | Hugo Aerts | Leo Anthony Celi | An-Kwok Ian Wong | Danielle Bitterman | Jack Gallifant
Findings of the Association for Computational Linguistics: NAACL 2025

Multimodal/vision-language models (VLMs) are increasingly being deployed in healthcare settings worldwide, necessitating robust benchmarks to ensure their safety, efficacy, and fairness. Multiple-choice question-and-answer (QA) datasets derived from national medical examinations have long served as valuable evaluation tools, but existing datasets are largely text-only and available for only a limited subset of languages and countries. To address these challenges, we present WorldMedQA-V, an updated multilingual, multimodal benchmarking dataset designed to evaluate VLMs in healthcare. WorldMedQA-V includes 568 labeled multiple-choice QAs paired with 568 medical images from four countries (Brazil, Israel, Japan, and Spain), covering the original languages together with English translations validated by native clinicians. Baseline performance for common open- and closed-source models is provided in the local languages and in English translation, both with and without images supplied to the model. The WorldMedQA-V benchmark aims to better match AI systems to the diverse healthcare environments in which they are deployed, fostering more equitable, effective, and representative applications.
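
The abstract describes baseline evaluation of VLMs with and without the paired image, in the local language and in English. A minimal sketch of such an evaluation loop follows; the record schema and the ask_vlm client are hypothetical placeholders, not the released dataset's actual format or any real model API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class MCQARecord:
    question: str               # question text (original language or validated English translation)
    options: Dict[str, str]     # e.g. {"A": "...", "B": "...", "C": "...", "D": "..."}
    answer: str                 # gold option key
    image_path: Optional[str]   # paired medical image, if any

def ask_vlm(question: str, options: Dict[str, str], image_path: Optional[str]) -> str:
    """Placeholder for a vision-language model call; returns a predicted option key."""
    # Dummy prediction so the sketch runs end to end; replace with a real VLM client.
    return "A"

def accuracy(records: List[MCQARecord], use_image: bool) -> float:
    # Evaluate with or without the paired image, mirroring the with/without-image
    # baseline conditions described in the abstract.
    correct = sum(
        int(ask_vlm(r.question, r.options, r.image_path if use_image else None) == r.answer)
        for r in records
    )
    return correct / len(records) if records else 0.0
```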