Do Large Language Models Know When Not to Answer in Medical QA?
Sravanthi Machcha, Sushrita Yerra, Sharmin Sultana, Hong Yu, Zonghai Yao
Abstract
Uncertainty awareness is essential for large language models (LLMs), particularly in safety-critical domains such as medicine, where erroneous or hallucinatory outputs can cause harm. Yet most evaluations remain centered on accuracy, offering limited insight into model confidence and its relation to abstention. In this work, we present preliminary experiments that combine conformal prediction with abstention-augmented and perturbed variants of medical QA datasets. Our early results suggest a positive link between uncertainty estimates and abstention decisions, with this effect amplified under higher difficulty and adversarial perturbations. These findings highlight abstention as a practical handle for probing model reliability in medical QA.
- Anthology ID:
- 2025.uncertainlp-main.4
- Volume:
- Proceedings of the 2nd Workshop on Uncertainty-Aware NLP (UncertaiNLP 2025)
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Venues:
- UncertaiNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 27–35
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.uncertainlp-main.4/
- Cite (ACL):
- Sravanthi Machcha, Sushrita Yerra, Sharmin Sultana, Hong Yu, and Zonghai Yao. 2025. Do Large Language Models Know When Not to Answer in Medical QA?. In Proceedings of the 2nd Workshop on Uncertainty-Aware NLP (UncertaiNLP 2025), pages 27–35, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- Do Large Language Models Know When Not to Answer in Medical QA? (Machcha et al., UncertaiNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.uncertainlp-main.4.pdf
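The abstract describes combining conformal prediction with abstention-augmented medical QA, i.e. using calibrated uncertainty to decide when the model should decline to answer. As a rough illustration of that idea only (a minimal sketch, not the authors' implementation), the snippet below applies standard split conformal prediction to multiple-choice answer probabilities and abstains whenever the prediction set is not a single option; the nonconformity score, function names, and toy data are all assumptions.

```python
# Sketch: split conformal prediction with abstention for multiple-choice QA.
# Illustrative only; names and scoring choices are assumptions, not the paper's method.
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Compute the conformal quantile from a held-out calibration set.

    cal_probs:  (n, k) softmax probabilities over k answer options.
    cal_labels: (n,) indices of the correct options.
    alpha:      target miscoverage rate (e.g. 0.1 for 90% coverage).
    """
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability given to the true answer.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected quantile level.
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, q_level, method="higher")

def predict_or_abstain(test_probs, qhat):
    """Answer only when the conformal prediction set is a single option;
    otherwise abstain (return None)."""
    prediction_set = np.where(1.0 - test_probs <= qhat)[0]
    if len(prediction_set) == 1:
        return int(prediction_set[0])
    return None  # abstain: the set is empty or contains several plausible options

# Toy usage with random probabilities, purely to show the interface.
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(4), size=200)
cal_labels = rng.integers(0, 4, size=200)
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
print(predict_or_abstain(rng.dirichlet(np.ones(4)), qhat))
```

Under this framing, harder or adversarially perturbed questions tend to spread probability mass across options, which enlarges the prediction set and triggers abstention more often; that is one plausible mechanism behind the link the abstract reports between uncertainty and abstention.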