Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration

Jeremy Qin, Bang Liu, Quoc Dinh Nguyen


Abstract
Black-box large language models (LLMs) are increasingly deployed in diverse environments, making it essential for these models to effectively convey their confidence and uncertainty, especially in high-stakes settings. However, these models often exhibit overconfidence, leading to potential risks and misjudgments. Existing techniques for eliciting and calibrating LLM confidence have primarily focused on general reasoning datasets and yield only modest improvements. Accurate calibration is crucial for informed decision-making and for preventing adverse outcomes, but it remains challenging due to the complexity and variability of the tasks these models perform. In this work, we investigate the miscalibration behavior of black-box LLMs in the healthcare setting. We propose a novel method, Atypical Presentations Recalibration, which leverages atypical presentations to adjust the model’s confidence estimates. Our approach significantly improves calibration, reducing calibration errors by approximately 60% on three medical question answering datasets and outperforming existing methods such as vanilla verbalized confidence and chain-of-thought (CoT) verbalized confidence. Additionally, we provide an in-depth analysis of the role of atypicality within the recalibration framework.
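The abstract reports calibration-error reductions of roughly 60%. A standard way such errors are measured is Expected Calibration Error (ECE): bin predictions by verbalized confidence, then take the sample-weighted gap between mean confidence and accuracy in each bin. The paper's exact metric is not stated on this page, so the sketch below is an illustrative ECE implementation, not the authors' code:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE: weighted average |mean confidence - accuracy|
    over equal-width confidence bins. A common calibration metric;
    assumed here, not confirmed as the paper's exact measure."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins (lo, hi]; the first bin also includes 0.0.
        mask = (confidences > lo) & (confidences <= hi)
        if lo == 0.0:
            mask |= confidences == 0.0
        if not mask.any():
            continue
        gap = abs(confidences[mask].mean() - correct[mask].mean())
        ece += mask.mean() * gap  # weight by fraction of samples in bin
    return ece
```

For example, a model that verbalizes 100% confidence on every answer but is right only half the time has an ECE of 0.5, the overconfidence pattern the abstract describes.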
Anthology ID:
2024.findings-emnlp.142
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2520–2537
URL:
https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-emnlp.142/
DOI:
10.18653/v1/2024.findings-emnlp.142
Cite (ACL):
Jeremy Qin, Bang Liu, and Quoc Dinh Nguyen. 2024. Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 2520–2537, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration (Qin et al., Findings 2024)
PDF:
https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-emnlp.142.pdf