Ibtihel Amara


2024

pdf
Large Language Models Provide Human-Level Medical Text Snippet Labeling
Ibtihel Amara | Haiyang Yu | Fan Zhang | Yuchen Liu | Benny Li | Chang Liu | Rupesh Kartha | Akshay Goel
Proceedings of the 6th Clinical Natural Language Processing Workshop

This study evaluates the proficiency of Large Language Models (LLMs) in accurately labeling clinical document excerpts. Our focus is on the assignment of potential or confirmed diagnoses and medical procedures to snippets of medical text sourced from unstructured clinical patient records. We explore how the performance of LLMs compare against human annotators in classifying these excerpts. Employing a few-shot, chain-of-thought prompting approach with the MIMIC-III dataset, Med-PaLM 2 showcases annotation accuracy comparable to human annotators, achieving a notable precision rate of approximately 92% relative to the gold standard labels established by human experts.