Moti Freiman

2026

Reliable extraction of structured information from radiology reports using Large Language Models (LLMs) remains a significant challenge, particularly for complex, non-English texts such as Hebrew. This study proposes an agent-based, uncertainty-aware framework to enhance the reliability and interpretability of LLM predictions in clinical contexts. A total of 9,683 Hebrew radiology reports from Crohn’s disease patients (2010?2023) across three medical centers were analyzed. Of these, 512 reports were manually annotated for six gastrointestinal organs and 15 pathological findings, while the remainder were automatically labeled using HSMP-BERT. Structured data extraction was performed with Llama 3.1 (Llama 3-8b-instruct) employing Bayesian Prompt Ensembles (BayesPE), which utilized six semantically equivalent prompts to quantify uncertainty. An Agent-Based Decision Model aggregated prompt outputs into five calibrated confidence levels and was benchmarked against three entropy-based approaches. Model performance was assessed using accuracy, F1 score, precision, recall, and Cohen’s Kappa before and after filtering high-uncertainty cases. The agent-based model outperformed all baselines, achieving an F1 score of 0.3967, recall of 0.6437, and Kappa of 0.3006; after excluding cases with uncertainty = 0.5, the F1 score increased to 0.4787 and Kappa to 0.4258. The proposed framework improves uncertainty calibration and predictive reliability, advancing the safe deployment of LLMs in medical data extraction.

2024

pdf bib abs

Automatic conversion of free-text radiology reports into structured data using Natural Language Processing (NLP) techniques is crucial for analyzing diseases on a large scale. While effective for tasks in widely spoken languages like English, generative large language models (LLMs) typically underperform with less common languages and can pose potential risks to patient privacy. Fine-tuning local NLP models is hindered by the skewed nature of real-world medical datasets, where rare findings represent a significant data imbalance. We introduce SMP-BERT, a novel prompt learning method that leverages the structured nature of reports to overcome these challenges. In our studies involving a substantial collection of Crohn’s disease radiology reports in Hebrew (over 8,000 patients and 10,000 reports), SMP-BERT greatly surpassed traditional fine-tuning methods in performance, notably in detecting infrequent conditions (AUC: 0.99 vs 0.94, F1: 0.84 vs 0.34). SMP-BERT empowers more accurate AI diagnostics available for low-resource languages.

Co-authors

Venues

Fix author