Leveraging Prompt-Learning for Structured Information Extraction from Crohn’s Disease Radiology Reports in a Low-Resource Language

Liam Hazan, Naama Gavrielov, Roi Reichart, Talar Hagopian, Mary-Louise Greer, Ruth Cytter-Kuint, Gili Focht, Dan Turner, Moti Freiman


Abstract
Automatic conversion of free-text radiology reports into structured data using Natural Language Processing (NLP) techniques is crucial for analyzing diseases on a large scale. While effective for tasks in widely spoken languages like English, generative large language models (LLMs) typically underperform on less widely spoken languages and can pose risks to patient privacy. Fine-tuning local NLP models is hindered by the skewed nature of real-world medical datasets, in which rare findings create a significant data imbalance. We introduce SMP-BERT, a novel prompt-learning method that leverages the structured nature of reports to overcome these challenges. In our studies on a large collection of Crohn’s disease radiology reports in Hebrew (over 8,000 patients and 10,000 reports), SMP-BERT substantially outperformed traditional fine-tuning, notably in detecting infrequent conditions (AUC: 0.99 vs 0.94, F1: 0.84 vs 0.34). SMP-BERT makes more accurate AI diagnostics available for low-resource languages.
Anthology ID:
2024.clinicalnlp-1.26
Original:
2024.clinicalnlp-1.26v1
Version 2:
2024.clinicalnlp-1.26v2
Volume:
Proceedings of the 6th Clinical Natural Language Processing Workshop
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Tristan Naumann, Asma Ben Abacha, Steven Bethard, Kirk Roberts, Danielle Bitterman
Venues:
ClinicalNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
301–309
URL:
https://aclanthology.org/2024.clinicalnlp-1.26
DOI:
10.18653/v1/2024.clinicalnlp-1.26
Cite (ACL):
Liam Hazan, Naama Gavrielov, Roi Reichart, Talar Hagopian, Mary-Louise Greer, Ruth Cytter-Kuint, Gili Focht, Dan Turner, and Moti Freiman. 2024. Leveraging Prompt-Learning for Structured Information Extraction from Crohn’s Disease Radiology Reports in a Low-Resource Language. In Proceedings of the 6th Clinical Natural Language Processing Workshop, pages 301–309, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Leveraging Prompt-Learning for Structured Information Extraction from Crohn’s Disease Radiology Reports in a Low-Resource Language (Hazan et al., ClinicalNLP-WS 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.clinicalnlp-1.26.pdf