2025
pdf
bib
abs
Adverse Event Extraction from Discharge Summaries: A New Dataset, Annotation Scheme, and Initial Findings
Imane Guellil
|
Salomé Andres
|
Atul Anand
|
Bruce Guthrie
|
Huayu Zhang
|
Abul Hasan
|
Honghan Wu
|
Beatrice Alex
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In this work, we present a manually annotated corpus for Adverse Event (AE) extraction from discharge summaries of elderly patients, a population often underrepresented in clinical NLP resources. The dataset includes 14 clinically significant AEs—such as falls, delirium, and intracranial haemorrhage, along with contextual attributes like negation, diagnosis type, and in-hospital occurrence. Uniquely, the annotation schema supports both discontinuous and overlapping entities, addressing challenges rarely tackled in prior work. We evaluate multiple models using FlairNLP across three annotation granularities: fine-grained, coarse-grained, and coarse-grained with negation. While transformer-based models (e.g., BERT-cased) achieve strong performance on document-level coarse-grained extraction (F1 = 0.943), performance drops notably for fine-grained entity-level tasks (e.g., F1 = 0.675), particularly for rare events and complex attributes. These results demonstrate that despite high-level scores, significant challenges remain in detecting underrepresented AEs and capturing nuanced clinical language. Developed within a Trusted Research Environment (TRE), the dataset is available upon request via DataLoch and serves as a robust benchmark for evaluating AE extraction methods and supporting future cross-dataset generalisation.
2024
pdf
bib
abs
KnowLab_AIMed at MEDIQA-CORR 2024: Chain-of-Though (CoT) prompting strategies for medical error detection and correction
Zhaolong Wu
|
Abul Hasan
|
Jinge Wu
|
Yunsoo Kim
|
Jason Cheung
|
Teng Zhang
|
Honghan Wu
Proceedings of the 6th Clinical Natural Language Processing Workshop
This paper describes our submission to the MEDIQA-CORR 2024 shared task for automatically detecting and correcting medical errors in clinical notes. We report results for three methods of few-shot In-Context Learning (ICL) augmented with Chain-of-Thought (CoT) and reason prompts using a large language model (LLM). In the first method, we manually analyse a subset of train and validation dataset to infer three CoT prompts by examining error types in the clinical notes. In the second method, we utilise the training dataset to prompt the LLM to deduce reasons about their correctness or incorrectness. The constructed CoTs and reasons are then augmented with ICL examples to solve the tasks of error detection, span identification, and error correction. Finally, we combine the two methods using a rule-based ensemble method. Across the three sub-tasks, our ensemble method achieves a ranking of 3rd for both sub-task 1 and 2, while securing 7th place in sub-task 3 among all submissions.
2023
pdf
bib
abs
KnowLab at RadSum23: comparing pre-trained language models in radiology report summarization
Jinge Wu
|
Daqian Shi
|
Abul Hasan
|
Honghan Wu
Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks
This paper presents our contribution to the RadSum23 shared task organized as part of the BioNLP 2023. We compared state-of-the-art generative language models in generating high-quality summaries from radiology reports. A two-stage fine-tuning approach was introduced for utilizing knowledge learnt from different datasets. We evaluated the performance of our method using a variety of metrics, including BLEU, ROUGE, bertscore, CheXbert, and RadGraph. Our results revealed the potentials of different models in summarizing radiology reports and demonstrated the effectiveness of the two-stage fine-tuning approach. We also discussed the limitations and future directions of our work, highlighting the need for better understanding the architecture design’s effect and optimal way of fine-tuning accordingly in automatic clinical summarizations.