Francesco Dalla Serra


2023

pdf
Controllable Chest X-Ray Report Generation from Longitudinal Representations
Francesco Dalla Serra | Chaoyang Wang | Fani Deligianni | Jeff Dalton | Alison O’Neil
Findings of the Association for Computational Linguistics: EMNLP 2023

Radiology reports are detailed text descriptions of the content of medical scans. Each report describes the presence/absence and location of relevant clinical findings, commonly including comparison with prior exams of the same patient to describe how they evolved. Radiology reporting is a time-consuming process, and scan results are often subject to delays. One strategy to speed up reporting is to integrate automated reporting systems, however clinical deployment requires high accuracy and interpretability. Previous approaches to automated radiology reporting generally do not provide the prior study as input, precluding comparison which is required for clinical accuracy in some types of scans, and offer only unreliable methods of interpretability. Therefore, leveraging an existing visual input format of anatomical tokens, we introduce two novel aspects: (1) longitudinal representation learning – we input the prior scan as an additional input, proposing a method to align, concatenate and fuse the current and prior visual information into a joint longitudinal representation which can be provided to the multimodal report generation model; (2) sentence-anatomy dropout – a training strategy for controllability in which the report generator model is trained to predict only sentences from the original report which correspond to the subset of anatomical regions given as input. We show through in-depth experiments on the MIMIC-CXR dataset how the proposed approach achieves state-of-the-art results while enabling anatomy-wise controllable report generation.

2022

pdf
Multimodal Generation of Radiology Reports using Knowledge-Grounded Extraction of Entities and Relations
Francesco Dalla Serra | William Clackett | Hamish MacKinnon | Chaoyang Wang | Fani Deligianni | Jeff Dalton | Alison Q. O’Neil
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Automated reporting has the potential to assist radiologists with the time-consuming procedure of generating text radiology reports. Most existing approaches generate the report directly from the radiology image, however we observe that the resulting reports exhibit realistic style but lack clinical accuracy. Therefore, we propose a two-step pipeline that subdivides the problem into factual triple extraction followed by free-text report generation. The first step comprises supervised extraction of clinically relevant structured information from the image, expressed as triples of the form (entity1, relation, entity2). In the second step, these triples are input to condition the generation of the radiology report. In particular, we focus our work on Chest X-Ray (CXR) radiology report generation. The proposed framework shows state-of-the-art results on the MIMIC-CXR dataset according to most of the standard text generation metrics that we employ (BLEU, METEOR, ROUGE) and to clinical accuracy metrics (recall, precision and F1 assessed using the CheXpert labeler), also giving a 23% reduction in the total number of errors and a 29% reduction in critical clinical errors as assessed by expert human evaluation. In future, this solution can easily integrate more advanced model architectures - to both improve the triple extraction and the report generation - and can be applied to other complex image captioning tasks, such as those found in the medical domain.