Improving Neural Models for Radiology Report Retrieval with Lexicon-based Automated Annotation

Luyao Shi, Tanveer Syeda-Mahmood, Tyler Baldwin


Abstract
Many clinical informatics tasks that are based on electronic health records (EHR) need relevant patient cohorts to be selected based on findings, symptoms, and diseases. Frequently, these conditions are described in radiology reports, which can be retrieved using information retrieval (IR) methods. The latest of these techniques use neural IR models such as BERT trained on clinical text. However, these methods still lack semantic understanding of the underlying clinical conditions and of ruled-out findings, resulting in poor precision during retrieval. In this paper, we combine clinical finding detection with supervised query match learning. Specifically, we use lexicon-driven concept detection to detect relevant findings in sentences. These findings are used as queries to train a Sentence-BERT (SBERT) model using triplet loss on matched and unmatched query-sentence pairs. We show that the proposed supervised training task remarkably improves the retrieval performance of SBERT. The trained model generalizes well to unseen queries and to reports from different collections.
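The training setup described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the two-entry lexicon, the substring matcher, the function names, the Euclidean distance, and the margin value are all assumptions made for illustration, and the sketch does no negation handling, so it would not recognize ruled-out findings. It shows how lexicon matches might yield (query, matched sentence, unmatched sentence) triplets and how a triplet loss scores one such triplet once the three texts are embedded.

```python
import math

# Hypothetical toy lexicon (finding name -> surface phrases); not from the paper.
LEXICON = {
    "pneumothorax": ["pneumothorax"],
    "pleural effusion": ["pleural effusion", "effusion"],
}

def detect_findings(sentence):
    """Return the lexicon findings whose phrases occur in the sentence.

    Naive substring matching; negated mentions are NOT filtered out.
    """
    text = sentence.lower()
    return {name for name, phrases in LEXICON.items()
            if any(phrase in text for phrase in phrases)}

def build_triplets(sentences):
    """Pair each finding (the query) with a matched and an unmatched sentence."""
    triplets = []
    for query in LEXICON:
        matched = [s for s in sentences if query in detect_findings(s)]
        unmatched = [s for s in sentences if query not in detect_findings(s)]
        for pos in matched:
            for neg in unmatched:
                triplets.append((query, pos, neg))
    return triplets

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(a, p) - d(a, n) + margin), with Euclidean distance (assumed)."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

sentences = [
    "Small left pleural effusion is present.",
    "There is a large right pneumothorax.",
]
triplets = build_triplets(sentences)
```

In a real setting the anchor, positive, and negative vectors would come from the sentence encoder being trained; the loss is zero once the matched sentence is closer to the query than the unmatched one by at least the margin.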
Anthology ID:
2022.naacl-main.253
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
3457–3463
URL:
https://aclanthology.org/2022.naacl-main.253
DOI:
10.18653/v1/2022.naacl-main.253
Cite (ACL):
Luyao Shi, Tanveer Syeda-Mahmood, and Tyler Baldwin. 2022. Improving Neural Models for Radiology Report Retrieval with Lexicon-based Automated Annotation. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3457–3463, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Improving Neural Models for Radiology Report Retrieval with Lexicon-based Automated Annotation (Shi et al., NAACL 2022)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2022.naacl-main.253.pdf
Video:
https://preview.aclanthology.org/emnlp-22-attachments/2022.naacl-main.253.mp4
Data
MS MARCO