Abstract
The International Classification of Diseases (ICD) serves as a definitive medical classification system encompassing a wide range of diseases and conditions. The primary objective of ICD indexing is to allocate a subset of ICD codes to a medical record, which facilitates standardized documentation and management of various health conditions. Most existing approaches have suffered from selecting the proper label subsets from an extremely large ICD collection with a heavy long-tailed label distribution. In this paper, we leverage a multi-stage “retrieve and re-rank” framework as a novel solution to ICD indexing, via a hybrid discrete retrieval method, and re-rank retrieved candidates with contrastive learning that allows the model to make more accurate predictions from a simplified label space. The retrieval model is a hybrid of auxiliary knowledge of the electronic health records (EHR) and a discrete retrieval method (BM25), which efficiently collects high-quality candidates. In the last stage, we propose a label co-occurrence guided contrastive re-ranking model, which re-ranks the candidate labels by pulling together the clinical notes with positive ICD codes. Experimental results show the proposed method achieves state-of-the-art performance on a number of measures on the MIMIC-III benchmark.- Anthology ID:
- 2024.naacl-long.273
- Volume:
- Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Kevin Duh, Helena Gomez, Steven Bethard
- Venue:
- NAACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 4881–4891
- Language:
- URL:
- https://aclanthology.org/2024.naacl-long.273
- DOI:
- 10.18653/v1/2024.naacl-long.273
- Cite (ACL):
- Xindi Wang, Robert Mercer, and Frank Rudzicz. 2024. Multi-stage Retrieve and Re-rank Model for Automatic Medical Coding Recommendation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4881–4891, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Multi-stage Retrieve and Re-rank Model for Automatic Medical Coding Recommendation (Wang et al., NAACL 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.naacl-long.273.pdf