Knowledge Injected Prompt Based Fine-tuning for Multi-label Few-shot ICD Coding

Zhichao Yang; Shufan Wang; Bhanu Pratap Singh Rawat; Avijit Mitra; Hong Yu

Abstract

Automatic International Classification of Diseases (ICD) coding aims to assign multiple ICD codes to a medical note with an average length of 3,000+ tokens. This task is challenging due to the high-dimensional space of multi-label assignment (tens of thousands of ICD codes) and the long-tail challenge: only a few codes (common diseases) are frequently assigned, while most codes (rare diseases) are infrequently assigned. This study addresses the long-tail challenge by adapting a prompt-based fine-tuning technique with label semantics, which has been shown to be effective in the few-shot setting. To further enhance performance in the medical domain, we propose a knowledge-enhanced Longformer by injecting three kinds of domain-specific knowledge (hierarchy, synonym, and abbreviation) through additional pretraining using contrastive learning. Experiments on MIMIC-III-full, a benchmark dataset for code assignment, show that our proposed method outperforms the previous state-of-the-art method by 14.5% in macro F1 (from 10.3 to 11.8, P<0.001). To further test our model in the few-shot setting, we created a new rare-disease coding dataset, MIMIC-III-rare50, on which our model improves macro F1 from 17.1 to 30.4 and micro F1 from 17.2 to 32.6 compared to the previous method.
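The abstract reports both macro and micro F1, and the difference between them is what makes the long tail visible. The following is a minimal sketch in plain Python, not the authors' evaluation code: the label names `common_code` and `rare_code` and the toy notes are hypothetical, chosen only to show that a model which never predicts a rare code can still score well on micro F1 while macro F1 drops sharply.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(gold, pred, labels):
    """Return (macro F1, micro F1) for multi-label predictions.

    gold and pred are parallel lists of sets of labels, one set per note.
    """
    # Per-label counts: [true positives, false positives, false negatives]
    counts = {label: [0, 0, 0] for label in labels}
    for gold_set, pred_set in zip(gold, pred):
        for label in labels:
            if label in pred_set and label in gold_set:
                counts[label][0] += 1
            elif label in pred_set:
                counts[label][1] += 1
            elif label in gold_set:
                counts[label][2] += 1

    # Macro: average the per-label F1, so every label weighs the same.
    macro = sum(f1(*counts[label]) for label in labels) / len(labels)

    # Micro: pool the counts first, so frequent labels dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)
    return macro, micro


# Hypothetical toy data: four notes, one of which also carries a rare code.
labels = ["common_code", "rare_code"]
gold = [{"common_code"}, {"common_code"}, {"common_code"},
        {"common_code", "rare_code"}]
# A model that always predicts the common code and never the rare one.
pred = [{"common_code"} for _ in gold]

macro, micro = macro_micro_f1(gold, pred, labels)
print(f"macro F1 = {macro:.3f}, micro F1 = {micro:.3f}")
```

In this toy case the rare code contributes an F1 of zero, which halves macro F1 but barely moves micro F1. This is why improvements on rarely assigned codes tend to show up most clearly in macro F1.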

Anthology ID: 2022.findings-emnlp.127
Volume: Findings of the Association for Computational Linguistics: EMNLP 2022
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1767–1781
URL: https://aclanthology.org/2022.findings-emnlp.127
Cite (ACL): Zhichao Yang, Shufan Wang, Bhanu Pratap Singh Rawat, Avijit Mitra, and Hong Yu. 2022. Knowledge Injected Prompt Based Fine-tuning for Multi-label Few-shot ICD Coding. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 1767–1781, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Knowledge Injected Prompt Based Fine-tuning for Multi-label Few-shot ICD Coding (Yang et al., Findings 2022)
PDF: https://preview.aclanthology.org/ingestion-script-update/2022.findings-emnlp.127.pdf
