Knowledge Injected Prompt Based Fine-tuning for Multi-label Few-shot ICD Coding

Zhichao Yang, Shufan Wang, Bhanu Pratap Singh Rawat, Avijit Mitra, Hong Yu


Abstract
Automatic International Classification of Diseases (ICD) coding aims to assign multiple ICD codes to a medical note with average length of 3,000+ tokens. This task is challenging due to a high-dimensional space of multi-label assignment (tens of thousands of ICD codes) and the long-tail challenge: only a few codes (common diseases) are frequently assigned while most codes (rare diseases) are infrequently assigned. This study addresses the long-tail challenge by adapting a prompt-based fine-tuning technique with label semantics, which has been shown to be effective under few-shot setting. To further enhance the performance in medical domain, we propose a knowledge-enhanced longformer by injecting three domain-specific knowledge: hierarchy, synonym, and abbreviation with additional pretraining using contrastive learning. Experiments on MIMIC-III-full, a benchmark dataset of code assignment, show that our proposed method outperforms previous state-of-the-art method in 14.5% in marco F1 (from 10.3 to 11.8, P<0.001). To further test our model on few-shot setting, we created a new rare diseases coding dataset, MIMIC-III-rare50, on which our model improves marco F1 from 17.1 to 30.4 and micro F1 from 17.2 to 32.6 compared to previous method.
Anthology ID:
2022.findings-emnlp.127
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
1767–1781
Language:
URL:
https://aclanthology.org/2022.findings-emnlp.127
DOI:
Bibkey:
Cite (ACL):
Zhichao Yang, Shufan Wang, Bhanu Pratap Singh Rawat, Avijit Mitra, and Hong Yu. 2022. Knowledge Injected Prompt Based Fine-tuning for Multi-label Few-shot ICD Coding. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 1767–1781, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Knowledge Injected Prompt Based Fine-tuning for Multi-label Few-shot ICD Coding (Yang et al., Findings 2022)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.findings-emnlp.127.pdf