ACE-ICD: Acronym Expansion As Data Augmentation For Automated ICD Coding

Tuan-Dung Le, Shohreh Haddadan, Thanh Q. Thieu


Abstract
Automatic ICD coding, the task of assigning disease and procedure codes to electronic medical records, is crucial for clinical documentation and billing. While existing methods primarily enhance model understanding of code hierarchies and synonyms, they often overlook the pervasive use of medical acronyms in clinical notes, a key factor in ICD code inference. To address this gap, we propose a novel and effective data augmentation technique that leverages large language models to expand medical acronyms, allowing models to be trained on their full-form representations. Moreover, we incorporate consistency training to regularize predictions by enforcing agreement between the predictions on the original and augmented documents. Extensive experiments on the MIMIC-III dataset demonstrate that our approach, ACE-ICD, establishes new state-of-the-art performance across multiple settings, including common codes, rare codes, and full-code assignment. Our code is publicly available.
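The abstract names two ingredients: an augmented copy of each note with acronyms expanded, and a consistency term that pushes the model's code predictions on the original and augmented copies to agree. The sketch below illustrates both under stated assumptions and is not the authors' implementation. The names `ACRONYMS`, `expand_acronyms`, `bernoulli_kl`, and `consistency_loss` are hypothetical. The fixed dictionary stands in for the large language model, which in ACE-ICD can use context to resolve ambiguous acronyms. The symmetric KL divergence over per-code sigmoid probabilities is one common choice of consistency objective for multi-label classification, assumed here; the paper's exact loss may differ.

```python
import math
import re

# Hypothetical acronym table standing in for the LLM expansion step;
# ACE-ICD itself uses a large language model, not a fixed dictionary.
ACRONYMS = {
    "CHF": "congestive heart failure",
    "HTN": "hypertension",
    "SOB": "shortness of breath",
}


def expand_acronyms(note: str, table: dict[str, str]) -> str:
    """Return an augmented copy of `note` with known acronyms spelled out."""
    # Match whole all-uppercase tokens of length >= 2; unknown ones are kept.
    def repl(match: re.Match) -> str:
        token = match.group(0)
        return table.get(token, token)

    return re.sub(r"\b[A-Z]{2,}\b", repl, note)


def bernoulli_kl(p: float, q: float, eps: float = 1e-7) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)) for a single code's probability."""
    # Clip to avoid log(0) when a probability is exactly 0 or 1.
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def consistency_loss(p_orig: list[float], p_aug: list[float]) -> float:
    """Assumed consistency term: symmetric KL averaged over all codes."""
    assert len(p_orig) == len(p_aug), "one probability per code in each view"
    total = 0.0
    for p, q in zip(p_orig, p_aug):
        total += 0.5 * (bernoulli_kl(p, q) + bernoulli_kl(q, p))
    return total / len(p_orig)


if __name__ == "__main__":
    note = "Pt with CHF and HTN presents with SOB."
    print(expand_acronyms(note, ACRONYMS))
    # Identical predictions on both views give zero loss.
    print(consistency_loss([0.9, 0.1], [0.9, 0.1]))
    # Diverging predictions are penalized.
    print(consistency_loss([0.9, 0.1], [0.6, 0.4]) > 0)
```

In a training loop under these assumptions, this term would be added, with some weight, to the usual binary cross-entropy loss on the gold codes, so the model is rewarded both for predicting the right codes and for predicting the same codes whether or not acronyms are spelled out.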
Anthology ID: 2025.findings-ijcnlp.102
Volume: Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Month: December
Year: 2025
Address: Mumbai, India
Editors: Kentaro Inui, Sakriani Sakti, Haofen Wang, Derek F. Wong, Pushpak Bhattacharyya, Biplab Banerjee, Asif Ekbal, Tanmoy Chakraborty, Dhirendra Pratap Singh
Venue: Findings
Publisher: The Asian Federation of Natural Language Processing and The Association for Computational Linguistics
Pages: 1650–1662
URL: https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.findings-ijcnlp.102/
Cite (ACL): Tuan-Dung Le, Shohreh Haddadan, and Thanh Q. Thieu. 2025. ACE-ICD: Acronym Expansion As Data Augmentation For Automated ICD Coding. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 1650–1662, Mumbai, India. The Asian Federation of Natural Language Processing and The Association for Computational Linguistics.
Cite (Informal): ACE-ICD: Acronym Expansion As Data Augmentation For Automated ICD Coding (Le et al., Findings 2025)
PDF: https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.findings-ijcnlp.102.pdf