Abstract
Clinical coding is a labour-intensive, error-prone, yet critical administrative process in which hospital patient episodes are manually assigned codes by qualified staff from large, standardised taxonomic hierarchies of codes. Automating clinical coding has a long history in NLP research and has recently seen novel developments setting new benchmark results. A popular dataset used in this task is MIMIC-III, a large database of clinical free-text notes and their associated codes, among other data. We argue for reconsidering the validity of MIMIC-III's assigned codes, as MIMIC-III has not undergone secondary validation. This work presents an open-source, reproducible experimental methodology for assessing the validity of EHR discharge summaries. We exemplify the methodology with MIMIC-III discharge summaries and show that the most frequently assigned codes in MIMIC-III are undercoded by up to 35%.
- Anthology ID:
- 2020.bionlp-1.8
- Volume:
- Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing
- Month:
- July
- Year:
- 2020
- Address:
- Online
- Editors:
- Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
- Venue:
- BioNLP
- SIG:
- SIGBIOMED
- Publisher:
- Association for Computational Linguistics
- Pages:
- 76–85
- URL:
- https://aclanthology.org/2020.bionlp-1.8
- DOI:
- 10.18653/v1/2020.bionlp-1.8
- Cite (ACL):
- Thomas Searle, Zina Ibrahim, and Richard Dobson. 2020. Experimental Evaluation and Development of a Silver-Standard for the MIMIC-III Clinical Coding Dataset. In Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing, pages 76–85, Online. Association for Computational Linguistics.
- Cite (Informal):
- Experimental Evaluation and Development of a Silver-Standard for the MIMIC-III Clinical Coding Dataset (Searle et al., BioNLP 2020)
- PDF:
- https://preview.aclanthology.org/corrections-2024-05/2020.bionlp-1.8.pdf
- Data
- MIMIC-III