A Dataset for ICD-10 Coding of Death Certificates: Creation and Usage
Thomas Lavergne, Aurélie Névéol, Aude Robert, Cyril Grouin, Grégoire Rey, Pierre Zweigenbaum
Abstract
Very few datasets have been released for the evaluation of diagnosis coding with the International Classification of Diseases, and only one so far in a language other than English. This paper describes a large-scale dataset prepared from French death certificates, and the problems which needed to be solved to turn it into a dataset suitable for the application of machine learning and natural language processing methods of ICD-10 coding. The dataset includes the free-text statements written by medical doctors, the associated meta-data, the human coder-assigned codes for each statement, as well as the statement segments which supported the coder’s decision for each code. The dataset comprises 93,694 death certificates totalling 276,103 statements and 377,677 ICD-10 code assignments (3,457 unique codes). It was made available for an international automated coding shared task, which attracted five participating teams. An extended version of the dataset will be used in a new edition of the shared task.
- Anthology ID:
- W16-5107
- Volume:
- Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)
- Month:
- December
- Year:
- 2016
- Address:
- Osaka, Japan
- Editors:
- Sophia Ananiadou, Riza Batista-Navarro, Kevin Bretonnel Cohen, Dina Demner-Fushman, Paul Thompson
- Venue:
- WS
- Publisher:
- The COLING 2016 Organizing Committee
- Pages:
- 60–69
- URL:
- https://aclanthology.org/W16-5107
- Cite (ACL):
- Thomas Lavergne, Aurélie Névéol, Aude Robert, Cyril Grouin, Grégoire Rey, and Pierre Zweigenbaum. 2016. A Dataset for ICD-10 Coding of Death Certificates: Creation and Usage. In Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016), pages 60–69, Osaka, Japan. The COLING 2016 Organizing Committee.
- Cite (Informal):
- A Dataset for ICD-10 Coding of Death Certificates: Creation and Usage (Lavergne et al., 2016)
- PDF:
- https://preview.aclanthology.org/fix-dup-bibkey/W16-5107.pdf
- Data
- ICDCN2019