Masato Mizogaki

2026

Transformer-based models such as PLM-CA achieve strong performance for automatic ICD coding, but their attention weights do not provide faithful explanations of their predictions. This is a major limitation for electronic medical records, where users often need concise and trustworthy evidence for each assigned code. To address this issue, we jointly train a sentence extractor and an ICD code classifier such that predictions are based only on the extracted sentences. As a result, the extracted sentences serve as faithful rationales for each predicted code and substantially reduce the effort required to inspect long medical records. Experiments on MIMIC-III show that our method approaches the performance of a transformer baseline that processes the full record while using only a small fraction of the document.

Co-authors

Yichen Wang 1

Venues

BioNLP 1
WS 1
