Matúš Falis

Also published as: Matus Falis


2022

Horses to Zebras: Ontology-Guided Data Augmentation and Synthesis for ICD-9 Coding
Matúš Falis | Hang Dong | Alexandra Birch | Beatrice Alex
Proceedings of the 21st Workshop on Biomedical Language Processing

Medical document coding is the process of assigning labels from a structured label space (an ontology, e.g., ICD-9) to medical documents. This process is laborious, costly, and error-prone. In recent years, efforts have been made to automate this process with neural models. The label spaces are large (on the order of thousands of labels) and follow a big-head long-tail label distribution, giving rise to few-shot and zero-shot scenarios. Previous efforts tried to address these scenarios within the model, leading to improvements on rare labels, but worse results on frequent ones. We propose data augmentation and synthesis techniques to address these scenarios. We further introduce an analysis technique for this setting inspired by confusion matrices. This analysis technique points to the positive impact of data augmentation and synthesis, but also highlights more general issues of confusion within families of codes, and underprediction.

2021

CoPHE: A Count-Preserving Hierarchical Evaluation Metric in Large-Scale Multi-Label Text Classification
Matúš Falis | Hang Dong | Alexandra Birch | Beatrice Alex
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Large-Scale Multi-Label Text Classification (LMTC) includes tasks with hierarchical label spaces, such as automatic assignment of ICD-9 codes to discharge summaries. Performance of models in prior art is evaluated with standard precision, recall, and F1 measures without regard for the rich hierarchical structure. In this work, we argue for hierarchical evaluation of the predictions of neural LMTC models. Using the ICD-9 ontology as an example, we describe a structural issue in the representation of the structured label space in prior art, and propose an alternative representation based on the depth of the ontology. We propose a set of metrics for hierarchical evaluation using the depth-based representation. We compare the evaluation scores from the proposed metrics with previously used metrics on prior art LMTC models for ICD-9 coding in MIMIC-III. We also propose further avenues of research involving the proposed ontological representation.

2019

Ontological attention ensembles for capturing semantic concepts in ICD code prediction from clinical text
Matus Falis | Maciej Pajak | Aneta Lisowska | Patrick Schrempf | Lucas Deckers | Shadia Mikhael | Sotirios Tsaftaris | Alison O’Neil
Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019)

We present a semantically interpretable system for automated ICD coding of clinical text documents. Our contribution is an ontological attention mechanism which matches the structure of the ICD ontology, in which shared attention vectors are learned at each level of the hierarchy, and combined into label-dependent ensembles. Analysis of the attention heads shows that shared concepts are learned by the lowest common denominator node. This allows child nodes to focus on the differentiating concepts, leading to efficient learning and memory usage. Visualisation of the multi-level attention on the original text allows explanation of the code predictions according to the semantics of the ICD ontology. On the MIMIC-III dataset we achieve a 2.7% absolute (11% relative) improvement from 0.218 to 0.245 macro-F1 score compared to the previous state of the art across 3,912 codes. Finally, we analyse the labelling inconsistencies arising from different coding practices which limit performance on this task.