Text2Tree: Aligning Text Representation to the Label Tree Hierarchy for Imbalanced Medical Classification

Jiahuan Yan, Haojun Gao, Zhang Kai, Weize Liu, Danny Chen, Jian Wu, Jintai Chen


Abstract
Deep learning approaches show promising performance on various text tasks. However, they still struggle with medical text classification, where samples are often scarce and extremely imbalanced. Unlike existing mainstream approaches, which focus on supplementing semantics with external medical information, this paper rethinks the data challenges in medical texts and presents a novel framework-agnostic algorithm called Text2Tree that uses only the internal label hierarchy when training deep learning models. We embed the ICD code tree structure of labels into cascade attention modules to learn hierarchy-aware label representations. Two new learning schemes, Similarity Surrogate Learning (SSL) and Dissimilarity Mixup Learning (DML), are devised to boost text classification by, respectively, reusing and distinguishing samples of other labels according to the label representation hierarchy. Experiments on authoritative public datasets and real-world medical records show that our approach consistently achieves superior performance over classical and advanced imbalanced classification methods. Our code is available at https://github.com/jyansir/Text2Tree.
Anthology ID:
2023.findings-emnlp.517
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7705–7720
URL:
https://aclanthology.org/2023.findings-emnlp.517
DOI:
10.18653/v1/2023.findings-emnlp.517
Cite (ACL):
Jiahuan Yan, Haojun Gao, Zhang Kai, Weize Liu, Danny Chen, Jian Wu, and Jintai Chen. 2023. Text2Tree: Aligning Text Representation to the Label Tree Hierarchy for Imbalanced Medical Classification. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 7705–7720, Singapore. Association for Computational Linguistics.
Cite (Informal):
Text2Tree: Aligning Text Representation to the Label Tree Hierarchy for Imbalanced Medical Classification (Yan et al., Findings 2023)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2023.findings-emnlp.517.pdf