Yang Liu
3M Health Information Systems
Other people with similar names:
Yang Liu (Samsung Research Center Beijing)
Yang Liu (Edinburgh Ph.D., Microsoft)
Yang Liu (University of Helsinki)
Yang Liu (Univ. of Michigan, UC Santa Cruz)
Yang Janet Liu (Georgetown University; 刘洋)
Yang Liu (National University of Defense Technology)
Yang Liu (Microsoft Cognitive Services Research)
Yang Liu (刘扬; may refer to several people)
Yang Liu (刘洋; ICT, Tsinghua, Beijing Academy of Artificial Intelligence)
Yang Liu (Wilfrid Laurier University)
Yang Liu (The Chinese University of Hong Kong (Shenzhen))
Yang Liu (刘扬; Ph.D., Purdue; ICSI, Dallas, Facebook, Liulishuo, Amazon)
Yang Liu (Beijing Language and Culture University)
Yang Liu (刘扬; Peking University)
Yang Liu (Tianjin University, China)
2021
Effective Convolutional Attention Network for Multi-label Clinical Document Classification
Yang Liu | Hua Cheng | Russell Klopfer | Matthew R. Gormley | Thomas Schaaf
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Multi-label document classification (MLDC) can be challenging, especially for long documents with a large label set and a long-tail distribution over labels. In this paper, we present an effective convolutional attention network for the MLDC problem with a focus on medical code prediction from clinical documents. Our innovations are three-fold: (1) we utilize a deep convolution-based encoder with squeeze-and-excitation networks and residual networks to aggregate information across the document and learn meaningful document representations that cover different ranges of text; (2) we explore multi-layer and sum-pooling attention to extract the most informative features from these multi-scale representations; (3) we combine binary cross-entropy loss and focal loss to improve performance on rare labels. We focus our evaluation on MIMIC-III, a widely used dataset in the medical domain. Our models outperform prior work on medical coding and achieve new state-of-the-art results on multiple metrics. We also demonstrate the language-independent nature of our approach by applying it to two non-English datasets, where our model outperforms the prior best model and a multilingual Transformer model by a substantial margin.
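As a concrete illustration of innovation (3), below is a minimal PyTorch sketch of combining binary cross-entropy with focal loss for multi-label classification. The function name and the `gamma`/`alpha` weighting scheme are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def combined_bce_focal_loss(logits, targets, gamma=2.0, alpha=0.5):
    """Hypothetical combination of BCE and focal loss for multi-label
    classification; hyperparameter names and the weighting scheme are
    assumptions, not taken from the paper."""
    # Per-label binary cross entropy over raw logits.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # Focal modulation: p_t is the predicted probability of the true
    # outcome, so (1 - p_t)^gamma down-weights easy, well-classified
    # labels and preserves gradient signal on rare, hard ones.
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)
    focal = ((1 - p_t) ** gamma) * bce
    # Weighted sum of the two terms, averaged over labels and batch.
    return (alpha * bce + (1 - alpha) * focal).mean()

# Example: a batch of 2 documents scored against 5 candidate codes.
logits = torch.randn(2, 5)
targets = torch.randint(0, 2, (2, 5)).float()
print(combined_bce_focal_loss(logits, targets))
```

The focal term addresses the long-tail label distribution the abstract mentions: frequent, easy labels contribute less to the loss, so rare labels receive relatively more weight during training.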