Yang Liu

3M Health Information Systems

Other people with similar names: Yang Liu (Samsung Research Center Beijing), Yang Liu (Edinburgh Ph.D., Microsoft), Yang Liu (University of Helsinki), Yang Liu (Univ. of Michigan, UC Santa Cruz), Yang Janet Liu (Georgetown University; 刘洋), Yang Liu (National University of Defense Technology), Yang Liu (Microsoft Cognitive Services Research), Yang Liu (刘扬; may refer to several people), Yang Liu (刘洋; ICT, Tsinghua, Beijing Academy of Artificial Intelligence), Yang Liu (Wilfrid Laurier University), Yang Liu (The Chinese University of Hong Kong (Shenzhen)), Yang Liu (刘扬; Ph.D. Purdue; ICSI, Dallas, Facebook, Liulishuo, Amazon), Yang Liu (Beijing Language and Culture University), Yang Liu (刘扬; Peking University), Yang Liu (Tianjin University, China)


2021

Effective Convolutional Attention Network for Multi-label Clinical Document Classification
Yang Liu | Hua Cheng | Russell Klopfer | Matthew R. Gormley | Thomas Schaaf
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Multi-label document classification (MLDC) problems can be challenging, especially for long documents with a large label set and a long-tail distribution over labels. In this paper, we present an effective convolutional attention network for the MLDC problem with a focus on medical code prediction from clinical documents. Our innovations are three-fold: (1) we utilize a deep convolution-based encoder with squeeze-and-excitation networks and residual networks to aggregate information across the document and learn meaningful document representations that cover different ranges of text; (2) we explore multi-layer and sum-pooling attention to extract the most informative features from these multi-scale representations; (3) we combine binary cross entropy loss and focal loss to improve performance on rare labels. We focus our evaluation on MIMIC-III, a widely used dataset in the medical domain. Our models outperform prior work on medical coding and achieve new state-of-the-art results on multiple metrics. We also demonstrate the language-independent nature of our approach by applying it to two non-English datasets, where our model outperforms the prior best model and a multilingual Transformer model by a substantial margin.
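To make innovation (3) concrete, the following is a minimal sketch (in PyTorch, not the authors' released code) of combining binary cross entropy with a per-label focal loss for multi-label targets. The focal term gamma and the mixing weight lam are illustrative values, not parameters reported in the paper.

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    # Per-label focal loss for multi-label targets in {0, 1}.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # probability the model assigns to the true label
    return ((1.0 - p_t) ** gamma * bce).mean()

def combined_loss(logits, targets, lam=0.5, gamma=2.0):
    # Weighted sum of standard BCE and focal loss; the focal term
    # down-weights easy (frequent) labels and emphasizes rare ones.
    bce = F.binary_cross_entropy_with_logits(logits, targets)
    return (1.0 - lam) * bce + lam * focal_loss(logits, targets, gamma)

# Example: a batch of 4 documents scored against a label set of 50 codes.
logits = torch.randn(4, 50)
targets = torch.randint(0, 2, (4, 50)).float()
loss = combined_loss(logits, targets)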