Noise Pollution in Hospital Readmission Prediction: Long Document Classification with Reinforcement Learning

Liyan Xu, Julien Hogan, Rachel E. Patzer, Jinho D. Choi


Abstract
This paper presents a reinforcement learning approach to extract noise from long clinical documents for the task of readmission prediction after kidney transplant. We face the challenge of developing robust models on a small dataset in which each document may contain over 10K tokens and is full of noise, including tabular text and task-irrelevant sentences. We first experiment with four types of encoders to determine empirically the best document representation, and then apply reinforcement learning to remove noisy text from the long documents, modeling the noise extraction process as a sequential decision problem. Our results show that the old bag-of-words encoder outperforms deep learning-based encoders on this task, and that reinforcement learning is able to improve upon the baseline while pruning out 25% of text segments. Our analysis shows that reinforcement learning is able to identify both typical noisy tokens and task-specific noisy text.
Anthology ID:
2020.bionlp-1.10
Volume:
Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing
Month:
July
Year:
2020
Address:
Online
Venue:
BioNLP
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
95–104
URL:
https://aclanthology.org/2020.bionlp-1.10
DOI:
10.18653/v1/2020.bionlp-1.10
Cite (ACL):
Liyan Xu, Julien Hogan, Rachel E. Patzer, and Jinho D. Choi. 2020. Noise Pollution in Hospital Readmission Prediction: Long Document Classification with Reinforcement Learning. In Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing, pages 95–104, Online. Association for Computational Linguistics.
Cite (Informal):
Noise Pollution in Hospital Readmission Prediction: Long Document Classification with Reinforcement Learning (Xu et al., BioNLP 2020)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2020.bionlp-1.10.pdf