Cancer Hallmark Text Classification Using Convolutional Neural Networks

Simon Baker, Anna Korhonen, Sampo Pyysalo


Abstract
Methods based on deep learning approaches have recently achieved state-of-the-art performance in a range of machine learning tasks and are increasingly applied to natural language processing (NLP). Despite strong results in various established NLP tasks involving general domain texts, there is only limited work applying these models to biomedical NLP. In this paper, we consider a Convolutional Neural Network (CNN) approach to biomedical text classification. Evaluation using a recently introduced cancer domain dataset involving the categorization of documents according to the well-established hallmarks of cancer shows that a basic CNN model can achieve a level of performance competitive with a Support Vector Machine (SVM) trained using complex manually engineered features optimized to the task. We further show that simple modifications to the CNN hyperparameters, initialization, and training process allow the model to notably outperform the SVM, establishing a new state-of-the-art result on this task. We make all of the resources and tools introduced in this study available under open licenses from https://cambridgeltl.github.io/cancer-hallmark-cnn/.
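For readers unfamiliar with CNN text classifiers, the following is a minimal sketch of the forward pass of a generic convolution plus max-over-time pooling model, written in plain Python. It is not the authors' implementation: the function names, the filter widths, the random initialization, the label count, and the use of one independent sigmoid output per label are all assumptions made for illustration.

```python
import math
import random


def conv_max_pool(embedded, filt, bias):
    """Slide one filter over token windows, apply ReLU, keep the maximum."""
    width = len(filt)  # the filter spans `width` consecutive tokens
    best = 0.0  # ReLU output is never below zero
    for start in range(len(embedded) - width + 1):
        total = bias
        for offset in range(width):
            token_vec = embedded[start + offset]
            total += sum(w * x for w, x in zip(filt[offset], token_vec))
        best = max(best, max(0.0, total))
    return best


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(vocab_size, dim, widths, n_labels, seed=0):
    """Randomly initialized parameters (hypothetical sizes, no training)."""
    rng = random.Random(seed)

    def rand_vec(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n)]

    return {
        "embeddings": [rand_vec(dim) for _ in range(vocab_size)],
        "filters": [[rand_vec(dim) for _ in range(w)] for w in widths],
        "filter_biases": [0.0 for _ in widths],
        "out_weights": [rand_vec(len(widths)) for _ in range(n_labels)],
        "out_biases": [0.0 for _ in range(n_labels)],
    }


def forward(token_ids, params):
    """Map token ids to one probability per label."""
    embedded = [params["embeddings"][t] for t in token_ids]
    # One pooled feature per filter, regardless of document length.
    features = [
        conv_max_pool(embedded, f, b)
        for f, b in zip(params["filters"], params["filter_biases"])
    ]
    # Assumption: each label gets its own independent sigmoid score.
    return [
        sigmoid(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(params["out_weights"], params["out_biases"])
    ]


# Illustrative sizes only; n_labels=3 is a placeholder label count.
params = init_params(vocab_size=20, dim=4, widths=[2, 3, 4], n_labels=3)
probs = forward([3, 7, 1, 12, 5, 9], params)
```

Max-over-time pooling is what lets a fixed-size output layer handle documents of any length: each filter contributes a single feature no matter how many token windows it was applied to.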
Anthology ID:
W16-5101
Volume:
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)
Month:
December
Year:
2016
Address:
Osaka, Japan
Venue:
WS
Publisher:
The COLING 2016 Organizing Committee
Pages:
1–9
URL:
https://aclanthology.org/W16-5101
Cite (ACL):
Simon Baker, Anna Korhonen, and Sampo Pyysalo. 2016. Cancer Hallmark Text Classification Using Convolutional Neural Networks. In Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016), pages 1–9, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Cancer Hallmark Text Classification Using Convolutional Neural Networks (Baker et al., 2016)
PDF:
https://preview.aclanthology.org/ingestion-script-update/W16-5101.pdf
Data
HOC