Abstract
The rapidly expanding volume of publications in the biomedical domain makes timely evaluation of the latest literature increasingly difficult. This, along with a push for automated evaluation of clinical reports, presents opportunities for effective natural language processing methods. In this study we target the problem of named entity recognition, where texts are processed to annotate terms that are relevant to biomedical studies. Terms of interest in the domain include gene and protein names, as well as cell lines and cell types. Here we report on a pipeline built on Embeddings from Language Models (ELMo) and a deep learning package for natural language processing (AllenNLP). We trained context-aware token embeddings on a dataset of biomedical papers using ELMo, and incorporated these embeddings into the LSTM-CRF model used by AllenNLP for named entity recognition. We show that these representations improve named entity recognition for different types of biomedical named entities. We also achieve a new state of the art in gene mention detection on the BioCreative II gene mention shared task.
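As a minimal illustration of the kind of pipeline the abstract describes, the sketch below feeds contextual ELMo embeddings into a BiLSTM with CRF decoding using AllenNLP components. It is not the authors' code: the ELMo weight files, dimensions, tag set, and hyperparameters are placeholders, and an AllenNLP 0.x-style API is assumed.

```python
# Hypothetical sketch (not the authors' implementation): contextual ELMo
# embeddings feeding a BiLSTM-CRF tagger, roughly the architecture the
# abstract describes. Paths, dimensions, and the tag set are placeholders.
import torch
from allennlp.modules.elmo import Elmo, batch_to_ids
from allennlp.modules.conditional_random_field import ConditionalRandomField

# An ELMo model pretrained on biomedical text would be supplied here;
# these file names are placeholders, not released resources.
options_file = "biomed_elmo_options.json"
weight_file = "biomed_elmo_weights.hdf5"
elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0.0)

# Example BIO tag inventory for gene-mention tagging; illustrative only.
tags = ["O", "B-GENE", "I-GENE"]
encoder = torch.nn.LSTM(input_size=1024, hidden_size=200,
                        batch_first=True, bidirectional=True)
projection = torch.nn.Linear(2 * 200, len(tags))
crf = ConditionalRandomField(num_tags=len(tags))

sentences = [["BRCA1", "mutations", "increase", "cancer", "risk", "."]]
character_ids = batch_to_ids(sentences)           # (batch, tokens, 50) char ids
elmo_out = elmo(character_ids)
embeddings = elmo_out["elmo_representations"][0]  # (batch, tokens, 1024)
mask = elmo_out["mask"]

encoded, _ = encoder(embeddings)                  # BiLSTM over ELMo vectors
logits = projection(encoded)                      # per-token tag scores
best_paths = crf.viterbi_tags(logits, mask)       # CRF decoding of tag sequence
print([tags[i] for i in best_paths[0][0]])
```

In AllenNLP's own NER setup this wiring is usually expressed through a `crf_tagger` configuration with an ELMo token embedder rather than assembled by hand as above.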
- Anthology ID: W18-5618
- Volume: Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis
- Month: October
- Year: 2018
- Address: Brussels, Belgium
- Venue: Louhi
- Publisher: Association for Computational Linguistics
- Pages: 160–164
- URL: https://aclanthology.org/W18-5618
- DOI: 10.18653/v1/W18-5618
- Cite (ACL): Golnar Sheikhshabbafghi, Inanc Birol, and Anoop Sarkar. 2018. In-domain Context-aware Token Embeddings Improve Biomedical Named Entity Recognition. In Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis, pages 160–164, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal): In-domain Context-aware Token Embeddings Improve Biomedical Named Entity Recognition (Sheikhshabbafghi et al., Louhi 2018)
- PDF: https://preview.aclanthology.org/starsem-semeval-split/W18-5618.pdf
- Data: Billion Word Benchmark