A Comparison of Word-based and Context-based Representations for Classification Problems in Health Informatics
Aditya Joshi, Sarvnaz Karimi, Ross Sparks, Cecile Paris, C Raina MacIntyre
Abstract
Distributed representations of text can be used as features when training a statistical classifier. These representations may be created as a composition of word vectors or as context-based sentence vectors. We compare the two kinds of representations (word versus context) for three classification problems: influenza infection classification, drug usage classification and personal health mention classification. For statistical classifiers trained for each of these problems, context-based representations based on ELMo, Universal Sentence Encoder, Neural-Net Language Model and FLAIR perform better than Word2Vec, GloVe and versions of the two adapted using the MeSH ontology. Accuracy improves by 2-4% when these context-based representations are used instead of word-based representations.
- Anthology ID: W19-5015
- Volume: Proceedings of the 18th BioNLP Workshop and Shared Task
- Month: August
- Year: 2019
- Address: Florence, Italy
- Editors: Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
- Venue: BioNLP
- SIG: SIGBIOMED
- Publisher: Association for Computational Linguistics
- Pages: 135–141
- URL: https://aclanthology.org/W19-5015
- DOI: 10.18653/v1/W19-5015
- Cite (ACL): Aditya Joshi, Sarvnaz Karimi, Ross Sparks, Cecile Paris, and C Raina MacIntyre. 2019. A Comparison of Word-based and Context-based Representations for Classification Problems in Health Informatics. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 135–141, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal): A Comparison of Word-based and Context-based Representations for Classification Problems in Health Informatics (Joshi et al., BioNLP 2019)
- PDF: https://preview.aclanthology.org/nschneid-patch-2/W19-5015.pdf