A Comparison of Word-based and Context-based Representations for Classification Problems in Health Informatics

Aditya Joshi, Sarvnaz Karimi, Ross Sparks, Cecile Paris, C Raina MacIntyre


Abstract
Distributed representations of text can be used as features when training a statistical classifier. These representations may be created as a composition of word vectors or as context-based sentence vectors. We compare the two kinds of representations (word versus context) for three classification problems: influenza infection classification, drug usage classification and personal health mention classification. For statistical classifiers trained for each of these problems, context-based representations based on ELMo, Universal Sentence Encoder, Neural-Net Language Model and FLAIR perform better than Word2Vec, GloVe and their counterparts adapted using the MeSH ontology. Accuracy improves by 2-4% when these context-based representations are used instead of word-based representations.
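As a rough illustration of the word-based side of this comparison, the sketch below composes a sentence vector by averaging word vectors and uses it as the feature for a simple classifier. Everything in it is an assumption made for illustration: the three-dimensional vectors in TOY_VECTORS are invented rather than taken from Word2Vec or GloVe, the labels and training sentences are made up, and the nearest-centroid classifier is a stand-in, not the classifier used in the paper.

```python
from statistics import mean

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# A real experiment would load pretrained embeddings such as Word2Vec or GloVe.
TOY_VECTORS = {
    "flu": [0.9, 0.1, 0.0],
    "fever": [0.8, 0.2, 0.1],
    "have": [0.5, 0.5, 0.2],
    "news": [0.1, 0.9, 0.3],
    "report": [0.0, 0.8, 0.4],
    "about": [0.2, 0.6, 0.5],
}
DIM = 3


def sentence_vector(text):
    """Compose a sentence vector by averaging the vectors of known words."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        # No known words: fall back to the zero vector.
        return [0.0] * DIM
    return [mean(column) for column in zip(*vectors)]


def train_centroids(examples):
    """Stand-in 'training': compute one mean feature vector per label."""
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append(sentence_vector(text))
    return {
        label: [mean(column) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }


def predict(centroids, text):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    vec = sentence_vector(text)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Made-up training data for a toy personal health mention task.
TRAIN = [
    ("I have flu", "personal"),
    ("have fever", "personal"),
    ("news report about flu", "general"),
    ("report about fever", "general"),
]
CENTROIDS = train_centroids(TRAIN)

if __name__ == "__main__":
    print(predict(CENTROIDS, "I have fever"))
    print(predict(CENTROIDS, "news about flu"))
```

A context-based encoder would replace sentence_vector with a model that maps the whole sentence to a vector in one step, so that the same word can contribute differently depending on its neighbours; the classifier on top would stay the same.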
Anthology ID:
W19-5015
Volume:
Proceedings of the 18th BioNLP Workshop and Shared Task
Month:
August
Year:
2019
Address:
Florence, Italy
Venue:
BioNLP
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
135–141
URL:
https://aclanthology.org/W19-5015
DOI:
10.18653/v1/W19-5015
Cite (ACL):
Aditya Joshi, Sarvnaz Karimi, Ross Sparks, Cecile Paris, and C Raina MacIntyre. 2019. A Comparison of Word-based and Context-based Representations for Classification Problems in Health Informatics. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 135–141, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
A Comparison of Word-based and Context-based Representations for Classification Problems in Health Informatics (Joshi et al., BioNLP 2019)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/W19-5015.pdf