BioReddit: Word Embeddings for User-Generated Biomedical NLP

Marco Basaldella, Nigel Collier


Abstract
Word embeddings, in their different shapes and iterations, have changed the natural language processing research landscape in recent years. The biomedical text processing field is no stranger to this revolution; however, scholars in the field have largely trained their embeddings on scientific documents only, even when working on user-generated data. In this paper we show that training embeddings on a corpus of user-generated text collected from medical forums heavily influences performance on downstream tasks, outperforming embeddings trained on either general-purpose data or scientific papers when applied to user-generated content.
Anthology ID:
D19-6205
Volume:
Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019)
Month:
November
Year:
2019
Address:
Hong Kong
Editors:
Eben Holderness, Antonio Jimeno Yepes, Alberto Lavelli, Anne-Lyse Minard, James Pustejovsky, Fabio Rinaldi
Venue:
Louhi
Publisher:
Association for Computational Linguistics
Pages:
34–38
URL:
https://aclanthology.org/D19-6205
DOI:
10.18653/v1/D19-6205
Cite (ACL):
Marco Basaldella and Nigel Collier. 2019. BioReddit: Word Embeddings for User-Generated Biomedical NLP. In Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019), pages 34–38, Hong Kong. Association for Computational Linguistics.
Cite (Informal):
BioReddit: Word Embeddings for User-Generated Biomedical NLP (Basaldella & Collier, Louhi 2019)
PDF:
https://preview.aclanthology.org/ingest-bitext-workshop/D19-6205.pdf