Abstract
We investigate the quality of task-specific word embeddings created with relatively small, targeted corpora. We present a comprehensive evaluation framework, including both intrinsic and extrinsic evaluation, that can be extended to named entities beyond drug names. Intrinsic evaluation results show that drug name embeddings created with a domain-specific document corpus outperformed previously published versions that were derived from a very large general text corpus. Extrinsic evaluation uses the word embeddings for the task of drug name recognition with a Bi-LSTM model, and the results demonstrate the advantage of using domain-specific word embeddings as the only input feature for drug name recognition, achieving an F1-score of 0.91. This work suggests that it may be advantageous to derive domain-specific embeddings for certain tasks even when the domain-specific corpus is of limited size.
- Anthology ID:
- W18-2319
- Volume:
- Proceedings of the BioNLP 2018 workshop
- Month:
- July
- Year:
- 2018
- Address:
- Melbourne, Australia
- Editors:
- Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
- Venue:
- BioNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 156–160
- URL:
- https://aclanthology.org/W18-2319
- DOI:
- 10.18653/v1/W18-2319
- Cite (ACL):
- Mengnan Zhao, Aaron J. Masino, and Christopher C. Yang. 2018. A Framework for Developing and Evaluating Word Embeddings of Drug-named Entity. In Proceedings of the BioNLP 2018 workshop, pages 156–160, Melbourne, Australia. Association for Computational Linguistics.
- Cite (Informal):
- A Framework for Developing and Evaluating Word Embeddings of Drug-named Entity (Zhao et al., BioNLP 2018)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/W18-2319.pdf