Biomedical Entity Representations with Synonym Marginalization

Mujeen Sung, Hwisang Jeon, Jinhyuk Lee, Jaewoo Kang


Abstract
Biomedical named entities often play important roles in many biomedical text mining tools. However, due to the incompleteness of provided synonyms and numerous variations in their surface forms, normalization of biomedical entities is very challenging. In this paper, we focus on learning representations of biomedical entities solely based on the synonyms of entities. To learn from the incomplete synonyms, we use a model-based candidate selection and maximize the marginal likelihood of the synonyms present in top candidates. Our model-based candidates are iteratively updated to contain more difficult negative samples as our model evolves. In this way, we avoid the explicit pre-selection of negative samples from more than 400K candidates. On four biomedical entity normalization datasets having three different entity types (disease, chemical, adverse reaction), our model BioSyn consistently outperforms previous state-of-the-art models almost reaching the upper bound on each dataset.
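The training objective described in the abstract (select top candidates with the current model, then maximize the marginal likelihood of the synonyms among them) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `top_k` and `marginal_nll`, the plain inner-product scoring, and the choice to return zero loss when no synonym appears among the candidates are all assumptions made for illustration.

```python
import numpy as np

def top_k(mention_vec, cand_vecs, k):
    """Hypothetical candidate selection: score every candidate by inner
    product with the mention vector and keep the k highest-scoring ones."""
    sims = np.asarray(cand_vecs, dtype=float) @ np.asarray(mention_vec, dtype=float)
    idx = np.argsort(-sims)[:k]          # indices sorted by descending score
    return idx, sims[idx]

def marginal_nll(scores, is_synonym):
    """Negative log of the total softmax probability assigned to the
    candidates flagged as synonyms (marginalizing over all of them)."""
    scores = np.asarray(scores, dtype=float)
    mask = np.asarray(is_synonym, dtype=bool)
    if not mask.any():
        return 0.0                       # assumption: no synonym retrieved, no loss
    shifted = scores - scores.max()      # subtract the max for numerical stability
    log_z = np.log(np.exp(shifted).sum())
    log_pos = np.log(np.exp(shifted[mask]).sum())
    return float(log_z - log_pos)

# Toy usage: three candidate names, of which the first and third are synonyms.
idx, scores = top_k([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], k=2)
loss = marginal_nll(scores, [True, True])
```

Because the candidates are re-selected with the current model at each iteration, the non-synonym entries that survive into the top k act as progressively harder negatives, which is the behavior the abstract attributes to the iterative update.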
Anthology ID:
2020.acl-main.335
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
3641–3650
URL:
https://aclanthology.org/2020.acl-main.335
DOI:
10.18653/v1/2020.acl-main.335
Cite (ACL):
Mujeen Sung, Hwisang Jeon, Jinhyuk Lee, and Jaewoo Kang. 2020. Biomedical Entity Representations with Synonym Marginalization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3641–3650, Online. Association for Computational Linguistics.
Cite (Informal):
Biomedical Entity Representations with Synonym Marginalization (Sung et al., ACL 2020)
PDF:
https://preview.aclanthology.org/update-css-js/2020.acl-main.335.pdf
Video:
 http://slideslive.com/38929043
Code
dmis-lab/BioSyn
Data
BC5CDR, NCBI Disease