Cross-Lingual UMLS Named Entity Linking using UMLS Dictionary Fine-Tuning

Rina Galperin, Shachar Schnapp, Michael Elhadad


Abstract
We study cross-lingual UMLS named entity linking, where mentions in a given source language are mapped to UMLS concepts, most of which are labeled in English. Our cross-lingual framework includes an offline, unsupervised construction of a translated UMLS dictionary and a per-document pipeline that identifies candidate UMLS mentions and uses a fine-tuned pretrained transformer language model to filter candidates according to context. Our method exploits a small dataset of manually annotated UMLS mentions in the source language and uses this supervised data in two ways: to extend the unsupervised UMLS dictionary and to fine-tune the contextual filtering of candidate mentions in full documents. We demonstrate results of our approach on both Hebrew and English. We achieve new state-of-the-art (SOTA) results on the Hebrew Camoni corpus, with an average gain of +8.9 F1 across the three communities in the dataset. We also achieve a new SOTA on the English MedMentions dataset, with a gain of +7.3 F1.
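The per-document pipeline described in the abstract has two stages: dictionary lookup proposes candidate mentions, and a contextual filter accepts or rejects each one. The following is a minimal sketch of that two-stage shape, not the authors' implementation (their code is in rinagalperin/biomedical_nel). Everything here is a hypothetical illustration: the dictionary entries and concept IDs are placeholders, exact n-gram matching is an assumed lookup strategy, and `toy_scorer` is a trivial stand-in for the fine-tuned transformer classifier.

```python
from typing import Callable, Dict, List, Tuple

# (start token, end token exclusive, matched surface form, concept ID)
Candidate = Tuple[int, int, str, str]
# Hypothetical scorer interface: (tokens, start, end) -> probability that the
# span is a genuine concept mention in this context.
Scorer = Callable[[List[str], int, int], float]


def find_candidates(tokens: List[str], dictionary: Dict[str, str],
                    max_len: int = 4) -> List[Candidate]:
    """Match every token n-gram of up to max_len tokens against the dictionary."""
    candidates: List[Candidate] = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            surface = " ".join(tokens[start:end]).lower()
            concept = dictionary.get(surface)
            if concept is not None:
                candidates.append((start, end, surface, concept))
    return candidates


def filter_candidates(tokens: List[str], candidates: List[Candidate],
                      scorer: Scorer, threshold: float = 0.5) -> List[Candidate]:
    """Keep only the candidates whose contextual score reaches the threshold."""
    return [c for c in candidates if scorer(tokens, c[0], c[1]) >= threshold]


def toy_scorer(tokens: List[str], start: int, end: int) -> float:
    """Hypothetical placeholder for a fine-tuned transformer classifier:
    rejects short all-lowercase matches (e.g. the word "as"), accepts the rest."""
    span = " ".join(tokens[start:end])
    return 0.0 if span.islower() and len(span) < 3 else 1.0


if __name__ == "__main__":
    # Hypothetical dictionary with placeholder concept IDs (not real UMLS CUIs).
    umls_dict = {"diabetes": "CUI_DIABETES",
                 "type 2 diabetes": "CUI_T2D",
                 "as": "CUI_ANKYLOSING_SPONDYLITIS"}
    tokens = "Patient with type 2 diabetes and AS , treated as outpatient".split()
    for start, end, surface, concept in filter_candidates(
            tokens, find_candidates(tokens, umls_dict), toy_scorer):
        print(start, end, surface, concept)
```

In this toy run the abbreviation "AS" survives the filter while the conjunction "as" is dropped, which is the kind of context-dependent decision the fine-tuned model is meant to make. Overlapping matches such as "type 2 diabetes" and "diabetes" are both kept; the abstract does not say how overlaps are resolved, so that step is left out.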
Anthology ID: 2022.findings-acl.266
Volume: Findings of the Association for Computational Linguistics: ACL 2022
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3380–3390
URL: https://aclanthology.org/2022.findings-acl.266
DOI: 10.18653/v1/2022.findings-acl.266
Cite (ACL): Rina Galperin, Shachar Schnapp, and Michael Elhadad. 2022. Cross-Lingual UMLS Named Entity Linking using UMLS Dictionary Fine-Tuning. In Findings of the Association for Computational Linguistics: ACL 2022, pages 3380–3390, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Cross-Lingual UMLS Named Entity Linking using UMLS Dictionary Fine-Tuning (Galperin et al., Findings 2022)
PDF: https://preview.aclanthology.org/naacl24-info/2022.findings-acl.266.pdf
Software: 2022.findings-acl.266.software.zip
Code: rinagalperin/biomedical_nel
Data: BC5CDR, MedMentions