Biomedical NER for the Enterprise with Distillated BERN2 and the Kazu Framework

Wonjin Yoon, Richard Jackson, Elliot Ford, Vladimir Poroshin, Jaewoo Kang


Abstract
To assist the drug discovery and development process, pharmaceutical companies often apply biomedical NER and linking techniques over internal and public corpora. Decades of study in the field of BioNLP have produced a plethora of algorithms, systems and datasets. However, our experience has been that no single open source system meets all the requirements of a modern pharmaceutical company. In this work, we describe these requirements according to our experience of the industry, and present Kazu, a highly extensible, scalable open source framework designed to support BioNLP for the pharmaceutical sector. Kazu is built around a computationally efficient version of the BERN2 NER model (TinyBERN2), and subsequently wraps several other BioNLP technologies into one coherent system.
Anthology ID:
2022.emnlp-industry.63
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
December
Year:
2022
Address:
Abu Dhabi, UAE
Editors:
Yunyao Li, Angeliki Lazaridou
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
619–626
URL:
https://aclanthology.org/2022.emnlp-industry.63
DOI:
10.18653/v1/2022.emnlp-industry.63
Cite (ACL):
Wonjin Yoon, Richard Jackson, Elliot Ford, Vladimir Poroshin, and Jaewoo Kang. 2022. Biomedical NER for the Enterprise with Distillated BERN2 and the Kazu Framework. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 619–626, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Biomedical NER for the Enterprise with Distillated BERN2 and the Kazu Framework (Yoon et al., EMNLP 2022)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2022.emnlp-industry.63.pdf