Method Entity Extraction from Biomedical Texts

Waqar Bin Kalim, Robert E. Mercer


Abstract
In the field of Natural Language Processing (NLP), extracting method entities from biomedical text has been a challenging task. Scientific research papers commonly contain complex keywords and domain-specific terminology, and new terms appear continuously. In this research, we identify method terminology in biomedical text using both rule-based and machine learning techniques. We first use linguistic features to extract method sentence candidates from a large corpus of biomedical text. Then, we construct a silver standard biomedical corpus composed of these sentences. With a rule-based method that makes use of the Stanza dependency parsing module, we label the method entities in these sentences. Using this silver standard corpus, we train two machine learning algorithms to automatically extract method entities from biomedical text. Our results show that it is possible to develop machine learning models that can automatically extract method entities with reasonable accuracy without the need for a gold standard dataset.
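The pipeline described in the abstract (candidate sentence filtering, then rule-based labeling over a dependency parse) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the cue-word list, the single "object of a cue word" rule, the `METHOD` tag name, and the helper names `is_candidate` and `label_methods`. The parse below is written by hand rather than produced by Stanza; the `Word` fields only mirror the `id`, `text`, `head`, and `deprel` attributes that Stanza exposes on parsed words.

```python
from dataclasses import dataclass

# Hypothetical cue words; the paper's actual linguistic features are not listed here.
CUE_WORDS = {"using", "via", "by"}

@dataclass
class Word:
    id: int      # 1-based token index, as in a Stanza parse
    text: str
    head: int    # id of the governing word, 0 for the root
    deprel: str  # Universal Dependencies relation label

def is_candidate(words):
    """Step 1 (assumed): keep sentences that contain a method cue word."""
    return any(w.text.lower() in CUE_WORDS for w in words)

def label_methods(words):
    """Step 2 (assumed rule): tag the object of a cue word, plus its
    compound/amod modifiers, as a method entity in BIO format."""
    tags = ["O"] * len(words)
    for cue in words:
        if cue.text.lower() not in CUE_WORDS:
            continue
        for noun in words:
            if noun.head != cue.id or noun.deprel != "obj":
                continue
            modifiers = [w.id for w in words
                         if w.head == noun.id and w.deprel in {"compound", "amod"}]
            start = min(modifiers + [noun.id])
            # Tag the contiguous span from the leftmost modifier to the head noun.
            for wid in range(start, noun.id + 1):
                tags[wid - 1] = "B-METHOD" if wid == start else "I-METHOD"
    return tags

# Hand-written parse of "We measured expression using quantitative PCR ."
sentence = [
    Word(1, "We", 2, "nsubj"),
    Word(2, "measured", 0, "root"),
    Word(3, "expression", 2, "obj"),
    Word(4, "using", 2, "advcl"),
    Word(5, "quantitative", 6, "amod"),
    Word(6, "PCR", 4, "obj"),
    Word(7, ".", 2, "punct"),
]

if is_candidate(sentence):
    for word, tag in zip(sentence, label_methods(sentence)):
        print(f"{word.text}\t{tag}")
```

Sentences labeled this way would form the silver standard corpus on which a sequence tagger is then trained. A single rule like this one is far too narrow for real text; it only shows the shape of the approach.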
Anthology ID:
2022.coling-1.207
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
2357–2362
URL:
https://aclanthology.org/2022.coling-1.207
Cite (ACL):
Waqar Bin Kalim and Robert E. Mercer. 2022. Method Entity Extraction from Biomedical Texts. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2357–2362, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Method Entity Extraction from Biomedical Texts (Kalim & Mercer, COLING 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.coling-1.207.pdf