Distinguishing between focus and background entities in biomedical corpora using discourse structure and transformers

Antonio Jimeno Yepes, Karin Verspoor


Abstract
Scientific documents typically contain numerous entity mentions, but only a subset is directly relevant to the key contributions of the paper. Effectively distinguishing these focus entities from background ones could improve both the retrieval of relevant documents and the extraction of information from them. To study the identification of focus entities, we developed two large datasets of disease-causing biological pathogens using MEDLINE, the largest collection of biomedical citations, and PubMed Central, a collection of full-text articles. The focus entities were identified using the human-curated indexing of these collections. Experiments with machine learning methods to identify focus entities show that transformer methods achieve high precision and recall and that document discourse information is relevant. The work lays the foundation for more targeted retrieval and summarisation of entity-relevant documents.
Anthology ID:
2022.louhi-1.4
Volume:
Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Alberto Lavelli, Eben Holderness, Antonio Jimeno Yepes, Anne-Lyse Minard, James Pustejovsky, Fabio Rinaldi
Venue:
Louhi
Publisher:
Association for Computational Linguistics
Pages:
35–40
URL:
https://aclanthology.org/2022.louhi-1.4
DOI:
10.18653/v1/2022.louhi-1.4
Cite (ACL):
Antonio Jimeno Yepes and Karin Verspoor. 2022. Distinguishing between focus and background entities in biomedical corpora using discourse structure and transformers. In Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI), pages 35–40, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Distinguishing between focus and background entities in biomedical corpora using discourse structure and transformers (Jimeno Yepes & Verspoor, Louhi 2022)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/2022.louhi-1.4.pdf