Joao Bettencourt-silva


2022

pdf
Accelerating the Discovery of Semantic Associations from Medical Literature: Mining Relations Between Diseases and Symptoms
Alberto Purpura | Francesca Bonin | Joao Bettencourt-silva
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track

Medical literature is a vast and constantly expanding source of information about diseases, their diagnoses and treatments. One of the ways to extract insights from this type of data is through mining association rules between such entities. However, existing solutions do not take into account the semantics of sentences from which entity co-occurrences are extracted. We propose a scalable solution for the automated discovery of semantic associations between different entities such as diseases and their symptoms. Our approach employs the UMLS semantic network and a binary relation classification model trained with distant supervision to validate and help ranking the most likely entity associations pairs extracted with frequency-based association rule mining algorithms. We evaluate the proposed system on the task of extracting disease-symptom associations from a collection of over 14M PubMed abstracts and validate our results against a publicly available known list of disease-symptom pairs.