Risk Factors Extraction from Clinical Texts based on Linked Open Data

Svetla Boytcheva, Galia Angelova, Zhivko Angelov


Abstract
This paper presents experiments in risk factor analysis based on clinical texts enhanced with Linked Open Data (LOD). The idea is to determine whether a patient has risk factors for a specific disease by analyzing only his/her outpatient records. A semantic graph of “meta-knowledge” about a disease of interest is constructed, with integrated multilingual terms (labels) for symptoms, risk factors, etc., drawn from Wikidata, PubMed, Wikipedia and MeSH, and linked to the clinical records of individual patients via ICD-10 codes. A predictive model is then trained to predict whether patients are at risk of developing the disease of interest. Testing was done using outpatient records from a nationwide repository available for the period 2011-2016. The results show that the overall performance of all tested algorithms (kNN, Naive Bayes, Tree, Logistic regression, ANN) improves when the clinical texts are enriched with LOD resources.
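The enrichment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `LOD_LABELS` table, the `enrich_record` and `bag_of_words` helpers, and the sample record are all hypothetical, and a small hand-written dictionary stands in for labels that would actually be collected from Wikidata, PubMed, Wikipedia and MeSH. The sketch only shows how labels looked up by ICD-10 code could be appended to a record's text before features are extracted for a classifier.

```python
from collections import Counter

# Hypothetical stand-in for labels gathered from LOD sources,
# keyed by ICD-10 code (illustrative values, not data from the paper).
LOD_LABELS = {
    "E66": ["obesity", "overweight"],
    "I10": ["hypertension", "high blood pressure"],
}


def enrich_record(text, icd10_codes, lod_labels=LOD_LABELS):
    """Append the LOD labels of every known ICD-10 code to the record text."""
    extra = []
    for code in icd10_codes:
        # Codes without an entry in the label table contribute nothing.
        extra.extend(lod_labels.get(code, []))
    return " ".join([text] + extra)


def bag_of_words(text):
    """Count lowercased whitespace tokens; a stand-in for real feature extraction."""
    return Counter(text.lower().split())


# A made-up record with one known code (E66) and one unknown code (Z00).
record = enrich_record("Patient reports fatigue", ["E66", "Z00"])
features = bag_of_words(record)
```

In a fuller pipeline the resulting counts would be vectorized and passed to one of the classifiers named in the abstract; that part is omitted here because the abstract gives no details about it.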
Anthology ID:
R19-1019
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
161–167
URL:
https://aclanthology.org/R19-1019
DOI:
10.26615/978-954-452-056-4_019
Cite (ACL):
Svetla Boytcheva, Galia Angelova, and Zhivko Angelov. 2019. Risk Factors Extraction from Clinical Texts based on Linked Open Data. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 161–167, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Risk Factors Extraction from Clinical Texts based on Linked Open Data (Boytcheva et al., RANLP 2019)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/R19-1019.pdf