2017
Efficient Encoding of Pathology Reports Using Natural Language Processing
Rebecka Weegar | Jan F Nygård | Hercules Dalianis
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
In this article, we present a system that extracts information from pathology reports. The reports are written in Norwegian and contain free text describing prostate biopsies. Currently, these reports are manually coded for research and statistical purposes by trained experts at the Cancer Registry of Norway, where the coders extract values for a set of predefined fields that are specific to prostate cancer. The presented system is rule-based and achieves an average F-score of 0.91 for the fields Gleason grade, Gleason score, the number of biopsies that contain tumor tissue, and the orientation of the biopsies. The system also identifies reports that contain ambiguity or other content that should be reviewed by an expert. The system shows potential to encode the reports considerably faster and with fewer resources, at a quality similar to that of manual encoding.
2016
The impact of simple feature engineering in multilingual medical NER
Rebecka Weegar | Arantza Casillas | Arantza Diaz de Ilarraza | Maite Oronoz | Alicia Pérez | Koldo Gojenola
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)
The goal of this paper is to examine the impact of simple feature engineering mechanisms before applying more sophisticated techniques to the task of medical NER. Sometimes papers using scientifically sound techniques present raw baselines that could be improved by adding simple and cheap features. This work focuses on entity recognition in the clinical domain for three languages: English, Swedish, and Spanish. The task is tackled using simple features, starting from window size, capitalization, and prefixes, and moving on to POS and semantic tags. This work demonstrates that a simple initial step of feature engineering can improve the baseline results significantly. Hence, the contributions of this paper are: first, a short list of guidelines well supported by experimental results on three languages and, second, a detailed description of the relevance of these features for medical NER.
2015
Linking Entities Across Images and Text
Rebecka Weegar | Kalle Åström | Pierre Nugues
Proceedings of the Nineteenth Conference on Computational Natural Language Learning
Creating a rule based system for text mining of Norwegian breast cancer pathology reports
Rebecka Weegar | Hercules Dalianis
Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis