The impact of simple feature engineering in multilingual medical NER

Rebecka Weegar, Arantza Casillas, Arantza Diaz de Ilarraza, Maite Oronoz, Alicia Pérez, Koldo Gojenola


Abstract
The goal of this paper is to examine the impact of simple feature engineering mechanisms before applying more sophisticated techniques to the task of medical NER. Papers that use scientifically sound techniques sometimes present raw baselines that could be improved by adding simple and cheap features. This work focuses on entity recognition in the clinical domain for three languages: English, Swedish and Spanish. The task is tackled using simple features, starting with the window size, capitalization and prefixes, and moving on to POS and semantic tags. This work demonstrates that a simple initial step of feature engineering can improve the baseline results significantly. Hence, the contributions of this paper are: first, a short list of guidelines well supported by experimental results on three languages and, second, a detailed description of the relevance of these features for medical NER.
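The feature groups named in the abstract (context window, capitalization, prefixes, POS tags and semantic tags) can be made concrete with a small per-token feature extractor. The following is a minimal Python sketch, assuming a CRF-style sequence tagger that consumes one feature dictionary per token. The function names `token_features` and `sentence_features`, the feature keys, and the default window and prefix sizes are hypothetical choices made for illustration, not the authors' implementation. POS and semantic tags are assumed to come from external taggers and are passed in as lists parallel to the tokens.

```python
from typing import Dict, List, Optional


def token_features(
    tokens: List[str],
    i: int,
    window: int = 2,
    prefix_len: int = 3,
    pos_tags: Optional[List[str]] = None,
    sem_tags: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Hypothetical feature extractor for token i (illustration only)."""
    feats: Dict[str, object] = {"bias": 1.0}
    # Visit the focus token and its neighbours within +/- window positions.
    for offset in range(-window, window + 1):
        j = i + offset
        key = f"{offset:+d}"  # e.g. "-1", "+0", "+2"
        if j < 0:
            feats[f"{key}:BOS"] = True  # position falls before sentence start
            continue
        if j >= len(tokens):
            feats[f"{key}:EOS"] = True  # position falls after sentence end
            continue
        word = tokens[j]
        feats[f"{key}:lower"] = word.lower()
        # Capitalization features.
        feats[f"{key}:is_title"] = word.istitle()
        feats[f"{key}:is_upper"] = word.isupper()
        # Lowercased prefixes of length 1..prefix_len.
        for n in range(1, prefix_len + 1):
            feats[f"{key}:prefix{n}"] = word[:n].lower()
        # Optional tags from external POS / semantic taggers (assumed inputs).
        if pos_tags is not None:
            feats[f"{key}:pos"] = pos_tags[j]
        if sem_tags is not None:
            feats[f"{key}:sem"] = sem_tags[j]
    return feats


def sentence_features(tokens: List[str], **kwargs) -> List[Dict[str, object]]:
    """One feature dictionary per token, as a CRF-style tagger would expect."""
    return [token_features(tokens, i, **kwargs) for i in range(len(tokens))]


if __name__ == "__main__":
    sentence = ["Patient", "denies", "chest", "pain", "."]
    for feats in sentence_features(sentence, window=1, prefix_len=2):
        print(feats)
```

Because each feature group is controlled by a separate argument (`window`, `prefix_len`, `pos_tags`, `sem_tags`), groups can be added one at a time, in the spirit of the incremental study the abstract describes.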
Anthology ID: W16-4201
Volume: Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)
Month: December
Year: 2016
Address: Osaka, Japan
Editors: Anna Rumshisky, Kirk Roberts, Steven Bethard, Tristan Naumann
Venue: ClinicalNLP
Publisher: The COLING 2016 Organizing Committee
Pages: 1–6
URL: https://aclanthology.org/W16-4201
Cite (ACL): Rebecka Weegar, Arantza Casillas, Arantza Diaz de Ilarraza, Maite Oronoz, Alicia Pérez, and Koldo Gojenola. 2016. The impact of simple feature engineering in multilingual medical NER. In Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP), pages 1–6, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal): The impact of simple feature engineering in multilingual medical NER (Weegar et al., ClinicalNLP 2016)
PDF: https://preview.aclanthology.org/fix-dup-bibkey/W16-4201.pdf