The impact of simple feature engineering in multilingual medical NER
Rebecka Weegar, Arantza Casillas, Arantza Diaz de Ilarraza, Maite Oronoz, Alicia Pérez, Koldo Gojenola
Abstract
The goal of this paper is to examine the impact of simple feature engineering mechanisms before applying more sophisticated techniques to the task of medical NER. Sometimes papers using scientifically sound techniques present raw baselines that could be improved by adding simple and cheap features. This work focuses on entity recognition for the clinical domain for three languages: English, Swedish and Spanish. The task is tackled using simple features, starting from the window size, capitalization, prefixes, and moving to POS and semantic tags. This work demonstrates that a simple initial step of feature engineering can improve the baseline results significantly. Hence, the contributions of this paper are: first, a short list of guidelines well supported by experimental results on three languages and, second, a detailed description of the relevance of these features for medical NER.
- Anthology ID:
- W16-4201
- Volume:
- Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)
- Month:
- December
- Year:
- 2016
- Address:
- Osaka, Japan
- Venue:
- ClinicalNLP
- Publisher:
- The COLING 2016 Organizing Committee
- Pages:
- 1–6
- URL:
- https://aclanthology.org/W16-4201
- Cite (ACL):
- Rebecka Weegar, Arantza Casillas, Arantza Diaz de Ilarraza, Maite Oronoz, Alicia Pérez, and Koldo Gojenola. 2016. The impact of simple feature engineering in multilingual medical NER. In Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP), pages 1–6, Osaka, Japan. The COLING 2016 Organizing Committee.
- Cite (Informal):
- The impact of simple feature engineering in multilingual medical NER (Weegar et al., ClinicalNLP 2016)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/W16-4201.pdf
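
The abstract names several inexpensive token-level features (context window, capitalization, prefixes). As a rough illustration of what such features can look like in practice, here is a minimal Python sketch. It is not the authors' implementation: the function names `token_features` and `sentence_features`, the feature keys, the `<PAD>` marker, and the default `window` and `prefix_len` values are all assumptions made for this example. POS and semantic tags are left out because they would require an external tagger or lexicon.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int,
                   window: int = 2, prefix_len: int = 3) -> Dict[str, object]:
    """Build a feature dict for tokens[i].

    Hypothetical helper for illustration only; feature names and defaults
    are assumptions, not taken from the paper.
    """
    word = tokens[i]
    feats: Dict[str, object] = {
        "word.lower": word.lower(),
        # Capitalization features
        "word.istitle": word.istitle(),
        "word.isupper": word.isupper(),
        # Prefix feature (first prefix_len characters, lowercased)
        "prefix": word[:prefix_len].lower(),
    }
    # Context window: neighbouring words at offsets -window..+window
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        key = f"{offset:+d}:word.lower"
        # Positions outside the sentence get a padding marker
        feats[key] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    return feats


def sentence_features(tokens: List[str], window: int = 2,
                      prefix_len: int = 3) -> List[Dict[str, object]]:
    """Feature dicts for every token in a sentence."""
    return [token_features(tokens, i, window, prefix_len)
            for i in range(len(tokens))]
```

A list of such dicts per sentence is the input shape that many sequence labellers accept, so `window` and `prefix_len` can be varied independently to measure how much each feature group contributes.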