Abstract
Since different languages may pose different challenges, this paper presents the following contributions to Information Extraction from clinical text in Portuguese: a collection of 281 clinical texts with manually annotated named entities; word embeddings trained on a larger collection of similar texts; and results of using BiLSTM-CRF neural networks for named entity recognition on the annotated collection, including a comparison of in-domain and out-of-domain word embeddings for this task. Although learned from much less data, in-domain embeddings yield higher performance. When tested on 20 independent clinical texts, the model using in-domain embeddings achieved better results than a model using larger out-of-domain embeddings.

- Anthology ID:
- W19-5024
- Volume:
- Proceedings of the 18th BioNLP Workshop and Shared Task
- Month:
- August
- Year:
- 2019
- Address:
- Florence, Italy
- Editors:
- Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
- Venue:
- BioNLP
- SIG:
- SIGBIOMED
- Publisher:
- Association for Computational Linguistics
- Pages:
- 223–233
- URL:
- https://aclanthology.org/W19-5024
- DOI:
- 10.18653/v1/W19-5024
- Cite (ACL):
- Fábio Lopes, César Teixeira, and Hugo Gonçalo Oliveira. 2019. Contributions to Clinical Named Entity Recognition in Portuguese. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 223–233, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- Contributions to Clinical Named Entity Recognition in Portuguese (Lopes et al., BioNLP 2019)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/W19-5024.pdf
- Code
- fabioacl/PortugueseClinicalNER