MoNERo: a Biomedical Gold Standard Corpus for the Romanian Language
Maria Mitrofan, Verginica Barbu Mititelu, Grigorina Mitrofan
Abstract
In an era when large amounts of data are generated daily in various fields, the biomedical field among others, linguistic resources can be exploited for various tasks of Natural Language Processing. Moreover, an increasing number of biomedical documents are available in languages other than English. To be able to extract information from natural language free-text resources, methods and tools are needed for a variety of languages. This paper presents the creation of the MoNERo corpus, a gold standard biomedical corpus for Romanian, annotated with both part-of-speech tags and named entities. MoNERo comprises 154,825 morphologically annotated tokens and 23,188 entity annotations belonging to four entity semantic groups corresponding to UMLS Semantic Groups.
- Anthology ID:
- W19-5008
- Volume:
- Proceedings of the 18th BioNLP Workshop and Shared Task
- Month:
- August
- Year:
- 2019
- Address:
- Florence, Italy
- Editors:
- Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
- Venue:
- BioNLP
- SIG:
- SIGBIOMED
- Publisher:
- Association for Computational Linguistics
- Pages:
- 71–79
- URL:
- https://aclanthology.org/W19-5008
- DOI:
- 10.18653/v1/W19-5008
- Cite (ACL):
- Maria Mitrofan, Verginica Barbu Mititelu, and Grigorina Mitrofan. 2019. MoNERo: a Biomedical Gold Standard Corpus for the Romanian Language. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 71–79, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- MoNERo: a Biomedical Gold Standard Corpus for the Romanian Language (Mitrofan et al., BioNLP 2019)
- PDF:
- https://preview.aclanthology.org/ingest-bitext-workshop/W19-5008.pdf