Grigorina Mitrofan


2019

pdf
MoNERo: a Biomedical Gold Standard Corpus for the Romanian Language
Maria Mitrofan | Verginica Barbu Mititelu | Grigorina Mitrofan
Proceedings of the 18th BioNLP Workshop and Shared Task

In an era when large amounts of data are generated daily in various fields, the biomedical field among others, linguistic resources can be exploited for various tasks of Natural Language Processing. Moreover, increasing number of biomedical documents are available in languages other than English. To be able to extract information from natural language free text resources, methods and tools are needed for a variety of languages. This paper presents the creation of the MoNERo corpus, a gold standard biomedical corpus for Romanian, annotated with both part of speech tags and named entities. MoNERo comprises 154,825 morphologically annotated tokens and 23,188 entity annotations belonging to four entity semantic groups corresponding to UMLS Semantic Groups.