Branislava Šandrih Todorović


2021

pdf bib
Serbian NER&Beyond: The Archaic and the Modern Intertwinned
Branislava Šandrih Todorović | Cvetana Krstev | Ranka Stanković | Milica Ikonić Nešić
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

In this work, we present a Serbian literary corpus that is being developed under the umbrella of the “Distant Reading for European Literary History” COST Action CA16204. Using this corpus of novels written more than a century ago, we have developed and made publicly available a Named Entity Recognizer (NER) trained to recognize 7 different named entity types, with a Convolutional Neural Network (CNN) architecture, having F1 score of ≈91% on the test dataset. This model has been further assessed on a separate evaluation dataset. We wrap up with comparison of the developed model with the existing one, followed by a discussion of pros and cons of the both models.