Serbian NER&Beyond: The Archaic and the Modern Intertwinned
Branislava Šandrih Todorović, Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić
Abstract
In this work, we present a Serbian literary corpus that is being developed under the umbrella of the “Distant Reading for European Literary History” COST Action CA16204. Using this corpus of novels written more than a century ago, we have developed and made publicly available a Named Entity Recognizer (NER) trained to recognize 7 different named entity types, with a Convolutional Neural Network (CNN) architecture, achieving an F1 score of ≈91% on the test dataset. This model has been further assessed on a separate evaluation dataset. We conclude with a comparison of the developed model with the existing one, followed by a discussion of the pros and cons of both models.
- Anthology ID:
- 2021.ranlp-1.141
- Volume:
- Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
- Month:
- September
- Year:
- 2021
- Address:
- Held Online
- Venue:
- RANLP
- Publisher:
- INCOMA Ltd.
- Pages:
- 1252–1260
- URL:
- https://aclanthology.org/2021.ranlp-1.141
- Cite (ACL):
- Branislava Šandrih Todorović, Cvetana Krstev, Ranka Stanković, and Milica Ikonić Nešić. 2021. Serbian NER&Beyond: The Archaic and the Modern Intertwinned. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 1252–1260, Held Online. INCOMA Ltd.
- Cite (Informal):
- Serbian NER&Beyond: The Archaic and the Modern Intertwinned (Šandrih Todorović et al., RANLP 2021)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2021.ranlp-1.141.pdf