Ranka Stanković
Also published as: Ranka Stankovic
Papers on this page may belong to the following people: Ranka Stanković, Ranka Stankovic
2025
Light Verb Constructions in ELEXIS-WSD – Annotation, Comparisons and Issues
Cvetana Krstev | Ranka Stanković | Aleksandra Marković
Journal Computational Linguistics in Bulgaria
This paper deals with light verb constructions and their annotation in ELEXIS-sr, the Serbian extension of the ELEXIS-WSD corpus. In Section 1, general introductory remarks are given about these constructions, the notion of light verbs, and their treatment and further classification in the PARSEME annotation guidelines (subtypes LVC.full and LVC.cause). Section 2 offers an insight into the ELEXIS-WSD corpus, annotated with VMWEs for several languages, with a remark that these VMWEs were not further subcategorised into finer classes. For this paper, we classified them ourselves to facilitate comparisons of the LVCs annotated in ELEXIS-sr. Tools and resources used for the automatic annotation of ELEXIS-sr are presented in Section 3, as well as the results of manual checking. In Section 4, we offer a comparison of LVCs in four ELEXIS-WSD sub-collections: Serbian, Bulgarian, Slovene, and English. We use Serbian as a starting point for this comparison, as it has been thoroughly annotated with MWEs (and NEs). We present the results of the comparison of all the occurrences of LVCs in the Serbian extension with their occurrences and annotation both in ELEXIS-WSD and PARSEME sub-corpora for other languages. An important conclusion is that the largest number of LVC equivalents is found between Serbian and Bulgarian, closely related Slavic languages (a total of 34 equivalents), while between Serbian and Slovene, also Slavic, there are 11 equivalents, the same number as between Serbian and English. This could be explained by the number of VMWEs and LVCs annotated, or by the strategies used by different annotators.
2020
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian
Ranka Stankovic | Branislava Šandrih | Cvetana Krstev | Miloš Utvić | Mihailo Skoric
Proceedings of the Twelfth Language Resources and Evaluation Conference
The training of new tagger models for Serbian is primarily motivated by the enhancement of the existing tagset with the grammatical category of gender. The harmonization of resources that were manually annotated within different projects over a long period of time was an important task, enabled by the development of tools that support partial automation. The supporting tools take into account different taggers and tagsets. This paper focuses on the TreeTagger and spaCy taggers, and on the annotation schema alignment between Serbian morphological dictionaries, MULTEXT-East and the Universal Part-of-Speech tagset. The trained models will be used to publish the new version of the Corpus of Contemporary Serbian as well as the Serbian literary corpus. The performance of the developed taggers was compared and the impact of training set size was investigated, which resulted in around 98% PoS-tagging precision per token for both new models. The sr_basic annotated dataset will also be published.
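The annotation schema alignment between MULTEXT-East MSD tags and the Universal Part-of-Speech tagset mentioned in the abstract can be illustrated with a minimal sketch. The mapping table below is a simplified assumption based on the leading category letter of an MSD string; the paper's actual conversion also involves positional attributes and is not reproduced here.

```python
# Illustrative sketch: map the leading category letter of a MULTEXT-East
# MSD tag (e.g. "Ncmsn") to a coarse Universal POS tag. A full alignment,
# as described in the paper, would also consult the positional attributes.
MSD_TO_UPOS = {
    "N": "NOUN",   # noun (proper nouns would need the Type attribute)
    "V": "VERB",
    "A": "ADJ",
    "R": "ADV",
    "P": "PRON",
    "S": "ADP",
    "C": "CCONJ",  # MULTEXT-East 'C' covers conjunctions of both kinds
    "M": "NUM",
    "Q": "PART",
    "I": "INTJ",
}

def msd_to_upos(msd: str) -> str:
    """Return a coarse UPOS tag for a MULTEXT-East MSD string."""
    return MSD_TO_UPOS.get(msd[:1], "X")

# e.g. msd_to_upos("Ncmsn") -> "NOUN", msd_to_upos("Vmp") -> "VERB"
```

In practice such a table is only a starting point: cases like distinguishing CCONJ from SCONJ, or NOUN from PROPN, require reading further attribute positions of the MSD string.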
2019
Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names
Branislava Šandrih | Cvetana Krstev | Ranka Stankovic
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal names. It was further used to prepare training sets for four different levels of annotation, which were then used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with the rule- and lexicon-based system, were evaluated on two sample texts: a part of the gold standard and an independent newspaper text of approximately the same size. The results show that the rule- and lexicon-based system outperforms the trained models in all four scenarios (measured by F1), while the Stanford models have the highest precision. All systems obtain the best results in recognizing full names, while the recognition of first names only is rather poor. The produced models are incorporated into the Web platform NER&Beyond, which provides various NE-related functions.