Ranka Stankovic
2026
PARSEME 2.0 Multilingual Corpus of Multiword Expressions
Agata Savary | Manon Scholivet | Carlos Ramisch | Takuya Nakamura | Eric Bilinski | Sara Stymne | Voula Giouli | Stella Markantonatou | Vasile Pais | Maria Mitrofan | Louis Estève | Bruno Guillaume | Verginica Barbu Mititelu | Jaka Čibej | Roberto Díaz Hernández | Victoria Fendel | Polona Gantar | Olha Kanishcheva | Cvetana Krstev | Chaya Liebeskind | Irina Lobzhanidze | Aleksandra M. Marković | Gunta Nešpore-Bērzkalne | Adriana S. Pagano | Mehrnoush Shamsfard | Ranka Stankovic | Vahide Tajalli | Carole Tiberius | Aakanksha Padhye
Proceedings of the Fifteenth Language Resources and Evaluation Conference
We present edition 2.0 of the PARSEME multilingual corpus annotated for multiword expressions (MWEs), resulting from efforts of the PARSEME community towards universality-driven modeling of idiomaticity. With respect to previous editions, we extend the annotation scope to all syntactic MWE categories: verbal, nominal, adjectival, adverbial and functional. We cover 17 languages, of which 7 are new. The annotation process is based on cross-lingually unified guidelines, phrased as decision diagrams over linguistic tests, and a typology of 18 MWE categories. The corpus contains almost 5 million tokens, over 250,000 sentences and 140,000 MWE annotations. The applicability of the corpus is tested in baseline experiments with a prompt-based MWE identification system. Results show that generic large language models do not encode sufficient knowledge to solve the MWE identification task.
Integrating TEI, NER/NEL, Textometry, and Linked Data for a Semantically Enriched Interview Corpus
Ranka Stankovic | Tamara Vučenović | Biljana Rujević | Milica Ikonić Nešić | Mihailo Škorić
Proceedings of the Fifteenth Language Resources and Evaluation Conference
This paper presents a pipeline that converts unstructured interview transcripts into a semantically enriched, queryable knowledge resource. The texts from the Digitalne Ikone 20+ interview collection were first encoded in TEI XML (Text Encoding Initiative), marking interview boundaries, paragraph breaks, speaker turns with identifiers, dates, and topics. This structural encoding underpins downstream NLP and enables structured querying (e.g., by speaker). We then applied Named Entity Recognition to identify persons, places, organizations, and events, and embedded the results directly in TEI. In the third stage, Named Entity Linking mapped entity mentions to canonical Wikidata identifiers via context-aware disambiguation; missing entries were added to Wikidata when necessary. The resulting TEI+NER/NEL corpus is serialized as linked data following NIF (NLP Interchange Format). The pipeline also supports retrieval-augmented summarization, which retrieves evidence passages and prompts LLMs (implemented with DSPy) to produce faithful interview summaries. We discuss design choices (TXM for textometry with JeRTeh resources; TESLA models for NER/NEL), report qualitative gains in interpretability through semantic links, and outline future work on domain-adapted NER/NEL, graph-based completion, and more expressive RAG architectures. The approach is replicable for other oral-history or media corpora and advances practical, evidence-grounded access to cultural archives and beyond.
Development of Serbian QA Datasets through Prompt-Based Generation and Human Validation
Jovana Rađenović | Olivera Kitanović | Ranka Stankovic | Mihailo Škorić
Proceedings of the Fifteenth Language Resources and Evaluation Conference
LLMs capable of answering questions, fulfilling diverse user requests, and functioning as chatbots rely heavily on extensive datasets. However, for the Serbian language, there is a significant lack of high-quality datasets structured in a question-and-answer (QA) format. To address this, we extracted a portion of the SQuAD-sr dataset, which, to the best of our knowledge, is the largest QA dataset in Serbian and contains over 87k samples. While this dataset is an incredibly valuable resource, it was translated using an adapted Translate-Align-Retrieve method and contains errors and terminological inaccuracies. In this work, we systematically reviewed and corrected more than 7k samples from the SQuAD-sr dataset, significantly improving the dataset’s reliability and quality. We call this modified subset of the SQuAD-sr dataset SQuAD-sr-md. The corrections that were made are crucial for training accurate and robust QA models in Serbian, ensuring that AI systems can leverage the full potential of this dataset. We also introduce an additional QA dataset, generated with LLMs from encyclopedia articles, Wikipedia pages, and scientific paper abstracts, which contains 74k samples. We name this dataset SerbianQA-Gen.
Co-authors
- Mihailo Škorić 2
- Verginica Barbu Mititelu 1
- Eric Bilinski 1
- Roberto Díaz Hernández 1
- Louis Estève 1
- Victoria Fendel 1
- Polona Gantar 1
- Voula Giouli 1
- Bruno Guillaume 1
- Milica Ikonić Nešić 1
- Olha Kanishcheva 1
- Olivera Kitanović 1
- Cvetana Krstev 1
- Chaya Liebeskind 1
- Irina Lobzhanidze 1
- Stella Markantonatou 1
- Aleksandra M. Marković 1
- Maria Mitrofan 1
- Takuya Nakamura 1
- Gunta Nešpore-Bērzkalne 1
- Aakanksha Padhye 1
- Adriana Silvina Pagano 1
- Vasile Pais 1
- Carlos Ramisch 1
- Jovana Rađenović 1
- Biljana Rujević 1
- Agata Savary 1
- Manon Scholivet 1
- Mehrnoush Shamsfard 1
- Sara Stymne 1
- Vahide Tajalli 1
- Carole Tiberius 1
- Tamara Vučenović 1
- Jaka Čibej 1
Venues
- LREC 3