Biljana Rujević
2026
Integrating TEI, NER/NEL, Textometry, and Linked Data for a Semantically Enriched Interview Corpus
Ranka Stanković | Tamara Vučenović | Biljana Rujević | Milica Ikonić Nešić | Mihailo Škorić
Proceedings of the Fifteenth Language Resources and Evaluation Conference
This paper presents a pipeline that converts unstructured interview transcripts into a semantically enriched, queryable knowledge resource. The texts from the Digitalne Ikone 20+ interview collection were first encoded in TEI XML (Text Encoding Initiative), marking interview boundaries, paragraph breaks, speaker turns with identifiers, dates, and topics. This structural encoding underpins downstream NLP and enables structured querying (e.g., by speaker). We then applied Named Entity Recognition to identify persons, places, organizations, and events, and embedded the results directly in TEI. In the third stage, Named Entity Linking mapped entity mentions to canonical Wikidata identifiers via context-aware disambiguation; missing entries were added to Wikidata when necessary. The resulting TEI+NER/NEL corpus, serialized as linked data, follows NIF (the NLP Interchange Format). The pipeline also supports retrieval-augmented summarization that retrieves evidence passages and prompts LLMs (implemented with DSPy) to produce faithful interview summaries. We discuss design choices (TXM for textometry with JeRTeh resources; TESLA models for NER/NEL), report qualitative gains in interpretability through semantic links, and outline future work on domain-adapted NER/NEL, graph-based completion, and more expressive RAG architectures. The approach is replicable for other oral-history or media corpora and advances practical, evidence-grounded access to cultural archives and beyond.
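To illustrate the kind of encoding the abstract describes, the following is a minimal, hypothetical sketch of one TEI speaker turn with an embedded entity link. The element names (`<u>`, `<persName>`) follow TEI conventions, but the speaker ID, text, and Wikidata QID are invented for illustration and are not taken from the corpus itself.

```python
import xml.etree.ElementTree as ET

# TEI documents live in this namespace; register it so the serialized
# output uses it as the default (no prefix).
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)

def tei(tag):
    """Qualify a TEI element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"

# An utterance (<u>) attributed to a speaker via @who, as used for
# speaker turns in TEI-encoded spoken material. The ID is invented.
utterance = ET.Element(tei("u"), {"who": "#interviewee_01"})
utterance.text = "I first met "

# A NEL result embedded inline: a <persName> whose @ref points at a
# canonical Wikidata entity URI (Q-number invented here).
person = ET.SubElement(
    utterance, tei("persName"),
    {"ref": "https://www.wikidata.org/entity/Q1"},
)
person.text = "Example Person"
person.tail = " in Belgrade."

xml_out = ET.tostring(utterance, encoding="unicode")
print(xml_out)
```

Embedding the Wikidata URI directly in the markup is what makes the corpus queryable as linked data: any consumer can resolve `@ref` to the canonical entity without re-running disambiguation.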
2024
Advancing Sentiment Analysis in Serbian Literature: A Zero and Few–Shot Learning Approach Using the Mistral Model
Milica Ikonić Nešić | Saša Petalinkar | Mihailo Škorić | Ranka Stanković | Biljana Rujević
Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024)
This study presents sentiment analysis of old Serbian novels from the 1840–1920 period, employing the Mistral large language model (LLM) with zero- and few-shot learning techniques. The approach devises research prompts that include guidance text for zero-shot classification and labelled examples for few-shot learning, instructing the LLM to classify sentiment as positive, negative, or objective. Constraining the response set in this way streamlines sentiment analysis and improves classification precision. Python, together with the Hugging Face Transformers and LangChain libraries, serves as the technological backbone for creating and refining prompts tailored to sentence-level sentiment analysis. Across both scenarios, the zero-shot approach outperforms the few-shot one, achieving an accuracy of 68.2%.
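The prompting scheme the abstract describes can be sketched as follows. This is a hypothetical reconstruction: the guidance wording, the example sentences, and the prompt layout are invented for illustration, and the actual LLM call is omitted; the paper's real prompts may differ.

```python
# Three-way label set used for sentence-level classification.
LABELS = ("positive", "negative", "objective")

def build_prompt(sentence, examples=None):
    """Build a sentiment-classification prompt for an LLM.

    With examples=None this yields a zero-shot prompt (guidance text
    only); passing (sentence, label) pairs yields a few-shot prompt.
    """
    lines = [
        "Classify the sentiment of the sentence as one of: "
        + ", ".join(LABELS) + ".",
        # Limiting the response to a single word is what constrains
        # the model's output to the three target categories.
        "Answer with a single word.",
    ]
    for ex_sentence, ex_label in (examples or []):
        lines.append(f"Sentence: {ex_sentence}\nSentiment: {ex_label}")
    lines.append(f"Sentence: {sentence}\nSentiment:")
    return "\n\n".join(lines)

# Zero-shot prompt (invented sentence).
print(build_prompt("The morning was bright and full of promise."))

# Few-shot prompt with one invented labelled example.
print(build_prompt(
    "He stared at the empty room in despair.",
    examples=[("She smiled warmly at the guests.", "positive")],
))
```

The same builder serves both scenarios, so the zero-shot and few-shot conditions differ only in whether labelled examples are prepended, which keeps the comparison between the two settings clean.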