ROG: A Multi-Layer Manually Annotated Corpus of Spoken Slovenian
Kaja Dobrovoljc Zor, Darinka Verdonik, Jaka Čibej, Peter Rupnik, Nikola Ljubešić
Abstract
We present ROG, the first manually annotated spoken corpus of Slovenian to integrate morphosyntactic, prosodic, and interactional layers in a unified framework. Building on the pre-existing Spoken Slovenian Treebank (SST) and newly available recordings from the GOS 2 reference corpus, the resource combines over 75,000 words (10 hours) of annotated speech. The entire corpus features lemmatization, MULTEXT-East morphosyntax, and Universal Dependencies annotations, while approximately half includes additional layers for prosodic units, disfluencies, and dialogue acts. All annotation layers are systematically aligned and cross-referenced, enabling detailed multi-dimensional analyses of spoken language. We describe the corpus design, annotation workflow, data release, and baseline modeling results, showcasing the resource’s value for both linguistic analysis and speech-aware NLP model development. All ROG transcriptions and annotations, along with half of the audio recordings, are freely available under CC-BY via an (anonymized) repository.
- Anthology ID:
- 2026.lrec-main.449
- Volume:
- Proceedings of the Fifteenth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2026
- Address:
- Palma de Mallorca, Spain
- Editors:
- Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
- Venue:
- LREC
- Publisher:
- ELRA Language Resource Association
- Pages:
- 5701–5710
- URL:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.449/
- Cite (ACL):
- Kaja Dobrovoljc Zor, Darinka Verdonik, Jaka Čibej, Peter Rupnik, and Nikola Ljubešić. 2026. ROG: A Multi-Layer Manually Annotated Corpus of Spoken Slovenian. International Conference on Language Resources and Evaluation, main:5701–5710.
- Cite (Informal):
- ROG: A Multi-Layer Manually Annotated Corpus of Spoken Slovenian (Dobrovoljc Zor et al., LREC 2026)
- PDF:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.449.pdf