Developing Zila: A Spoken Language Resource for the Endangered Slovenian Gail Valley Dialect

Andrej Zgank, Gregor Donaj, Urh Kolaric, Usi Sereinig, Tatjana Koren-Zwitter, Sanja Boto, Sabina Zwitter-Grilc, Jasna Vidinic, Darinka Verdonik


Abstract
Slovenian is a less-resourced South Slavic language. Existing Slovenian spoken language resources mainly cover the standard language in everyday communication. However, Slovenian encompasses a wide range of dialects, most of which are not represented in available spoken language resources. This paper presents the development of Zila, a Slovenian spoken language resource for the Gail Valley dialect. This dialect is one of the most endangered varieties of Slovenian and is spoken in the extreme north-western periphery of the Slovenian language area. The goal of the project was to build a language resource comprising 100 hours of speech with manually produced transcriptions. The spoken material was collected from members of the Slovenian minority in Carinthia, Austria, with the local community playing a key role in the data acquisition process. A dedicated set of transcription rules was created to capture the full range of acoustic and linguistic features of the Gail Valley dialect, which differs significantly from standard Slovenian. A preliminary speech recognition experiment was conducted to analyze these differences further. The Zila project demonstrates how spoken language technologies can help to preserve the cultural and linguistic heritage of an endangered dialect.
Anthology ID:
2026.lrec-main.262
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
3325–3332
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.262/
Cite (ACL):
Andrej Zgank, Gregor Donaj, Urh Kolaric, Usi Sereinig, Tatjana Koren-Zwitter, Sanja Boto, Sabina Zwitter-Grilc, Jasna Vidinic, and Darinka Verdonik. 2026. Developing Zila: A Spoken Language Resource for the Endangered Slovenian Gail Valley Dialect. International Conference on Language Resources and Evaluation, main:3325–3332.
Cite (Informal):
Developing Zila: A Spoken Language Resource for the Endangered Slovenian Gail Valley Dialect (Zgank et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.262.pdf