From Words to Action: A National Initiative to Overcome Data Scarcity for the Slovene LLM

Špela Arhar Holdt, Špela Antloga, Tina Munda, Eva Pori, Simon Krek


Abstract
Large Language Models (LLMs) have demonstrated significant potential in natural language processing, but they depend on vast, diverse datasets, creating challenges for languages with limited resources. The paper presents a national initiative that addresses these challenges for Slovene. We outline strategies for large-scale text collection, including the creation of an online platform to engage the broader public in contributing texts and a communication campaign promoting openly accessible and transparently developed LLMs.
Anthology ID:
2025.resourceful-1.27
Volume:
Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)
Month:
March
Year:
2025
Address:
Tallinn, Estonia
Editors:
Špela Arhar Holdt, Nikolai Ilinykh, Barbara Scalvini, Micaella Bruton, Iben Nyholm Debess, Crina Madalina Tudor
Venues:
RESOURCEFUL | WS
Publisher:
University of Tartu Library, Estonia
Pages:
130–136
URL:
https://preview.aclanthology.org/moar-dois/2025.resourceful-1.27/
Cite (ACL):
Špela Arhar Holdt, Špela Antloga, Tina Munda, Eva Pori, and Simon Krek. 2025. From Words to Action: A National Initiative to Overcome Data Scarcity for the Slovene LLM. In Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025), pages 130–136, Tallinn, Estonia. University of Tartu Library, Estonia.
Cite (Informal):
From Words to Action: A National Initiative to Overcome Data Scarcity for the Slovene LLM (Holdt et al., RESOURCEFUL 2025)
PDF:
https://preview.aclanthology.org/moar-dois/2025.resourceful-1.27.pdf