Eva Pori
2025
From Words to Action: A National Initiative to Overcome Data Scarcity for the Slovene LLM
Špela Arhar Holdt
|
Špela Antloga
|
Tina Munda
|
Eva Pori
|
Simon Krek
Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)
Large Language Models (LLMs) have demonstrated significant potential in natural language processing, but they depend on vast, diverse datasets, creating challenges for languages with limited resources. The paper presents a national initiative that addresses these challenges for Slovene. We outline strategies for large-scale text collection, including the creation of an online platform to engage the broader public in contributing texts and a communication campaign promoting openly accessible and transparently developed LLMs.