Eva Pori


2025

From Words to Action: A National Initiative to Overcome Data Scarcity for the Slovene LLM
Špela Arhar Holdt | Špela Antloga | Tina Munda | Eva Pori | Simon Krek
Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)

Large Language Models (LLMs) have demonstrated significant potential in natural language processing, but they depend on vast, diverse datasets, creating challenges for languages with limited resources. The paper presents a national initiative that addresses these challenges for Slovene. We outline strategies for large-scale text collection, including the creation of an online platform to engage the broader public in contributing texts and a communication campaign promoting openly accessible and transparently developed LLMs.