From Words to Action: A National Initiative to Overcome Data Scarcity for the Slovene LLM
Špela Arhar Holdt, Špela Antloga, Tina Munda, Eva Pori, Simon Krek
Abstract
Large Language Models (LLMs) have demonstrated significant potential in natural language processing, but they depend on vast, diverse datasets, creating challenges for languages with limited resources. The paper presents a national initiative that addresses these challenges for Slovene. We outline strategies for large-scale text collection, including the creation of an online platform to engage the broader public in contributing texts and a communication campaign promoting openly accessible and transparently developed LLMs.
- Anthology ID:
- 2025.resourceful-1.27
- Volume:
- Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)
- Month:
- March
- Year:
- 2025
- Address:
- Tallinn, Estonia
- Editors:
- Špela Arhar Holdt, Nikolai Ilinykh, Barbara Scalvini, Micaella Bruton, Iben Nyholm Debess, Crina Madalina Tudor
- Venues:
- RESOURCEFUL | WS
- Publisher:
- University of Tartu Library, Estonia
- Pages:
- 130–136
- URL:
- https://preview.aclanthology.org/moar-dois/2025.resourceful-1.27/
- Cite (ACL):
- Špela Arhar Holdt, Špela Antloga, Tina Munda, Eva Pori, and Simon Krek. 2025. From Words to Action: A National Initiative to Overcome Data Scarcity for the Slovene LLM. In Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025), pages 130–136, Tallinn, Estonia. University of Tartu Library, Estonia.
- Cite (Informal):
- From Words to Action: A National Initiative to Overcome Data Scarcity for the Slovene LLM (Holdt et al., RESOURCEFUL 2025)
- PDF:
- https://preview.aclanthology.org/moar-dois/2025.resourceful-1.27.pdf