Euskorpora: A Strategic Framework for Digital Sovereignty and Linguistic Inclusion of Basque in the Era of AI

Victoria Arranz, Sara Arregi, Leire Barañano, Aitor García-Pablos


Abstract
Euskorpora is a pioneering initiative designed to establish a comprehensive digital infrastructure for the development of speech and language technologies in Basque. Built upon European, Spanish, and Basque strategies, it addresses the scarcity of linguistic data, foundational models, and technological resources for this non-Indo-European, low-resourced language. The project integrates large-scale data collection from public institutions and private organisations, creating extensive multimodal corpora that cover the linguistic, dialectal, and domain diversity of Basque. These resources support the training of open language models for speech, translation, and language understanding, as well as the establishment of an interoperable infrastructure aligned with European initiatives such as the European Language Data Space (LDS). By combining linguistic research, artificial intelligence, and data governance, Euskorpora ensures the digital sovereignty and inclusion of the Basque language within the global AI ecosystem. Beyond its regional focus, it stands as a transferable model for advancing linguistic diversity, technological innovation, and equitable digital transformation in multilingual Europe.
Anthology ID: 2026.lrec-main.93
Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month: May
Year: 2026
Address: Palma de Mallorca, Spain
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue: LREC
Publisher: ELRA Language Resource Association
Pages: 1189–1196
URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.93/
Cite (ACL):
Victoria Arranz, Sara Arregi, Leire Barañano, and Aitor García-Pablos. 2026. Euskorpora: A Strategic Framework for Digital Sovereignty and Linguistic Inclusion of Basque in the Era of AI. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 1189–1196, Palma de Mallorca, Spain. ELRA Language Resource Association.
Cite (Informal):
Euskorpora: A Strategic Framework for Digital Sovereignty and Linguistic Inclusion of Basque in the Era of AI (Arranz et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.93.pdf