EuskañolDS: A Naturally Sourced Corpus for Basque-Spanish Code-Switching

Maite Heredia, Jeremy Barnes, Aitor Soroa


Abstract
Code-switching (CS) remains a significant challenge in Natural Language Processing (NLP), mainly due to a lack of relevant data. In the context of the contact between the Basque and Spanish languages in the north of the Iberian Peninsula, CS frequently occurs in both formal and informal spontaneous interactions. However, resources to analyse this phenomenon and support the development and evaluation of models capable of understanding and generating code-switched language for this language pair are almost non-existent. We introduce a first approach to develop a naturally sourced corpus for Basque-Spanish code-switching. Our methodology consists of identifying CS texts from previously available corpora using language identification models, which are then manually validated to obtain a reliable subset of CS instances. We present the properties of our corpus and make it available under the name EuskañolDS.
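The candidate-identification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say which language identification models are used, so a toy word-list lookup stands in for them here. The lexicons, the function names (`tokenize`, `language_counts`, `is_cs_candidate`), and the `min_tokens` threshold are all hypothetical.

```python
import re

# Tiny illustrative lexicons (hypothetical). A real pipeline would use a
# trained language identification model instead of word lists.
BASQUE_WORDS = {"eta", "da", "ez", "bai", "gaur", "etxera", "noa", "nago", "oso", "baina"}
SPANISH_WORDS = {"y", "es", "no", "pero", "porque", "estoy", "muy", "cansada", "hoy", "que"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (accents and ñ included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def language_counts(text):
    """Return (basque_token_count, spanish_token_count) for a text."""
    tokens = tokenize(text)
    eu = sum(t in BASQUE_WORDS for t in tokens)
    es = sum(t in SPANISH_WORDS for t in tokens)
    return eu, es


def is_cs_candidate(text, min_tokens=2):
    """Flag a text when both languages contribute at least min_tokens tokens."""
    eu, es = language_counts(text)
    return eu >= min_tokens and es >= min_tokens


sentences = [
    "Gaur etxera noa porque estoy muy cansada",  # mixed Basque and Spanish
    "Gaur etxera noa eta oso nekatuta nago",     # Basque only
    "Hoy no salgo porque estoy muy cansada",     # Spanish only
]

# Candidates would then be passed to human annotators for manual validation.
candidates = [s for s in sentences if is_cs_candidate(s)]
print(candidates)
```

Only the first sentence is flagged, because it is the only one in which both lexicons match at least two tokens. The filter is deliberately loose: automatic identification only proposes candidates, and the manual validation mentioned in the abstract is what yields the reliable subset.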
Anthology ID:
2025.calcs-1.1
Volume:
Proceedings of the 7th Workshop on Computational Approaches to Linguistic Code-Switching
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico, USA
Editors:
Genta Indra Winata, Sudipta Kar, Marina Zhukova, Thamar Solorio, Xi Ai, Injy Hamed, Mahardika Krisna Ihsani, Derry Tanti Wijaya, Garry Kuwanto
Venues:
CALCS | WS
Publisher:
Association for Computational Linguistics
Pages:
1–5
URL:
https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.calcs-1.1/
Cite (ACL):
Maite Heredia, Jeremy Barnes, and Aitor Soroa. 2025. EuskañolDS: A Naturally Sourced Corpus for Basque-Spanish Code-Switching. In Proceedings of the 7th Workshop on Computational Approaches to Linguistic Code-Switching, pages 1–5, Albuquerque, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
EuskañolDS: A Naturally Sourced Corpus for Basque-Spanish Code-Switching (Heredia et al., CALCS 2025)
PDF:
https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.calcs-1.1.pdf