Laura García-Sardiña


2022

pdf
BaSCo: An Annotated Basque-Spanish Code-Switching Corpus for Natural Language Understanding
Maia Aguirre | Laura García-Sardiña | Manex Serras | Ariane Méndez | Jacobo López
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The main objective of this work is the elaboration and public release of BaSCo, the first corpus with annotated linguistic resources encompassing Basque-Spanish code-switching. The mixture of Basque and Spanish languages within the same utterance is popularly referred to as Euskañol, a widespread phenomenon among bilingual speakers in the Basque Country. Thus, this corpus has been created to meet the demand of annotated linguistic resources in Euskañol in research areas such as multilingual dialogue systems. The presented resource is the result of translating to Euskañol a compilation of texts in Basque and Spanish that were used for training the Natural Language Understanding (NLU) models of several task-oriented bilingual chatbots. Those chatbots were meant to answer specific questions associated with the administration, fiscal, and transport domains. In addition, they had the transverse potential to answer to greetings, requests for help, and chit-chat questions asked to chatbots. BaSCo is a compendium of 1377 tagged utterances with every sample annotated at three levels: (i) NLU semantic labels, considering intents and entities, (ii) code-switching proportion, and (iii) domain of origin.

2020

pdf
HitzalMed: Anonymisation of Clinical Text in Spanish
Salvador Lima Lopez | Naiara Perez | Laura García-Sardiña | Montse Cuadros
Proceedings of the Twelfth Language Resources and Evaluation Conference

HitzalMed is a web-framed tool that performs automatic detection of sensitive information in clinical texts using machine learning algorithms reported to be competitive for the task. Moreover, once sensitive information is detected, different anonymisation techniques are implemented that are configurable by the user –for instance, substitution, where sensitive items are replaced by same category text in an effort to generate a new document that looks as natural as the original one. The tool is able to get data from different document formats and outputs downloadable anonymised data. This paper presents the anonymisation and substitution technology and the demonstrator which is publicly available at https://snlt.vicomtech.org/hitzalmed.

2018

pdf
ES-Port: a Spontaneous Spoken Human-Human Technical Support Corpus for Dialogue Research in Spanish
Laura García-Sardiña | Manex Serras | Arantza del Pozo
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)