Ainhoa Vivel-Couso

Also published as: Ainhoa Vivel Couso


2026

This paper introduces MeteoGalEus, a multilingual weather dataset that combines meteorological observations from two Spanish regional agencies, Euskalmet and MeteoGalicia. The dataset contains daily records spanning 4 years and 6 months, with aligned observations for both sources. MeteoGalEus captures key meteorological variables including temperature, wind, and sky condition. The dataset is provided in a structured format, facilitating data analysis and integration, with textual forecasts available in the official languages of each region (i.e., Galician and Spanish for MeteoGalicia; Basque and Spanish for Euskalmet). By merging and harmonizing data from the two regional agencies, MeteoGalEus provides a unique resource for cross-regional weather analysis and multilingual climate studies. The dataset is well suited to tasks requiring high-quality, aligned, and standardized weather data across multiple languages and regions. We conducted baseline experiments using LLaMA-based models in both zero-shot and fine-tuned settings to illustrate the use of MeteoGalEus for natural language generation (NLG). Fine-tuning led to consistent improvements across all metrics, with BERTScore increasing from 0.68 to 0.79, ROUGE from 0.20 to 0.35, and BLEU from 0.02 to 0.17 for the best-performing model. The experiments show how MeteoGalEus can serve as a benchmark for multilingual and cross-regional NLG tasks.
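
The abstract reports automatic NLG metrics but not the evaluation code; below is a minimal sketch of how such scores might be computed with the Hugging Face `evaluate` library. The library choice, the ROUGE-L variant, the `lang="es"` setting, and the example sentences are assumptions for illustration, not details from the paper.

```python
import evaluate

# Hypothetical generated and reference forecasts (Spanish here;
# the dataset also covers Galician and Basque).
predictions = ["Cielos despejados con temperaturas suaves y viento flojo."]
references = ["Cielo despejado, temperaturas suaves y viento flojo del norte."]

bertscore = evaluate.load("bertscore")
rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

# BERTScore needs a target language (or an explicit multilingual model).
bs = bertscore.compute(predictions=predictions, references=references, lang="es")
print("BERTScore F1:", sum(bs["f1"]) / len(bs["f1"]))

# ROUGE returns aggregated rouge1/rouge2/rougeL scores.
print("ROUGE-L:", rouge.compute(predictions=predictions, references=references)["rougeL"])

# BLEU accepts one or more references per prediction.
print("BLEU:", bleu.compute(predictions=predictions, references=[[r] for r in references])["bleu"])
```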

2024

This paper presents a study aimed at reproducing and validating a human NLP evaluation originally performed for the DExperts text generation method. The original study introduces DExperts, a controlled text generation method, evaluated using non-toxic prompts from the RealToxicityPrompts dataset. Our reproduction covers the human evaluation of the continuations generated by DExperts and four baseline methods, in terms of toxicity, topicality, and fluency. We first describe the reproduction approach agreed within the ReproHum project and detail the configuration of the original evaluation, including the adaptations necessary for reproduction. We then compare our reproduction results with those reported in the original paper. Interestingly, the human evaluators in our experiment rate the texts generated by DExperts higher, judging them less toxic and more fluent. Overall, the new scores are higher, including for the baseline methods. This study contributes to ongoing efforts to ensure the reproducibility and reliability of findings in NLP evaluation and emphasizes the critical role of robust methodologies in advancing the field.
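
As a concrete illustration of the comparison step, one reproducibility measure used in ReproHum-style analyses is the small-sample corrected coefficient of variation (CV*) over the original and reproduced scores. The sketch below assumes that standard definition and uses hypothetical ratings; the abstract does not state which quantitative analysis the paper actually applies.

```python
import statistics

def cv_star(measurements):
    """Coefficient of variation with a small-sample correction (CV*),
    a measure used in ReproHum-style reproducibility assessments.
    Lower values mean the measurements agree more closely."""
    n = len(measurements)
    mean = statistics.mean(measurements)
    sd = statistics.stdev(measurements)  # sample std. dev. (n - 1 denominator)
    return (1 + 1 / (4 * n)) * (sd / mean) * 100

# Hypothetical mean fluency ratings: original study vs. our reproduction.
original_score, reproduced_score = 3.1, 3.4
print(f"CV*: {cv_star([original_score, reproduced_score]):.2f}")
```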