Yoan Gutierrez

Other people with similar names: Yoan Gutiérrez


2025

XAutoLM: Efficient Fine-Tuning of Language Models via Meta-Learning and AutoML
Ernesto Luis Estevanell Valladares | Suilan Estevez-Velarde | Yoan Gutierrez | Andrés Montoyo | Ruslan Mitkov
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Experts in machine learning leverage domain knowledge to navigate decisions in model selection, hyperparameter optimization, and resource allocation. This is particularly critical for fine-tuning language models (LMs), where repeated trials incur substantial computational overhead and environmental impact. However, no existing automated framework simultaneously tackles the entire model selection and hyperparameter optimization (HPO) task for resource-efficient LM fine-tuning. We introduce XAutoLM, a meta-learning-augmented AutoML framework that reuses past experiences to optimize discriminative and generative LM fine-tuning pipelines efficiently. XAutoLM learns from stored successes and failures by extracting task- and system-level meta-features to bias its sampling toward valuable configurations and away from costly dead ends. On four text classification and two question-answering benchmarks, XAutoLM surpasses the zero-shot optimizer’s peak F1 on five of six tasks, cuts the mean evaluation time of pipelines by up to 4.5x, reduces search error ratios by up to sevenfold, and uncovers up to 50% more pipelines above the zero-shot Pareto front. In contrast, simpler memory-based baselines suffer negative transfer. We release XAutoLM and our experience store to catalyze resource-efficient, Green AI fine-tuning in the NLP community.

Leveraging Large Language Models to Measure Gender Representation Bias in Gendered Language Corpora
Erik Derner | Sara Sansalvador De La Fuente | Yoan Gutierrez | Paloma Moreda Pozo | Nuria M Oliver
Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

Large language models (LLMs) often inherit and amplify social biases embedded in their training data. A prominent social bias is gender bias. In this regard, prior work has mainly focused on gender stereotyping bias – the association of specific roles or traits with a particular gender – in English and on evaluating gender bias in model embeddings or generated outputs. In contrast, gender representation bias – the unequal frequency of references to individuals of different genders – in the training corpora has received less attention. Yet such imbalances in the training data constitute an upstream source of bias that can propagate and intensify throughout the entire model lifecycle. To fill this gap, we propose a novel LLM-based method to detect and quantify gender representation bias in LLM training data in gendered languages, where grammatical gender challenges the applicability of methods developed for English. By leveraging the LLMs’ contextual understanding, our approach automatically identifies and classifies person-referencing words in gendered language corpora. Applied to four Spanish-English benchmarks and five Valencian corpora, our method reveals substantial male-dominant imbalances. We show that such biases in training data affect model outputs, but can surprisingly be mitigated by leveraging small-scale training on datasets that are biased towards the opposite gender. Our findings highlight the need for corpus-level gender bias analysis in multilingual NLP. We make our code and data publicly available.

2024

Educational Material to Knowledge Graph Conversion: A Methodology to Enhance Digital Education
Miquel Canal-Esteve | Yoan Gutierrez
Proceedings of the 1st Workshop on Knowledge Graphs and Large Language Models (KaLLM 2024)

This article argues that digital educational content should be structured as knowledge graphs (KGs). Unlike traditional repositories such as Moodle, a KG offers a more flexible representation of the relationships between concepts, facilitating intuitive navigation and discovery of connections. In addition, it integrates effectively with Large Language Models, enhancing personalized explanations, answers, and recommendations. This article studies different proposals based on semantics and knowledge modelling to determine the most appropriate ways to strengthen intelligent educational technologies.