Diego Hernández-Bustamante


2026

This paper presents IIMAS-RAG, our system for SemEval-2026 Task 8 on evaluating multi-turn retrieval-augmented generation. Our approach combines LLM-based query rewriting, hybrid sparse-dense retrieval with SPLADE and Voyage-3-large fused via Reciprocal Rank Fusion, and answerability-conditioned generation with GPT-4.1. The system ranked 4th out of 38 teams in Subtask A (Retrieval) and 13th out of 29 teams in Subtask C (Full RAG). Our results show that query rewriting is the most impactful retrieval component, while generation remains challenging in low-context and partially answerable scenarios.
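
The fusion step named in the abstract can be illustrated with a minimal sketch of Reciprocal Rank Fusion. The function name `reciprocal_rank_fusion`, the document IDs, and the smoothing constant `k=60` are assumptions made for illustration (60 is a commonly used default); the abstract does not state the value or the implementation the system used.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores the sum over lists of 1 / (k + rank), with rank
    starting at 1. k=60 is an assumed default, not a value from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical top-3 results from a sparse and a dense retriever.
sparse_hits = ["d1", "d2", "d3"]
dense_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
print(fused)
```

Because the fusion uses only ranks, it needs no calibration between the sparse and dense score scales, which is one common reason for choosing it in hybrid retrieval.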

2025

We present MeSSI, a multi-module system applied to SemEval 2025’s Task 3, Mu-SHROOM. Our system tags questions to obtain semantically relevant terms, which are used as information retrieval characteristics. These characteristics serve as extraction terms for Wikipedia pages, which are in turn processed to generate gold-standard texts used in a hallucination evaluation system. A PoST-based entity comparison was implemented to contrast the test dataset sentences with the corresponding generated gold standards; this comparison was the main criterion for tagging hallucinations, partitioned into soft labels and hard labels. The method was tested in Spanish and English, finishing 18th and 19th respectively in the IoU-based ranking.
This paper describes Gradient Ascent and Task Vectors as LLM unlearning methodologies applied to SemEval 2025’s Task 4. The task focuses on making an LLM unlearn specific information while preserving the model’s advanced text generation capabilities, meaning that our implementations of these algorithms were constrained both in the datasets they could use and in the overall effect each algorithm could have on the model’s general performance. Our implementation produced modified language models that ranked 7th out of 14 valid participants for the 7B-parameter model and 6th out of 24 for the 1B-parameter model.
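
As a rough illustration of the task-vector idea mentioned above, the sketch below computes a task vector as the difference between fine-tuned and base weights, then subtracts a scaled copy from the base model. Unlearning by negation is a common way to use task vectors, but whether the paper applied it in exactly this form is an assumption; the helper names, the toy weights, and the scaling factor `alpha` are all hypothetical.

```python
def task_vector(base, finetuned):
    """Per-parameter difference: weights tuned on the forget set minus base weights."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def apply_negated(base, tau, alpha=1.0):
    """Move the base weights away from the forget-set direction (assumed negation)."""
    return {
        name: [b - alpha * t for b, t in zip(base[name], tau[name])]
        for name in base
    }


# Toy weights standing in for model parameters (hypothetical values).
base_weights = {"w": [1.0, 2.0]}
forget_tuned = {"w": [1.5, 1.0]}  # base model further tuned on the data to forget

tau = task_vector(base_weights, forget_tuned)
unlearned = apply_negated(base_weights, tau, alpha=0.5)
print(unlearned)
```

A smaller `alpha` removes less of the forget-set direction, which is one way to trade unlearning strength against the model's general performance.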