Diego Hernández-Bustamante
2026
IIMAS-RAG at SemEval-2026 Task 8: Hybrid Sparse-Dense Retrieval and Answerability-Conditioned Generation for Multi-Turn RAG
Vania Raya-Rios | Helena Gomez-Adorno | Leon Hecht | Pedro Vázquez-Osorio | Erick Fabián-Sandoval | Jesús Vázquez-Osorio | Diego Hernández-Bustamante
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper presents IIMAS-RAG, our system for SemEval-2026 Task 8 on evaluating multi-turn retrieval-augmented generation. Our approach combines LLM-based query rewriting, hybrid sparse-dense retrieval with SPLADE and Voyage-3-large fused via Reciprocal Rank Fusion, and answerability-conditioned generation with GPT-4.1. The system ranked 4th out of 38 teams in Subtask A (Retrieval) and 13th out of 29 teams in Subtask C (Full RAG). Our results show that query rewriting is the most impactful retrieval component, while generation remains challenging in low-context and partially answerable scenarios.
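The abstract names Reciprocal Rank Fusion (RRF) as the step that merges the sparse and dense rankings. The sketch below illustrates the standard RRF formula, score(d) = Σ 1 / (k + rank(d)), and is not the authors' implementation. The function name, the constant k = 60 (a commonly used default), and the toy document IDs are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings: list of lists, each ordered from most to least relevant.
    k: smoothing constant (60 is a common default; assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by document ID.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings standing in for a sparse and a dense retriever.
sparse_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
```

Because RRF uses only rank positions, it needs no score normalization across retrievers, which is one reason it is often chosen to combine sparse and dense results.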
2025
GIL-IIMAS UNAM at SemEval-2025 Task 3: MeSSI: A Multimodule System to detect hallucinated Segments in trivia-like Inquiries
Francisco López-Ponce | Karla Salas-Jimenez | Adrián Juárez-Pérez | Diego Hernández-Bustamante | Gemma Bel-Enguix | Helena Gómez-Adorno
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
We present MeSSI, a multi-module system applied to SemEval 2025’s Task 3: Mu-SHROOM. Our system tags questions to obtain semantically relevant terms that are used as information retrieval characteristics. These characteristics serve as extraction terms for Wikipedia pages, which are in turn processed to generate gold-standard texts used in a hallucination evaluation system. A PoST-based entity comparison was implemented to contrast the test dataset sentences with the corresponding generated gold standards, and this comparison was the main criterion for tagging hallucinations, partitioned into soft labels and hard labels. This method was tested in Spanish and English, finishing 18th and 19th respectively in the IoU-based ranking.
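The ranking mentioned above is based on intersection over union (IoU). As a minimal sketch, assuming hallucinated segments are given as character offsets with half-open `(start, end)` spans, IoU can be computed over the sets of marked character positions. The function name, the span format, and the convention of returning 1.0 when both sides are empty are assumptions for illustration, not details taken from the task's official scorer.

```python
def span_iou(predicted_spans, gold_spans):
    """Character-level intersection over union between two sets of spans.

    Each span is a (start, end) pair of character offsets, end exclusive.
    """
    # Expand every span into the set of character positions it covers.
    pred = {i for start, end in predicted_spans for i in range(start, end)}
    gold = {i for start, end in gold_spans for i in range(start, end)}
    if not pred and not gold:
        # Assumed convention: nothing predicted and nothing to find is a match.
        return 1.0
    return len(pred & gold) / len(pred | gold)


# Hypothetical example: the spans overlap on 5 of 15 covered characters.
score = span_iou([(0, 10)], [(5, 15)])
```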
GIL-IIMAS UNAM at SemEval-2025 Task 4: LA-Min(E): LLM Unlearning Approaches Under Function Minimizing Evaluation Constraints
Karla Salas-Jimenez | Francisco López-Ponce | Diego Hernández-Bustamante | Gemma Bel-Enguix | Helena Gómez-Adorno
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
This paper describes Gradient Ascent and Task Vectors as LLM unlearning methodologies applied to SemEval 2025’s Task 4. This task focuses on unlearning specific information from an LLM while preserving the model’s advanced text generation capabilities, meaning that our implementations of these algorithms were constrained both in the information datasets and in the overall effect of each algorithm on the model’s general performance. Our implementation produced modified language models that ranked 7th out of 14 valid participants for the 7B-parameter model, and 6th out of 24 for the 1B-parameter model.