Josu Goikoetxea


2025

ArgHiTZ at ArchEHR-QA 2025: A Two-Step Divide and Conquer Approach to Patient Question Answering for Top Factuality
Adrian Cuadron Cortes | Aimar Sagasti | Maitane Urruela | Iker De La Iglesia | Ane García Domingo-aldama | Aitziber Atutxa Salazar | Josu Goikoetxea | Ander Barrena
Proceedings of the 24th Workshop on Biomedical Language Processing (Shared Tasks)

This work presents three approaches to the ArchEHR-QA 2025 Shared Task on automated patient question answering. We introduce an end-to-end prompt-based baseline and two two-step methods that divide the task, without using any external knowledge. Both two-step approaches first extract the essential sentences from the clinical text, either by prompting or by similarity ranking, and then generate the final answer from these notes. Results indicate that the re-ranker-based two-step system performs best, highlighting the importance of selecting the right approach for each subtask. Our best run achieved an overall score of 0.44, ranking 8th out of 30 on the leaderboard and securing the top position in overall factuality.

Ranking Over Scoring: Towards Reliable and Robust Automated Evaluation of LLM-Generated Medical Explanatory Arguments
Iker De la Iglesia | Iakes Goenaga | Johanna Ramirez-Romero | Jose Maria Villa-Gonzalez | Josu Goikoetxea | Ander Barrena
Proceedings of the 31st International Conference on Computational Linguistics

Evaluating LLM-generated text has become a key challenge, especially in domain-specific contexts such as the medical field. This work introduces a novel evaluation methodology for LLM-generated medical explanatory arguments that relies on Proxy Tasks and rankings to align results closely with human evaluation criteria, overcoming the biases typically seen when LLMs are used as judges. We demonstrate that the proposed evaluators are robust against adversarial attacks, including the assessment of non-argumentative text. Additionally, the number of human-crafted arguments needed to train the evaluators is reduced to just one example per Proxy Task. By examining multiple LLM-generated arguments, we establish a methodology for determining whether a Proxy Task is suitable for evaluating LLM-generated medical explanatory arguments, requiring only five examples and two human experts.

2015

Random Walks and Neural Network Language Models on Knowledge Bases
Josu Goikoetxea | Aitor Soroa | Eneko Agirre
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

Exploring the use of word embeddings and random walks on Wikipedia for the CogAlex shared task
Josu Goikoetxea | Eneko Agirre | Aitor Soroa
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)