Dimosthenis Athanasiou

2026

AILS-NTUA at SemEval-2026 Task 8: Evaluating Multi-Turn RAG Conversations
Dimosthenis Athanasiou | Maria Lymperaiou | Giorgos Filandrianos | Athanasios Voulodimos | Giorgos Stamou
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

We describe the AILS-NTUA system for SemEval-2026 Task 8 (MTRAGEval), addressing all three subtasks of multi-turn retrieval-augmented generation: passage retrieval (A), reference-grounded response generation (B), and end-to-end RAG (C).Our approach is based on two main design principles. First, we adopt a query-diversity-over-retriever-diversity strategy, where multiple complementary LLM-based query reformulations are issued to a single corpus-aligned sparse retriever and combined using a variance-aware nested Reciprocal Rank Fusion scheme. Second, we employ an agentic generation pipeline that decomposes grounded response generation into evidence span extraction, dual-candidate drafting, and calibrated multi-judge selection.The proposed system achieves strong performance across subtasks, ranking first in Task A and second in Task B in the official evaluation. Our empirical findings indicate that query diversity over a well-aligned retriever is more effective than heterogeneous retriever ensembling, and that answerability calibration—rather than retrieval coverage—emerges as the primary bottleneck in end-to-end performance.

Co-authors

Venues

SemEval1
WS1

Fix author