AILS-NTUA at SemEval-2026 Task 8: Evaluating Multi-Turn RAG Conversations

Dimosthenis Athanasiou, Maria Lymperaiou, Giorgos Filandrianos, Athanasios Voulodimos, Giorgos Stamou


Abstract
We describe the AILS-NTUA system for SemEval-2026 Task 8 (MTRAGEval), addressing all three subtasks of multi-turn retrieval-augmented generation: passage retrieval (A), reference-grounded response generation (B), and end-to-end RAG (C). Our approach is based on two main design principles. First, we adopt a query-diversity-over-retriever-diversity strategy, where multiple complementary LLM-based query reformulations are issued to a single corpus-aligned sparse retriever and combined using a variance-aware nested Reciprocal Rank Fusion scheme. Second, we employ an agentic generation pipeline that decomposes grounded response generation into evidence span extraction, dual-candidate drafting, and calibrated multi-judge selection. The proposed system achieves strong performance across subtasks, ranking first in Task A and second in Task B in the official evaluation. Our empirical findings indicate that query diversity over a well-aligned retriever is more effective than heterogeneous retriever ensembling, and that answerability calibration, rather than retrieval coverage, emerges as the primary bottleneck in end-to-end performance.
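For readers unfamiliar with the fusion step named in the abstract, the following is a minimal sketch of standard Reciprocal Rank Fusion applied to the ranked lists returned for several query reformulations. It is not the authors' variance-aware nested scheme, whose details the abstract does not give. The function name `reciprocal_rank_fusion`, the passage ids, and the smoothing constant `k=60` (a commonly used default) are assumptions for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of passage ids with plain RRF.

    Each passage receives sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. `k` is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from one retriever, one list per query reformulation.
rankings = [
    ["p3", "p1", "p7"],
    ["p1", "p3", "p9"],
    ["p1", "p5", "p3"],
]
fused = reciprocal_rank_fusion(rankings)
print(fused)
```

In this toy input, passages that rank highly under several reformulations (here `p1` and `p3`) rise to the top, which is the intuition behind issuing diverse queries to a single retriever and fusing the results.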
Anthology ID:
2026.semeval-1.175
Volume:
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues:
SemEval | WS
Publisher:
Association for Computational Linguistics
Pages:
1340–1365
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.175/
Cite (ACL):
Dimosthenis Athanasiou, Maria Lymperaiou, Giorgos Filandrianos, Athanasios Voulodimos, and Giorgos Stamou. 2026. AILS-NTUA at SemEval-2026 Task 8: Evaluating Multi-Turn RAG Conversations. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 1340–1365, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
AILS-NTUA at SemEval-2026 Task 8: Evaluating Multi-Turn RAG Conversations (Athanasiou et al., SemEval 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.175.pdf