AILS-NTUA at SemEval-2026 Task 8: Evaluating Multi-Turn RAG Conversations

Dimosthenis Athanasiou; Maria Lymperaiou; Giorgos Filandrianos; Athanasios Voulodimos; Giorgos Stamou

Abstract

We describe the AILS-NTUA system for SemEval-2026 Task 8 (MTRAGEval), addressing all three subtasks of multi-turn retrieval-augmented generation: passage retrieval (A), reference-grounded response generation (B), and end-to-end RAG (C). Our approach is based on two main design principles. First, we adopt a query-diversity-over-retriever-diversity strategy, where multiple complementary LLM-based query reformulations are issued to a single corpus-aligned sparse retriever and combined using a variance-aware nested Reciprocal Rank Fusion scheme. Second, we employ an agentic generation pipeline that decomposes grounded response generation into evidence span extraction, dual-candidate drafting, and calibrated multi-judge selection. The proposed system achieves strong performance across subtasks, ranking first in Task A and second in Task B in the official evaluation. Our empirical findings indicate that query diversity over a well-aligned retriever is more effective than heterogeneous retriever ensembling, and that answerability calibration, rather than retrieval coverage, emerges as the primary bottleneck in end-to-end performance.
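For readers unfamiliar with the fusion step mentioned in the abstract, the following is a minimal sketch of plain Reciprocal Rank Fusion over ranked lists produced by several reformulations of the same query. It is not the authors' variance-aware nested scheme, whose details the abstract does not give. The function name `rrf_fuse`, the passage identifiers, and the constant `k=60` (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Plain Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_of_d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; passages missing from a list contribute nothing for it.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical ranked lists returned by one retriever for three
# reformulations of the same conversational turn.
rankings = [
    ["p3", "p1", "p7"],
    ["p1", "p3", "p9"],
    ["p1", "p7", "p3"],
]
fused = rrf_fuse(rankings)
print(fused)
```

Because only ranks enter the score, lists from different query reformulations can be combined without calibrating their raw retrieval scores against each other.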

Anthology ID: 2026.semeval-1.175
Volume: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 1340–1365
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.175/
Cite (ACL): Dimosthenis Athanasiou, Maria Lymperaiou, Giorgos Filandrianos, Athanasios Voulodimos, and Giorgos Stamou. 2026. AILS-NTUA at SemEval-2026 Task 8: Evaluating Multi-Turn RAG Conversations. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 1340–1365, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): AILS-NTUA at SemEval-2026 Task 8: Evaluating Multi-Turn RAG Conversations (Athanasiou et al., SemEval 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.175.pdf
