AILS-NTUA at SemEval-2026 Task 8: Evaluating Multi-Turn RAG Conversations
Dimosthenis Athanasiou, Maria Lymperaiou, Giorgos Filandrianos, Athanasios Voulodimos, Giorgos Stamou
Abstract
We describe the AILS-NTUA system for SemEval-2026 Task 8 (MTRAGEval), addressing all three subtasks of multi-turn retrieval-augmented generation: passage retrieval (A), reference-grounded response generation (B), and end-to-end RAG (C). Our approach is based on two main design principles. First, we adopt a query-diversity-over-retriever-diversity strategy, where multiple complementary LLM-based query reformulations are issued to a single corpus-aligned sparse retriever and combined using a variance-aware nested Reciprocal Rank Fusion scheme. Second, we employ an agentic generation pipeline that decomposes grounded response generation into evidence span extraction, dual-candidate drafting, and calibrated multi-judge selection. The proposed system achieves strong performance across subtasks, ranking first in Task A and second in Task B in the official evaluation. Our empirical findings indicate that query diversity over a well-aligned retriever is more effective than heterogeneous retriever ensembling, and that answerability calibration, rather than retrieval coverage, emerges as the primary bottleneck in end-to-end performance.
- Anthology ID:
- 2026.semeval-1.175
- Volume:
- Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, USA
- Editors:
- Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
- Venues:
- SemEval | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1340–1365
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.175/
- Cite (ACL):
- Dimosthenis Athanasiou, Maria Lymperaiou, Giorgos Filandrianos, Athanasios Voulodimos, and Giorgos Stamou. 2026. AILS-NTUA at SemEval-2026 Task 8: Evaluating Multi-Turn RAG Conversations. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 1340–1365, San Diego, California, USA. Association for Computational Linguistics.
- Cite (Informal):
- AILS-NTUA at SemEval-2026 Task 8: Evaluating Multi-Turn RAG Conversations (Athanasiou et al., SemEval 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.175.pdf
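
The abstract describes fusing the results of several query reformulations, all sent to one retriever, with a variance-aware nested Reciprocal Rank Fusion scheme. The abstract does not specify that variant, so the following is only a minimal sketch of plain Reciprocal Rank Fusion, the building block it extends. The function name `reciprocal_rank_fusion`, the passage ids, and the constant `k=60` (a commonly used default) are assumptions for illustration and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of passage ids with plain RRF.

    Each passage receives 1 / (k + rank) from every list it appears in
    (rank starts at 1), and passages are returned by descending total score.
    This is standard RRF, not the variance-aware nested variant named in
    the abstract.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, passage_id in enumerate(ranking, start=1):
            scores[passage_id] += 1.0 / (k + rank)
    # Break score ties by passage id so the output is deterministic.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


# Hypothetical result lists: one per query reformulation, same retriever.
rankings = [
    ["p1", "p2", "p3"],  # e.g. results for a standalone rewrite of the turn
    ["p2", "p1", "p4"],  # e.g. results for a keyword-style rewrite
    ["p2", "p3", "p1"],  # e.g. results for a context-expanded rewrite
]
fused = reciprocal_rank_fusion(rankings)
print(fused)
```

Because the fusion uses only ranks and never raw retrieval scores, lists produced by different reformulations can be combined without score normalization, which is one reason RRF suits a query-diversity setup like the one described.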