Inez Wihardjo


2026

We describe our system for SemEval-2026 Task 8 (MTRAGEval) on multi-turn conversational RAG. Our approach combines hybrid retrieval (fusing SPLADE-v3 learned sparse representations with dense embeddings via Reciprocal Rank Fusion) with a fine-tuned cross-encoder reranker and zero-shot LLM generation using Claude Opus 4.5. We systematically evaluate 56 retrieval configurations across 4 domains and 5 generation strategies across 5 LLMs. Our findings show that (1) SPLADE-v3 with dataset rewrites substantially outperforms BM25 across all configurations, (2) simple zero-shot prompting matches more sophisticated strategies such as Self-RAG and CRAG, and (3) performance varies significantly by answerability class. On the test set, we rank 5/29 on Task C (end-to-end RAG, H=0.5564), 7/26 on Task B (generation, H=0.7495), and 13/38 on Task A (retrieval, nDCG@5=0.4782). Our analysis reveals strong performance on answerable queries (H=0.685) but a sharp drop on underspecified queries (H=0.254).
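To make the fusion step concrete, the following is a minimal sketch of Reciprocal Rank Fusion over two ranked lists, not the system's actual implementation. It assumes the standard formulation in which each document receives a score of 1 / (k + rank) from every list it appears in; the constant k = 60 is a commonly used default and is an assumption here, as are the function name and the document IDs, which are purely illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered from best to worst.
    k: smoothing constant (60 is a common default; assumed, not from the paper).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top-3 lists from a sparse and a dense retriever.
sparse_ranking = ["d3", "d1", "d7"]
dense_ranking = ["d1", "d5", "d3"]

fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
print(fused)
```

Because the score depends only on rank positions, this fusion needs no calibration between the sparse and dense score scales. In a pipeline like the one described, the fused list would then be passed to the reranker.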