Pratibha Revankar
2026
SlugRAG at SemEval-2026 Task 8: Domain-Specific Fine-Tuning and Model Scaling for Multi-Turn RAG Retrieval
Pratibha Revankar | Jihye Kim | Umit Azirakhmet
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Multi-Turn Retrieval-Augmented Generation (MT-RAG) requires resolving context-dependent ambiguities across conversational turns. We present a systematic evaluation of dense retrieval optimization for the MTRAGEval benchmark (Task 8, Subtask A: Retrieval Only), investigating training-time strategies and inference-time query reformulation across four diverse English-language domains: CLAPNQ (legal/patent), FIQA (financial), GOVT (government documents), and CLOUD (cloud computing). Our experiments demonstrate that domain-specific fine-tuning yields the most substantial gains, with our best CLAPNQ model achieving Recall@10 of 0.6016 and nDCG@10 of 0.4981, representing 58.3% and 66.0% improvements over the pre-trained BGE baseline. Domain-specific models average 44.3% improvement in Recall@10 and 47.8% in nDCG@10 across all domains. Additionally, fine-tuning larger embedding models (BGE-large) achieves the best overall performance (nDCG@10: 0.5101, Recall@10: 0.6221), highlighting the complementary impact of model capacity and domain adaptation.
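For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how Recall@10 and nDCG@10 are commonly computed for one query, along with a relative-improvement helper. It assumes binary relevance labels and a ranked list without duplicate document IDs. The function names and the toy document IDs are illustrative only, and the official task scorer may differ (for example, by using graded relevance).

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance nDCG@k: DCG of the ranking divided by the ideal DCG."""
    # A relevant document at 0-based position `rank` contributes 1 / log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    # The ideal ranking places every relevant document at the top.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def relative_improvement(new_score, baseline_score):
    """Percentage gain of new_score over baseline_score."""
    return 100.0 * (new_score - baseline_score) / baseline_score


# Toy example (hypothetical IDs): two of the three relevant passages are retrieved.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d9"}
print(f"Recall@10 = {recall_at_k(ranked, relevant):.4f}")
print(f"nDCG@10   = {ndcg_at_k(ranked, relevant):.4f}")
```

Benchmark-level scores are typically the mean of these per-query values. If the percentages in the abstract are relative gains, then `relative_improvement(fine_tuned, baseline)` is the corresponding calculation; the abstract does not list the baseline scores themselves.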