MindFlayer at SemEval-2026 Task 8: DUALRAG: Answerability-Aware Generation for Multi-Turn RAG Conversations

Jerin Romijah Tuli; Md. Sartaj Alam Pritom; Talukder Naemul Hasan Naem

Abstract

Our system, DualRAG (team MindFlayer), tackles SemEval-2026 Task 8 Subtask B: generating faithful responses in multi-turn RAG conversations. The core idea is simple: before generating anything, we first check whether reference passages exist for the current question. If they do, we route through a domain-guided generation prompt that instructs the model to answer using only those passages. If they do not, we route through a strict refusal prompt that tells the model to politely decline rather than guess. We used Meta’s Llama-4-Scout-17B through the Groq API, with no training or fine-tuning, relying purely on zero-shot prompting. A lightweight post-processing layer catches the rare cases where the model ignores its instructions: if it refuses when passages are available, we replace the response with a neutral fallback; if it answers when no passages exist, we replace it with a standard refusal. Out of 507 test tasks, only 7 needed this correction. The system ranked 8th out of 26 teams with a harmonic mean of 0.7492, beating the strongest baseline (GPT-OSS-120B at 0.639) by roughly 0.11. The standout result is 100% refusal accuracy on all 130 unanswerable questions, something even GPT-4o and Llama 3.1 405B failed to achieve consistently according to prior work. Our RLF score of 0.8782 shows that the responses stay tightly grounded in the reference passages. The relatively lower RBagg (0.6024) reflects the challenge of matching human-written phrasing in a zero-shot setting, which we identify as the clearest direction for improvement.
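The routing and post-processing described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the fallback strings, the phrase-based refusal detector, and the `generate` callable (standing in for a call to the hosted model) are all assumptions introduced here.

```python
from typing import Callable, Sequence

# Hypothetical prompt templates and fallback strings; the abstract does not
# give the actual wording used by the system.
GROUNDED_PROMPT = (
    "Answer using only the reference passages below.\n\n"
    "{passages}\n\nQuestion: {question}"
)
REFUSAL_PROMPT = (
    "No reference passages are available. Politely decline to answer.\n\n"
    "Question: {question}"
)
NEUTRAL_FALLBACK = "The reference passages do not clearly address this question."
STANDARD_REFUSAL = "I'm sorry, but I don't have enough information to answer that."

# Hypothetical refusal detector based on simple phrase matching.
REFUSAL_MARKERS = ("i'm sorry", "i cannot", "i don't have", "unable to answer")


def looks_like_refusal(text: str) -> bool:
    """Return True if the response contains one of the assumed refusal phrases."""
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def dual_route(
    question: str,
    passages: Sequence[str],
    generate: Callable[[str], str],
) -> str:
    """Route by passage availability, then correct instruction violations."""
    if passages:
        # Answerable path: generate from the passages only.
        prompt = GROUNDED_PROMPT.format(
            passages="\n".join(passages), question=question
        )
        response = generate(prompt)
        # The model refused despite having passages: use the neutral fallback.
        return NEUTRAL_FALLBACK if looks_like_refusal(response) else response

    # Unanswerable path: ask the model to decline.
    response = generate(REFUSAL_PROMPT.format(question=question))
    # The model answered despite having no passages: force a standard refusal.
    return response if looks_like_refusal(response) else STANDARD_REFUSAL
```

In this sketch, `generate` can be any function that maps a prompt string to a model response, so the routing logic can be exercised with a stub before a real model client is attached.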

Anthology ID: 2026.semeval-1.293
Volume: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 2314–2321
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.293/
Cite (ACL): Jerin Romijah Tuli, Md. Sartaj Alam Pritom, and Talukder Naemul Hasan Naem. 2026. MindFlayer at SemEval-2026 Task 8: DUALRAG: Answerability-Aware Generation for Multi-Turn RAG Conversations. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 2314–2321, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): MindFlayer at SemEval-2026 Task 8: DUALRAG: Answerability-Aware Generation for Multi-Turn RAG Conversations (Tuli et al., SemEval 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.293.pdf
Supplementary material: 2026.semeval-1.293.SupplementaryMaterial.zip
