Tackling Distractor Documents in Multi-Hop QA with Reinforcement and Curriculum Learning

Jerry Huang, Siddarth Madala, Risham Sidhu, Cheng Niu, Hao Peng, Julia Hockenmaier, Tong Zhang


Abstract
Retrieval-augmented generation (RAG) systems rely on retrieval models for identifying relevant contexts and answer generation models for utilizing those contexts. However, retrievers exhibit imperfect recall and precision, limiting downstream performance. We introduce RAG-RL, an answer generation model trained for multi-hop question answering (MHQA) to not only generate answers but also to identify and cite relevant information from larger sets of retrieved contexts, shifting some of the burden of identifying relevant documents from the retriever to the answer generator. Our approach uses curriculum learning, where models are trained across retrieval settings with varying levels of noise. Our experiments show that training samples with fewer distractor documents enable models to acquire citation and reasoning skills with greater sample efficiency and generalizability, demonstrating strong model performance even as the number of irrelevant passages increases. We benchmark our methods on three open-domain MHQA datasets and report significant gains in answer and citation accuracy. Furthermore, our experiments provide empirical insights into how simpler training samples can give models stronger signals for learning specific skills (e.g., citation generation) and how different components of post-training (e.g., training set construction, rule-based rewards, training sample ordering, etc.) impact final model performance.
Anthology ID:
2026.findings-eacl.294
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Màrquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5548–5561
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.294/
Cite (ACL):
Jerry Huang, Siddarth Madala, Risham Sidhu, Cheng Niu, Hao Peng, Julia Hockenmaier, and Tong Zhang. 2026. Tackling Distractor Documents in Multi-Hop QA with Reinforcement and Curriculum Learning. In Findings of the Association for Computational Linguistics: EACL 2026, pages 5548–5561, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Tackling Distractor Documents in Multi-Hop QA with Reinforcement and Curriculum Learning (Huang et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.294.pdf
Checklist:
2026.findings-eacl.294.checklist.pdf