Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models
Yifu Qiu, Varun R. Embar, Yizhe Zhang, Navdeep Jaitly, Shay B Cohen, Benjamin Han
Abstract
Recent advancements in long-context language models (LCLMs) promise to transform Retrieval-Augmented Generation (RAG) by simplifying pipelines. With their expanded context windows, LCLMs can process entire knowledge bases and perform retrieval and reasoning directly – a capability we define as In-Context Retrieval and Reasoning (ICR^2). However, existing benchmarks like LOFT often overestimate LCLM performance by providing overly simplified contexts. To address this, we introduce ICR^2, a benchmark that evaluates LCLMs in more realistic scenarios by including confounding passages retrieved with strong retrievers. We then propose three methods to enhance LCLM performance: (1) retrieve-then-generate fine-tuning, (2) retrieval-attention-probing, which uses attention heads to filter and de-noise long contexts during decoding, and (3) joint retrieval head training alongside the generation head. Our evaluation of five well-known LCLMs on LOFT and ICR^2 demonstrates significant gains with our best approach applied to Mistral-7B: +17 and +15 points by Exact Match on LOFT, and +13 and +2 points on ICR^2, compared to vanilla RAG and supervised fine-tuning, respectively. It even outperforms GPT-4-Turbo on most tasks despite being a much smaller model.
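To make the attention-probing idea concrete, here is a minimal sketch, not the paper's implementation: it assumes passage boundaries in the context are known and that `attn` holds the attention weights of one retrieval-oriented head at the final prompt token (in practice these would be read from the LCLM's attention outputs). The function name `probe_passages`, the toy attention vector, and the spans are all illustrative assumptions.

```python
import numpy as np

def probe_passages(attn: np.ndarray,
                   passage_spans: list[tuple[int, int]],
                   top_k: int = 2) -> list[int]:
    """Score each passage by the attention mass it receives; keep the top-k."""
    scores = [attn[start:end].sum() for start, end in passage_spans]
    ranked = np.argsort(scores)[::-1]          # passage indices, best first
    return sorted(ranked[:top_k].tolist())     # keep original context order

# Toy example: attention from the query token over a 12-token context
# holding three 4-token passages; passage 1 receives the most mass.
attn = np.array([0.02, 0.03, 0.01, 0.02,      # passage 0 (confounder)
                 0.20, 0.25, 0.15, 0.10,      # passage 1 (gold)
                 0.05, 0.07, 0.06, 0.04])     # passage 2 (confounder)
spans = [(0, 4), (4, 8), (8, 12)]

kept = probe_passages(attn, spans, top_k=1)
print(kept)  # -> [1]; decoding would then condition only on this
             # de-noised subset of the long context.
```

In this reading, the attention head acts as an in-context retriever: passages that draw little attention mass are treated as confounders and filtered out before generation.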
- Anthology ID: 2025.findings-acl.165
- Volume: Findings of the Association for Computational Linguistics: ACL 2025
- Month: July
- Year: 2025
- Address: Vienna, Austria
- Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 3176–3192
- URL: https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.165/
- DOI: 10.18653/v1/2025.findings-acl.165
- Cite (ACL): Yifu Qiu, Varun R. Embar, Yizhe Zhang, Navdeep Jaitly, Shay B Cohen, and Benjamin Han. 2025. Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 3176–3192, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal): Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models (Qiu et al., Findings 2025)
- PDF: https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.165.pdf