Long-context Language Models Fail in Basic Retrieval Tasks Without Sufficient Reasoning Steps
Yijiong Yu, Zhixiao Qi, Yongfeng Huang, Wei Wang, Weifeng.liu, Ran Chen, Ji Pei
Abstract
Long-context language models (LCLMs), characterized by their extensive context windows, are becoming popular. However, although they are nearly perfect at standard long-context retrieval tasks, our evaluations demonstrate that they fail in some basic cases. We further find that these failure cases can be well addressed with a sufficient number of reasoning steps, guided by specific CoT prompts. This result emphasizes the potential necessity of solving specific long-context tasks using long-CoT methods, whereas previous long-context benchmarks have always ignored the need for long reasoning in long-context tasks and treated them as direct QA tasks. Our code and datasets are available at https://github.com/yuyijiong/hard_retrieval_for_llm
- Anthology ID: 2025.findings-emnlp.301
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
- Month: November
- Year: 2025
- Address: Suzhou, China
- Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 5615–5634
- URL: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.301/
- DOI: 10.18653/v1/2025.findings-emnlp.301
- Cite (ACL): Yijiong Yu, Zhixiao Qi, Yongfeng Huang, Wei Wang, Weifeng.liu, Ran Chen, and Ji Pei. 2025. Long-context Language Models Fail in Basic Retrieval Tasks Without Sufficient Reasoning Steps. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 5615–5634, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): Long-context Language Models Fail in Basic Retrieval Tasks Without Sufficient Reasoning Steps (Yu et al., Findings 2025)
- PDF: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.301.pdf