NeedleChain: Measuring Intact Context Comprehension Capability of Large Language Models

Hyeonseok Moon, Heuiseok Lim


Abstract
Recent reports suggest that LLMs can handle increasingly long contexts. However, many existing benchmarks for context understanding embed substantial query-irrelevant content, which shifts evaluation toward retrieving relevant snippets rather than fully integrating all provided information. We argue that, under this setting, current benchmarks can overestimate the true context-understanding ability of LLMs. In particular, we demonstrate that when the context consists entirely of query-relevant text, even advanced models such as GPT-4o fail to reliably integrate inputs as short as 200 tokens. To evaluate this capability more rigorously, we introduce NeedleChain, a benchmark designed to test whether models can faithfully incorporate all given evidence. NeedleChain includes three variants that differ in the required order of comprehension, along with a parallel benchmark based on the needle-in-a-haystack (NIAH) paradigm. Comparing these variants enables a more comprehensive assessment of context understanding. We further propose ROPE contraction, a training-free strategy that encourages models to reflect all available information. Our findings highlight the importance of full-context integration and point to new directions for improving reliable reasoning over context.
Anthology ID:
2026.findings-acl.1637
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
32718–32730
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1637/
Cite (ACL):
Hyeonseok Moon and Heuiseok Lim. 2026. NeedleChain: Measuring Intact Context Comprehension Capability of Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 32718–32730, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
NeedleChain: Measuring Intact Context Comprehension Capability of Large Language Models (Moon & Lim, Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1637.pdf
Checklist:
2026.findings-acl.1637.checklist.pdf