@inproceedings{namboothiri-2026-ghost,
title = "Ghost Context: Measuring Cross-Context Interference in Long-Context Language Models",
author = "Namboothiri, Rohith",
editor = "Chang, Kai-Wei and
Mehrabi, Ninareh and
Krishna, Satyapriya and
Das, Anubrata and
Dhamala, Jwala and
Cao, Yang Trista and
Kumarage, Tharindu and
Ramakrishna, Anil and
Christodoulopoulos, Christos and
Wan, Yixin and
Galstyan, Aram and
Kumar, Anoop and
Gupta, Rahul",
booktitle = "Proceedings of the 6th Workshop on Trustworthy {NLP} ({T}rust{NLP} 2026)",
month = jul,
year = "2026",
address = "San Diego, California",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingest-acl-workshops/2026.trustnlp-main.19/",
pages = "316--329",
ISBN = "979-8-89176-418-7",
abstract = "Long-context language models assemble prompts from heterogeneous sources, and deployed systems implicitly trust the model to use the correct span of context. We show that this assumption is often violated: irrelevant spans can silently shape outputs, producing errors that are neither fabrication nor omission but misattributed grounding{---}claims supported by the wrong part of the input context. Unlike intrinsic hallucination (contradicting the source) or extrinsic hallucination (introducing unsupported claims), misattributed grounding uses real evidence from an incorrect span, making it invisible to standard source-blind faithfulness metrics. We formalize this phenomenon as Ghost Context and introduce a causal mask-and-rerun attribution protocol to measure it. Across a 272-case corpus spanning multiple interference scenarios, we evaluate three widely used models and report two complementary signals: strict Ghost Context Rate (GCR), which captures verifiable factual misattribution, and open-ended influence, which captures broader contextual shaping effects. Under realistic contextual conflict, strict GCR spikes substantially: temporal contradictions trigger misattributed grounding in 38.3{\%} of cases. Across all scenarios, open-ended distractor influence occurs in 20.4{\%} of evaluations. Importantly, Ghost Context is not only detectable but also remediable. Masking the single highest-attributed distractor span resolves 95.5{\%} of detected errors (Fix@1) with 2.4{\%} collateral damage and zero false positives on negative controls. We also introduce Contextual Invariance Rate (CIR) as a system-level robustness metric measuring invariance to irrelevant context. Our findings show that contextual conflict{---}common in retrieval-augmented generation and agent systems{---}can systematically degrade reliability, but also reveal that Ghost Context errors are causally localizable and cheaply correctable.
We release the evaluation corpus, detection pipeline, and experimental results to support further research on trustworthy long-context language model evaluation."
}