Rohith Namboothiri

2026

Ghost Context: Measuring Cross-Context Interference in Long-Context Language Models
Rohith Namboothiri
Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026)

Long-context language models assemble prompts from heterogeneous sources, and deployed systems implicitly trust the model to use the correct span of context. We show that this assumption is often violated: irrelevant spans can silently shape outputs, producing errors that are neither fabrication nor omission but misattributed grounding, that is, claims supported by the wrong part of the input context. Unlike intrinsic hallucination (contradicting the source) or extrinsic hallucination (introducing unsupported claims), misattributed grounding uses real evidence from an incorrect span, making it invisible to standard source-blind faithfulness metrics. We formalize this phenomenon as Ghost Context and introduce a causal mask-and-rerun attribution protocol to measure it. Across a 272-case corpus spanning multiple interference scenarios, we evaluate three widely used models and report two complementary signals: strict Ghost Context Rate (GCR), which captures verifiable factual misattribution, and open-ended influence, which captures broader contextual shaping effects. Under realistic contextual conflict, strict GCR spikes substantially: temporal contradictions trigger misattributed grounding in 38.3% of cases. Across all scenarios, open-ended distractor influence occurs in 20.4% of evaluations. Importantly, Ghost Context is not only detectable but also remediable. Masking the single highest-attributed distractor span resolves 95.5% of detected errors (Fix@1) with 2.4% collateral damage and zero false positives on negative controls. We also introduce Contextual Invariance Rate (CIR) as a system-level robustness metric measuring invariance to irrelevant context. Our findings show that contextual conflict, common in retrieval-augmented generation and agent systems, can systematically degrade reliability, but also reveal that Ghost Context errors are causally localizable and cheaply correctable.
We release the evaluation corpus, detection pipeline, and experimental results to support further research on trustworthy long-context language model evaluation.
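
The mask-and-rerun idea named in the abstract can be illustrated with a minimal sketch. This is not the paper's released pipeline: the function names (`mask_and_rerun`, `ghost_spans`), the `[MASKED]` placeholder, the exact-match change test, and the toy model `toy_generate` are all assumptions made for illustration. The sketch masks one span at a time, reruns the model, and flags spans that change the output even though they are not the span the answer should rest on.

```python
from typing import Callable, List, Set, Tuple

def mask_and_rerun(
    generate: Callable[[List[str], str], str],
    spans: List[str],
    question: str,
    mask_token: str = "[MASKED]",  # hypothetical placeholder text
) -> List[Tuple[int, bool]]:
    """For each span, mask it alone, rerun, and record whether the output changed."""
    baseline = generate(spans, question)
    results = []
    for i in range(len(spans)):
        masked = spans[:i] + [mask_token] + spans[i + 1:]
        results.append((i, generate(masked, question) != baseline))
    return results

def ghost_spans(results: List[Tuple[int, bool]], relevant: Set[int]) -> List[int]:
    """Indices of spans that influenced the output but are not marked relevant."""
    return [i for i, changed in results if changed and i not in relevant]

def toy_generate(spans: List[str], question: str) -> str:
    """Toy stand-in for a model: answers from the LAST span mentioning 'capital'."""
    hits = [s for s in spans if "capital" in s]
    return hits[-1] if hits else "unknown"

spans = [
    "The capital of X is A (current record).",   # intended evidence
    "Weather note: sunny all week.",              # irrelevant, harmless
    "The capital of X is B (outdated draft).",    # conflicting distractor
]
results = mask_and_rerun(toy_generate, spans, "What is the capital of X?")
print(ghost_spans(results, relevant={0}))
```

In this toy setup only the outdated draft is flagged, because masking it is the only intervention that changes the answer. A real evaluation would need a softer output comparison than exact string equality.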


Authorization-First Retrieval: Enforcing Least Privilege in Multi-Agent RAG Systems
Rohith Namboothiri
Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026)

Retrieval-augmented generation systems serving multiple users under role-based access control face a trustworthiness gap: semantic retrieval operates on embedding similarity rather than authorization predicates and can introduce unauthorized content into a model’s context window before any filter intervenes. We formalize this as a pipeline ordering problem and introduce Authorization-First Retrieval (AFR), an architectural invariant requiring that authorization constrain the retrieval candidate set before any learned component consumes retrieved content. We reduce authorization correctness to the classical noninterference property and prove AFR is necessary whenever the processing model violates noninterference—a condition our experiments confirm empirically. Evaluation on a controlled corpus of 247 chunks across 232 documents with 431 base queries spanning 12 enterprise roles and 9 domains (584 total queries including negation exploitation and parametric probes) shows that retrieve-then-filter pipelines expose unauthorized context in 86.1% of queries, while AFR eliminates structural leaks by construction. Cross-model experiments with Gemini 2.0 Flash and GPT-4o-mini reveal that structural exposure is an architectural property independent of the underlying model, whereas behavioral defenses fail at model-dependent rates, producing answer leakage of 41.3% and 29.5% respectively under retrieve-then-filter. A negation exploitation study demonstrates consistent disclosure vulnerabilities across framing types, while a metadata-tag freshness ablation shows that conditional authorization mechanisms degrade under realistic policy staleness. Stress tests across retrieval depths and chunking granularities confirm AFR’s robustness. Our results demonstrate that behavioral guardrails and metadata tagging cannot reliably enforce least privilege in RAG pipelines, while authorization-first architectures provide a verifiable and model-independent security guarantee.
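
The pipeline-ordering contrast described in the abstract can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the `Chunk` type, the fixed `score` field standing in for embedding similarity, the role names, and both function names are assumptions. The point is only that filtering after top-k retrieval lets unauthorized chunks enter the retrieved set, while constraining the candidate set first does not.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_roles: FrozenSet[str]
    score: float  # stand-in for embedding similarity to the query

def top_k(chunks: List[Chunk], k: int) -> List[Chunk]:
    """Return the k highest-scoring chunks."""
    return sorted(chunks, key=lambda c: c.score, reverse=True)[:k]

def retrieve_then_filter(chunks: List[Chunk], role: str, k: int) -> Tuple[List[Chunk], List[Chunk]]:
    """Retrieve over everything, then filter. Also report what was exposed."""
    retrieved = top_k(chunks, k)
    kept = [c for c in retrieved if role in c.allowed_roles]
    exposed = [c for c in retrieved if role not in c.allowed_roles]
    return kept, exposed

def authorization_first(chunks: List[Chunk], role: str, k: int) -> List[Chunk]:
    """Constrain the candidate set by role before any retrieval happens."""
    authorized = [c for c in chunks if role in c.allowed_roles]
    return top_k(authorized, k)

corpus = [
    Chunk("Q3 salary bands", frozenset({"hr"}), 0.95),
    Chunk("Public holiday calendar", frozenset({"hr", "engineer"}), 0.80),
    Chunk("Onboarding checklist", frozenset({"hr", "engineer"}), 0.60),
]

kept, exposed = retrieve_then_filter(corpus, "engineer", k=2)
print([c.text for c in exposed])
print([c.text for c in authorization_first(corpus, "engineer", k=2)])
```

In this toy corpus the post-hoc filter still returns only authorized text, but the salary chunk was already pulled into the retrieved set, and one of the two retrieval slots was spent on it. The authorization-first ordering never touches that chunk and fills both slots with permitted content.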
