Wanting Su

2026

Retrieval-augmented generation reduces hallucination by grounding model outputs in external evidence, yet hallucinations can still occur even when the retrieved context is accurate and sufficient. From the perspective of information routing in the residual stream, this reflects an imbalance where internal parametric knowledge overwhelms external context during generation. We present an attention-centric analysis of RAG hallucination under valid evidence, showing that hallucinated and factual tokens diverge in mid-to-late Transformer layers as context-selective attention routing weakens, allowing parametric influence to dominate the residual stream. Motivated by prior studies showing that some attention heads—often referred to as copying heads—exhibit stronger information transport capacity, we aim to extend similar evidence-carrying behavior to a broader set of attention heads. To this end, we introduce CoDA, a lightweight inference-time attention intervention that amplifies evidence-aligned value states, enabling more attention heads to transport reliable external evidence in a copy-encouraged manner. Experiments demonstrate that CoDA improves contextual faithfulness, reduces hallucination, and remains robust under long and noisy contexts with modest and stable inference overhead.

Co-authors

Jianhua Zhao 1

Tao Zheng 1

Venues

Findings 1
