Xuemin Liu


2026

Retrieval-Augmented Generation (RAG) systems have become a standard approach for grounding large language models (LLMs) in external knowledge. However, they are constrained by a decoupled architecture: retrieval and reasoning operate as separate stages, and retrieved text is merely prepended to the prompt as passive context. This prevents deep integration of knowledge into the model’s parametric reasoning and leads to fragmented responses for complex queries that require multi-document synthesis or conflict resolution. To bridge this gap, we propose NeuRAG, an end-to-end Neuralized RAG framework that unifies knowledge retrieval and fusion through Hyper-Neurons: parameterized modules that encode entire documents directly into the model’s parameter space. In NeuRAG, each document is encoded as a lightweight LoRA module, conceptualized as a knowledge neuron. These neurons collectively form a document-adaptive Hyper-Layer, which dynamically activates and fuses knowledge neurons via an attention mechanism conditioned on the input hidden-state query. This enables the model to jointly retrieve and reason within a single forward pass, integrating external knowledge directly into its inference pathway. Extensive experiments across multiple datasets and LLMs show that NeuRAG performs strongly and consistently, supporting it as a promising new RAG paradigm.
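
To make the Hyper-Layer idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that each knowledge neuron is a LoRA pair (A, B) with a per-document routing key, that activation weights come from a scaled dot-product softmax between the hidden state and those keys, and that the fused update is the weighted sum of the per-document low-rank updates added to a frozen base projection. The names `KnowledgeNeuron`, `hyper_layer`, and `key`, the random initialization, and all dimensions are hypothetical choices made for illustration.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class KnowledgeNeuron:
    """One document as a LoRA pair (A, B) plus a routing key (assumed)."""

    def __init__(self, d_in, d_out, rank, rng):
        self.A = rand_matrix(rank, d_in, rng)   # down-projection, rank x d_in
        self.B = rand_matrix(d_out, rank, rng)  # up-projection, d_out x rank
        self.key = [rng.gauss(0.0, 1.0) for _ in range(d_in)]

    def delta(self, h):
        """Low-rank update B(Ah) contributed by this document."""
        return matvec(self.B, matvec(self.A, h))


def hyper_layer(W, neurons, h):
    """Base projection plus an attention-weighted mix of document updates."""
    scale = math.sqrt(len(h))
    # Score each document against the hidden-state query, then normalize.
    scores = [sum(k * x for k, x in zip(n.key, h)) / scale for n in neurons]
    weights = softmax(scores)
    out = matvec(W, h)  # frozen base weights
    for w, n in zip(weights, neurons):
        out = [o + w * d for o, d in zip(out, n.delta(h))]
    return out, weights


rng = random.Random(0)
d_in, d_out, rank = 8, 8, 2
W = rand_matrix(d_out, d_in, rng)
neurons = [KnowledgeNeuron(d_in, d_out, rank, rng) for _ in range(4)]
h = [rng.gauss(0.0, 1.0) for _ in range(d_in)]
out, weights = hyper_layer(W, neurons, h)
```

Here `weights` plays the role of retrieval (which documents are activated for this hidden state) and the weighted sum plays the role of fusion, so both happen inside one layer call. How NeuRAG actually derives keys, trains the LoRA modules, and places Hyper-Layers in the network is not specified in this abstract.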
