Zhengren Wang


2025

pdf bib
QAEncoder: Towards Aligned Representation Learning in Question Answering Systems
Zhengren Wang | Qinhan Yu | Shida Wei | Zhiyu Li | Feiyu Xiong | Xiaoxing Wang | Simin Niu | Hao Liang | Wentao Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Modern QA systems entail retrieval-augmented generation (RAG) for accurate and trustworthy responses. However, the inherent gap between user queries and relevant documents hinders precise matching. We introduce QAEncoder, a training-free approach to bridge this gap. Specifically, QAEncoder estimates the expectation of potential queries in the embedding space as a robust surrogate for the document embedding, and attaches document fingerprints to effectively distinguish these embeddings. Extensive experiments across diverse datasets, languages, and embedding models confirmed QAEncoder’s alignment capability, which offers a simple-yet-effective solution with zero additional index storage, retrieval latency, training costs, or catastrophic forgetting and hallucination issues. The repository is publicly available at https://github.com/IAAR-Shanghai/QAEncoder.

pdf bib
HopRAG: Multi-Hop Reasoning for Logic-Aware Retrieval-Augmented Generation
Hao Liu | Zhengren Wang | Xi Chen | Zhiyu Li | Feiyu Xiong | Qinhan Yu | Wentao Zhang
Findings of the Association for Computational Linguistics: ACL 2025

Retrieval-Augmented Generation (RAG) systems often struggle with imperfect retrieval, as traditional retrievers focus on lexical or semantic similarity rather than logical relevance. To address this, we propose HopRAG, a novel RAG framework that augments retrieval with logical reasoning through graph-structured knowledge exploration. During indexing, HopRAG constructs a passage graph, with text chunks as vertices and logical connections established via LLM-generated pseudo-queries as edges. During retrieval, it employs a retrieve-reason-prune mechanism: starting with lexically or semantically similar passages, the system explores multi-hop neighbors guided by pseudo-queries and LLM reasoning to identify truly relevant ones. Experiments on multiple multi-hop benchmarks demonstrate that HopRAG’s retrieve-reason-prune mechanism can expand the retrieval scope based on logical connections and improve final answer quality.