Junnan Dong

2026

Retrieval-Augmented Generation (RAG) has demonstrated significant potential in enhancing large language models (LLMs) by supplementing them with external knowledge. However, existing approaches focus primarily on retrieving isolated factual knowledge entities while neglecting the critical reasoning relationships among them. To address this limitation, Graph-Augmented Generation (GraphRAG) has emerged as an effective solution that explicitly integrates structured knowledge graphs to support complex reasoning tasks. Although diverse graph construction methods have been explored, they typically rely on static, query-agnostic graphs constructed via fixed heuristics. We are therefore motivated to propose a query-centric retrieval framework that adaptively constructs a graph tailored to each query. However, accurately identifying the latent relationships between a query and the corpus is challenging. Moreover, unifying multiple local-perspective connections into a globally coherent structured corpus introduces additional complexity. To this end, we introduce HyperRAG, a novel framework in hyperbolic space that captures both explicit entity-based links and implicit query-aware connections. Extensive experiments on three benchmark datasets demonstrate that HyperRAG consistently outperforms existing baselines.
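
As a rough illustration of what measuring relatedness in hyperbolic space can look like, the sketch below computes the standard distance in the Poincaré ball, one common model of hyperbolic space. The abstract does not say which model or distance HyperRAG uses, so the choice of the Poincaré ball and the function name `poincare_distance` are assumptions made only for this example.

```python
import math


def poincare_distance(u, v):
    """Distance between two points inside the unit Poincare ball.

    Assumption: the Poincare ball model is used purely for illustration;
    the abstract does not specify HyperRAG's actual geometry.
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# Points near the boundary are much farther apart than their Euclidean gap suggests.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.85, 0.0], [0.95, 0.0])
```

Distances grow rapidly toward the boundary of the ball, which is why hyperbolic embeddings are often used for tree-like or hierarchical structure.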

Retrieval-Augmented Generation (RAG) has long been a promising paradigm for enhancing large language models (LLMs) with external knowledge. Traditional embedding-based methods for graph construction can capture semantic similarity but struggle to establish fine-grained, interpretable logical relationships. Recently, Graph-enhanced RAG (GraphRAG) has gained popularity for its ability to model logical relationships. However, graph construction consumes a large number of tokens for triple extraction and summarization, making it costly and slow. Accordingly, we propose MeshRAG, a novel framework for mining efficient structures via hashing to enhance RAG. We adopt an inductive paradigm in which global graph structure emerges from local hash collisions rather than explicit symbolic extraction. By replacing neural embedding search with lightweight bitwise operations, MeshRAG automates a simple and rapid graph construction process. Furthermore, the hash collision mechanism provides transparent evidence for logical connections and retrieval decisions. Experimental results show that MeshRAG outperforms existing baselines, while its graph construction requires no GPU resources or token budget and can structure over ten thousand chunks in a few minutes.
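
The idea of a graph emerging from local hash collisions can be sketched as follows. This is a minimal illustration, not MeshRAG's actual algorithm: the abstract does not describe the hashing scheme, so the word n-gram hashing, the bit mask, and the names `shingle_hashes` and `build_collision_graph` are all assumptions.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingle_hashes(text, n=2, bits=64):
    """Hash every word n-gram of a chunk to an integer key.

    Assumption: word n-grams and BLAKE2b are illustrative choices only.
    """
    tokens = text.lower().split()
    mask = (1 << bits) - 1  # keep only the low `bits` bits of each hash
    keys = set()
    for i in range(len(tokens) - n + 1):
        gram = " ".join(tokens[i:i + n])
        digest = hashlib.blake2b(gram.encode("utf-8"), digest_size=8).digest()
        keys.add(int.from_bytes(digest, "big") & mask)
    return keys


def build_collision_graph(chunks, n=2, bits=64):
    """Link two chunks whenever at least one of their hash keys collides.

    Returns {(i, j): set_of_colliding_keys}; the colliding keys serve as
    inspectable evidence for why an edge exists.
    """
    buckets = defaultdict(set)
    for chunk_id, text in enumerate(chunks):
        for key in shingle_hashes(text, n, bits):
            buckets[key].add(chunk_id)
    edges = defaultdict(set)
    for key, chunk_ids in buckets.items():
        for a, b in combinations(sorted(chunk_ids), 2):
            edges[(a, b)].add(key)
    return dict(edges)


chunks = [
    "graph retrieval helps reasoning",
    "we study graph retrieval",
    "cats sleep all day",
]
edges = build_collision_graph(chunks)
```

In this toy input only the first two chunks share an n-gram ("graph retrieval"), so they are the only pair that ends up connected. No embeddings, GPU, or LLM calls are involved; construction is a single pass of hashing and bucketing.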

2024

Extreme multi-label text classification (EMTC) involves predicting multiple labels from a vast pool of candidates based on a user’s textual query. While traditional BERT-based methods have shown limited success, large language models (LLMs) have brought new possibilities. It is promising to leverage their remarkable comprehension ability to understand textual queries. However, applying LLMs is non-trivial for two main reasons. First, real-world EMTC datasets can be extremely large, with up to ten million candidate product pairs, which poses significant challenges for data ingestion. Second, the large size of LLMs makes computation and memory demands prohibitive for EMTC applications. To this end, we propose QUEST, a Quantized and Efficient Learning with Sampling Technique. QUEST includes a tailored hash sampling module that reduces the data volume to one-fourth of its original size. Additionally, we perform compressive fine-tuning of LLMs with only twenty thousand trainable parameters, greatly reducing computational requirements. Extensive experiments demonstrate that QUEST outperforms existing methods while requiring fewer computational resources, unlocking efficient EMTC on commodity hardware such as a single Nvidia RTX 3090 GPU with 24 GB of memory.
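
To make the one-fourth reduction concrete, the sketch below shows a generic deterministic hash-based downsampler that keeps roughly one in four records. It is not QUEST's tailored module, whose details the abstract does not give; the `(query, label)` pair layout and the names `keep_pair` and `hash_sample` are hypothetical.

```python
import hashlib


def keep_pair(query, label, denominator=4):
    """Keep a pair when its hash falls in one of `denominator` equal slots.

    Assumption: generic modulo sampling, shown only to illustrate the idea
    of shrinking a dataset to about 1/denominator of its size.
    """
    key = f"{query}\t{label}".encode("utf-8")
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "big") % denominator == 0


def hash_sample(pairs, denominator=4):
    """Return the deterministic subset of (query, label) pairs that are kept."""
    return [p for p in pairs if keep_pair(p[0], p[1], denominator)]


# Synthetic data, for illustration only.
pairs = [(f"query {i}", f"label {i % 97}") for i in range(10_000)]
sampled = hash_sample(pairs)
```

Because the decision depends only on the pair's content, the same record is always kept or dropped across runs and machines, and no random state needs to be stored.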

Modality-Aware Integration with Large Language Models for Knowledge-Based Visual Question Answering
Junnan Dong | Qinggang Zhang | Huachi Zhou | Daochen Zha | Pai Zheng | Xiao Huang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Knowledge-based visual question answering (KVQA) has been extensively studied to answer visual questions with external knowledge, e.g., knowledge graphs (KGs). While several attempts have been made to leverage large language models (LLMs) as an implicit knowledge source, this remains challenging since LLMs may generate hallucinations. Moreover, multiple knowledge sources, e.g., images, KGs and LLMs, cannot be readily aligned for complex scenarios. To tackle these issues, we present a novel modality-aware integration with LLMs for KVQA (MAIL). It carefully leverages multimodal knowledge for both image understanding and knowledge reasoning. Specifically, (i) we propose a two-stage prompting strategy with LLMs to densely embody the image into a *scene graph* with detailed visual features; (ii) we construct a coupled *concept graph* by linking the mentioned entities with external facts; (iii) we design a tailored pseudo-siamese graph medium fusion for sufficient multimodal fusion. We use the mentioned entities shared by the two graphs as mediums to bridge a tight inter-modal exchange, while maximally preserving insightful intra-modal learning by constraining the fusion within the mediums. Extensive experiments show the superiority of MAIL.
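
The notion of shared mentioned entities acting as mediums between the two graphs can be illustrated with a small sketch. It only covers the selection of medium nodes, not the pseudo-siamese fusion network itself; the triple format, the toy data, and the names `graph_entities` and `find_mediums` are assumptions for this example.

```python
def graph_entities(triples):
    """Collect every node that appears as head or tail of a (head, relation, tail) triple."""
    nodes = set()
    for head, _relation, tail in triples:
        nodes.add(head)
        nodes.add(tail)
    return nodes


def find_mediums(scene_graph, concept_graph):
    """Entities present in both graphs; only these would carry cross-modal exchange."""
    return graph_entities(scene_graph) & graph_entities(concept_graph)


# Toy graphs, for illustration only.
scene_graph = [("dog", "on", "sofa"), ("dog", "wears", "collar")]
concept_graph = [("dog", "is_a", "mammal"), ("sofa", "used_for", "sitting")]
mediums = find_mediums(scene_graph, concept_graph)
```

Here "dog" and "sofa" appear in both graphs and would serve as mediums, while "collar", "mammal", and "sitting" stay within their own modality.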

Co-authors

di Yin 2

Siyu An 1

Su Dong 1

Venues

ACL 3
Findings 1
