Gun Il Kim

2026

PROGRAM: Programmatic Retrieval Optimization with Generative Reasoning and Augmented Multi-queries
Gun Il Kim | Jungkyu Shin | Jong Wook Kim | Beakcheol Jang
Findings of the Association for Computational Linguistics: ACL 2026

Current retrieval-augmented generation (RAG) methods struggle with complex multi-hop reasoning, relying on unstructured semantic matching that lacks the logical structure needed to systematically guide retrieval. We introduce Programmatic Retrieval Optimization with Generative Reasoning and Augmented Multi-queries (PROGRAM), a novel framework that elevates retrieval to structured, program-guided reasoning. PROGRAM treats retrieval as execution of specific program types, such as logical, temporal, causal, and so forth, through three stages of ’Program-Type Selection’ with dual-metric optimization, ’Iterative Active Program Pruning’ with evidence accumulation, and ’Final Answer Generation’ with reranking. Evaluated on five benchmarks including HotPotQA, 2WikiMultihopQA, ARC-Challenge, MMLU-Pro, and MedQA with various LLMs, PROGRAM achieves state-of-the-art performance with up to 24% relative improvement on HotPotQA and 13.2% on MedQA over strong baselines including FLARE, ProbTree and Self-RAG.

2025

pdf bib abs

UniRAG: A Unified RAG Framework for Knowledge-Intensive Queries with Decomposition, Break-Down Reasoning, and Iterative Rewriting
Gun Il Kim | Jong Wook Kim | Beakcheol Jang
Findings of the Association for Computational Linguistics: EMNLP 2025

Knowledge-intensive queries require accurate answers that are explicitly grounded in retrieved evidence. However, existing retrieval-augmented generation (RAG) approaches often struggle with query complexity, suffer from propagated reasoning errors, or rely on incomplete or noisy retrieval, limiting their effectiveness. To address these limitations, we introduce UniRAG, a unified RAG framework that integrates entity-grounded query decomposition, break-down reasoning, and iterative query rewriting. Specifically, UniRAG decomposes queries into semantically coherent sub-queries, explicitly verifies retrieved sub-facts through a dedicated reasoning module, and adaptively refines queries based on identified knowledge gaps, significantly improving answer completeness and reliability. Extensive benchmark evaluations on complex question-answering datasets, including multi-hop HotPotQA and 2WikiMultihopQA, biomedical MedMCQA and MedQA, and fact-verification FEVER and SciFact, demonstrate that UniRAG consistently achieves performance improvements across various state-of-the-art LLMs, such as LLaMA-3.1-8B, GPT-3.5-Turbo, and Gemini-1.5-Flash.

Co-authors

Venues

Findings2

Fix author