Beakcheol Jang
2026
PROGRAM: Programmatic Retrieval Optimization with Generative Reasoning and Augmented Multi-queries
Gun Il Kim | Jungkyu Shin | Jong Wook Kim | Beakcheol Jang
Findings of the Association for Computational Linguistics: ACL 2026
Current retrieval-augmented generation (RAG) methods struggle with complex multi-hop reasoning because they rely on unstructured semantic matching, which lacks the logical structure needed to guide retrieval systematically. We introduce Programmatic Retrieval Optimization with Generative Reasoning and Augmented Multi-queries (PROGRAM), a novel framework that elevates retrieval to structured, program-guided reasoning. PROGRAM treats retrieval as the execution of specific program types (e.g., logical, temporal, and causal) through three stages: 'Program-Type Selection' with dual-metric optimization, 'Iterative Active Program Pruning' with evidence accumulation, and 'Final Answer Generation' with reranking. Evaluated with various LLMs on five benchmarks (HotPotQA, 2WikiMultihopQA, ARC-Challenge, MMLU-Pro, and MedQA), PROGRAM achieves state-of-the-art performance, with relative improvements of up to 24% on HotPotQA and 13.2% on MedQA over strong baselines including FLARE, ProbTree, and Self-RAG.
CTRL: Control-Based Time Series Forecasting with LLM-Guided Residual Learning
Minkyoung Kim | Daeun Ji | Yohan Lee | Beomsoo Kim | Beakcheol Jang
Findings of the Association for Computational Linguistics: ACL 2026
Time series forecasting underpins critical decision-making across diverse domains. While large language models (LLMs) offer promising reasoning capabilities, existing LLM-based forecasting approaches either reduce LLMs to numerical predictors that bypass their strengths or let them generate forecasts directly, which destabilizes predictions in non-stationary settings. We introduce CTRL, a framework that decouples semantic reasoning from quantitative prediction. A frozen backbone generates base forecasts, while specialized LLM agents act as controllers that analyze the backbone's prediction errors through their decomposed trend, seasonal, and irregular components, grounding reasoning in interpretable temporal structure. Each agent outputs compact control signals that a lightweight residual decoder translates into forecast corrections. CTRL also incorporates label-free test-time adaptation that detects distribution shift from input statistics alone and readapts control signals with only 3–24 LLM calls via caching. CTRL is explicitly designed to improve robustness under non-stationary temporal dynamics and distribution shift, while remaining competitive on highly stationary time series, where adaptive correction provides limited additional benefit.
2025
UniRAG: A Unified RAG Framework for Knowledge-Intensive Queries with Decomposition, Break-Down Reasoning, and Iterative Rewriting
Gun Il Kim | Jong Wook Kim | Beakcheol Jang
Findings of the Association for Computational Linguistics: EMNLP 2025
Knowledge-intensive queries require accurate answers that are explicitly grounded in retrieved evidence. However, existing retrieval-augmented generation (RAG) approaches often struggle with query complexity, suffer from propagated reasoning errors, or rely on incomplete or noisy retrieval, which limits their effectiveness. To address these limitations, we introduce UniRAG, a unified RAG framework that integrates entity-grounded query decomposition, break-down reasoning, and iterative query rewriting. Specifically, UniRAG decomposes queries into semantically coherent sub-queries, explicitly verifies retrieved sub-facts through a dedicated reasoning module, and adaptively refines queries based on identified knowledge gaps, significantly improving answer completeness and reliability. Extensive evaluations on complex question-answering and fact-verification benchmarks, including multi-hop HotPotQA and 2WikiMultihopQA, biomedical MedMCQA and MedQA, and fact-verification FEVER and SciFact, demonstrate that UniRAG consistently improves performance across various state-of-the-art LLMs, such as LLaMA-3.1-8B, GPT-3.5-Turbo, and Gemini-1.5-Flash.