Sungho Park


2025

pdf bib
HELIOS: Harmonizing Early Fusion, Late Fusion, and LLM Reasoning for Multi-Granular Table-Text Retrieval
Sungho Park | Joohyung Yun | Jongwuk Lee | Wook-Shin Han
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Table-text retrieval aims to retrieve relevant tables and text to support open-domain question answering. Existing studies use either early or late fusion, but face limitations. Early fusion pre-aligns a table row with its associated passages, forming “stars,” which often include irrelevant contexts and miss query-dependent relationships. Late fusion retrieves individual nodes, dynamically aligning them, but it risks missing relevant contexts. Both approaches also struggle with advanced reasoning tasks, such as column-wise aggregation and multi-hop reasoning. To address these issues, we propose HELIOS, which combines the strengths of both approaches. First, the edge-based bipartite subgraph retrieval identifies finer-grained edges between table segments and passages, effectively avoiding the inclusion of irrelevant contexts. Then, the query-relevant node expansion identifies the most promising nodes, dynamically retrieving relevant edges to grow the bipartite subgraph, minimizing the risk of missing important contexts. Lastly, the star-based LLM refinement performs logical inference at the star graph level rather than the bipartite subgraph, supporting advanced reasoning tasks. Experimental results show that HELIOS outperforms state-of-the-art models with a significant improvement up to 42.6% and 39.9% in recall and nDCG, respectively, on the OTT-QA benchmark.

pdf bib
SAFE: Schema-Driven Approximate Distance Join for Efficient Knowledge Graph Querying
Sangoh Lee | Sungho Park | Wook-Shin Han
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

To reduce hallucinations in large language models (LLMs), researchers are increasingly investigating reasoning methods that integrate LLMs with external knowledge graphs (KGs). Existing approaches either map an LLM-generated query graph onto the KG or let the LLM traverse the entire graph; the former is fragile because noisy query graphs derail retrieval, whereas the latter is inefficient due to entity-level reasoning over large graphs. In order to tackle these problems, we propose **SAFE** (**S**chema-Driven **A**pproximate Distance Join **F**or **E**fficient Knowledge Graph Querying), a framework that leverages schema graphs for robust query graph generation and efficient KG retrieval. SAFE introduces two key ideas: (1) an Approximate Distance Join (ADJ) algorithm that refines LLM-generated pseudo query graphs by flexibly aligning them with the KG’s structure; and (2) exploiting a compact schema graph to perform ADJ efficiently, reducing overhead and improving retrieval accuracy. Extensive experiments on WebQSP, CWQ and GrailQA demonstrate that SAFE outperforms state-of-the-art methods in both accuracy and efficiency, providing a robust and scalable solution to overcome the inherent limitations of LLM-based knowledge retrieval.