Haonan Bian
2026
From Query to Logic: Ontology-Driven Multi-Hop Reasoning in LLMs
Haonan Bian | Yutao Qi | Rui Yang | Yuanxi Che | Jiaqian Wang | Heming Xia | Ranran Zhen
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs), despite their success in question answering, exhibit limitations in complex multi-hop question answering (MQA) tasks that necessitate non-linear, structured reasoning. This limitation stems from their inability to adequately capture deep conceptual relationships between entities. To overcome this challenge, we present ORACLE (Ontology-driven Reasoning And Chain for Logical Elucidation), a training-free framework that combines LLMs’ generative capabilities with the structural benefits of knowledge graphs. Our approach operates through three stages: (1) dynamic construction of question-specific knowledge ontologies using LLMs, (2) transformation of these ontologies into First-Order Logic (FOL) reasoning chains, and (3) systematic decomposition of the original query into logically coherent sub-questions. Extensive experiments across a diverse set of models and standard MQA benchmarks demonstrate that our framework achieves competitive performance while producing more interpretable reasoning chains.
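The three stages named in the abstract can be illustrated with a toy sketch. This is not the ORACLE implementation: the `Triple` type, the function names, the hard-coded ontology, and the example question are all hypothetical, and the LLM call that would build the ontology in stage (1) is replaced by a fixed stub so the example runs on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One ontology edge; terms starting with '?' are unknowns (hypothetical convention)."""
    subject: str
    relation: str
    obj: str


def build_ontology(question: str) -> list[Triple]:
    # Stage 1 stand-in: a real system would prompt an LLM here.
    # The triples below are hand-written for illustration only.
    return [
        Triple("Inception", "directed_by", "?x"),
        Triple("?x", "born_in", "?y"),
    ]


def to_fol_chain(ontology: list[Triple]) -> str:
    # Stage 2: turn the triples into one existentially quantified conjunction.
    variables: list[str] = []
    for t in ontology:
        for term in (t.subject, t.obj):
            if term.startswith("?") and term not in variables:
                variables.append(term)
    quantifiers = " ".join(f"exists {v[1:]}." for v in variables)
    body = " & ".join(
        f"{t.relation}({t.subject.lstrip('?')}, {t.obj.lstrip('?')})"
        for t in ontology
    )
    return f"{quantifiers} ({body})"


def decompose(ontology: list[Triple]) -> list[str]:
    # Stage 3: one sub-question per atom, referring back to the step
    # that first bound each variable.
    bound: dict[str, int] = {}
    sub_questions: list[str] = []
    for step, t in enumerate(ontology, start=1):
        subject = t.subject
        if subject in bound:
            subject = f"the answer to step {bound[subject]}"
        sub_questions.append(
            f"Step {step}: find the value related to {subject} by '{t.relation}'."
        )
        if t.obj.startswith("?"):
            bound.setdefault(t.obj, step)
    return sub_questions


question = "Where was the director of Inception born?"
ontology = build_ontology(question)
fol_chain = to_fol_chain(ontology)
sub_questions = decompose(ontology)
```

In this sketch each sub-question depends only on earlier steps, which is one simple way to keep the decomposition consistent with the logical chain; the paper's actual prompting and chain format may differ.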
RealMem: Benchmarking LLMs in Real-World Memory-Driven Interaction
Haonan Bian | Zhiyuan Yao | Sen Hu | Zishan Xu | Shaolei Zhang | Yifu Guo | Ziliang Yang | Xueran Han | Huacan Wang | Ronghao Chen
Findings of the Association for Computational Linguistics: ACL 2026
As Large Language Models (LLMs) evolve from static dialogue interfaces to autonomous general agents, effective memory is paramount to ensuring long-term consistency. However, existing benchmarks primarily focus on casual conversation or task-oriented dialogue, failing to capture “long-term project-oriented” interactions where agents must track evolving goals. To bridge this gap, we introduce RealMem, the first benchmark grounded in realistic project scenarios. RealMem comprises over 2,000 cross-session dialogues across eleven scenarios, utilizing natural user queries for evaluation. We propose a synthesis pipeline that integrates Project Foundation Construction, Multi-Agent Dialogue Generation, and Memory and Schedule Management to simulate the dynamic evolution of memory. Experiments reveal that current memory systems face significant challenges in managing the long-term project states and dynamic context dependencies inherent in real-world projects. Our code and datasets are available at https://anonymous.4open.science/r/realmem-A1E4.