Yajuan Tong

2026

TempTool-R1: Tool-Augmented Reinforcement Learning for Temporal Knowledge Graph Question Answering
Zicheng Huang | Yajuan Tong | Xinhui Tu | Tingting He
Findings of the Association for Computational Linguistics: ACL 2026

Temporal knowledge graph question answering (TKGQA) addresses time-sensitive queries over temporal knowledge graphs, but existing approaches struggle with multi-hop reasoning and implicit temporal constraints. We introduce TempTool-R1, a novel tool-integrated reasoning framework that enables large language models to explicitly use temporal tools for precise reasoning. First, we design a unified temporal tool-based API capable of transforming implicit temporal cues into executable operations, establishing the structural foundation for tool interaction. Second, supervised fine-tuning teaches the model to interleave chain-of-thought reasoning with think-then-tool usage, allowing it to call temporal tools during inference. Finally, we apply reinforcement learning with fine-grained, order-sensitive reward functions tailored for temporal tool use, further refining the model’s tool-use policy. Experiments on three challenging TKGQA benchmarks demonstrate that TempTool-R1 significantly outperforms existing methods. In particular, our approach excels on complex questions requiring multi-hop temporal reasoning, highlighting the effectiveness of temporal tool integration and reward optimization in improving TKGQA performance.

2025

We present the system developed by the Central China Normal University (CCNU) team for SemEval-2025 Task 8, which focuses on question answering (QA) over tabular data. Our approach leverages multiple large language models (LLMs) and treats tabular QA as code completion. To improve reliability, we introduce a two-stage correction mechanism in which we instruct the LLM to correct the code based on two judgments: whether the code is executable, and whether the answer obtained from executing it is semantically consistent with the question. Experiments show that code correction is effective but answer correction is not. Finally, we discuss other unsuccessful approaches explored during development.

Co-authors

Chengzhao Wu 1

Xin Xu 1

Chenlian Zhou 1
