Zicheng Huang


2026

Temporal knowledge graph question answering (TKGQA) addresses time-sensitive queries over temporal knowledge graphs, but existing approaches struggle with multi-hop reasoning and implicit temporal constraints. We introduce TempTool-R1, a tool-integrated reasoning framework that enables large language models to explicitly use temporal tools for precise reasoning. First, we design a unified temporal tool-based API that transforms implicit temporal cues into executable operations, establishing the structural foundation for tool interaction. Second, supervised fine-tuning teaches the model to interleave chain-of-thought reasoning with think-then-tool usage, allowing it to call temporal tools during inference. Finally, we apply reinforcement learning with fine-grained, order-sensitive reward functions tailored to temporal tool use, further refining the model’s tool-use policy. Experiments on three challenging TKGQA benchmarks demonstrate that TempTool-R1 significantly outperforms existing methods. In particular, our approach excels on complex questions requiring multi-hop temporal reasoning, highlighting the effectiveness of temporal tool integration and reward optimization in improving TKGQA performance.

2024

Recent advancements in Large Language Models (LLMs) have been reshaping Natural Language Processing (NLP) tasks in several domains. Their use in the field of Human Resources (HR) still has room for expansion and could benefit several time-consuming tasks. Time-off submissions, medical claims filing, and access requests are notable examples, though by no means the only ones. However, such applications depend on a high-quality training dataset, which is difficult to construct. On the one hand, most conversation datasets address problems for customers rather than employees. On the other hand, gathering conversations with HR could raise privacy concerns. To address these issues, we introduce HR-Multiwoz, a fully-labeled dataset of 550 conversations spanning 10 HR domains. Our work makes the following contributions: (1) It is the first labeled, open-source conversation dataset in the HR domain for NLP research. (2) It provides a detailed recipe for the data generation procedure along with data analysis and human evaluations. The data generation pipeline is transferable and can be easily adapted for labeled conversation data generation in other domains. (3) The proposed data-collection pipeline is mostly based on LLMs with minimal human involvement for annotation, making it time- and cost-efficient.