Shijie Zhang
2026
Thought-Action Graph Reasoning: Faithful and Efficient Reasoning of Large Language Models via Reusing Past Experience
Zhixiao Qi | Feng Huang | Yunqi Zhang | Shijie Zhang | Qingqing Sun | Yongfeng Huang | Minghu Jiang | Shuai Chen | Tianyi Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Zhixiao Qi | Feng Huang | Yunqi Zhang | Shijie Zhang | Qingqing Sun | Yongfeng Huang | Minghu Jiang | Shuai Chen | Tianyi Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Large language models (LLMs) often hallucinate in question answering (QA) tasks due to a lack of factual knowledge. While integrating knowledge graphs (KGs) with LLMs has alleviated this issue, existing methods suffer from poor generalization or low reasoning efficiency, and critically, they overlook the learning and reuse of reasoning paths from past experiences. To address these challenges, we introduce Thought-Action Graph (TAG), a structured repository of reasoning experiences. TAG decomposes successful LLM-KG interaction trajectories into fine-grained semantic operators, which are stored in TAG constructed by the thought layer and action layer. Building upon TAG, we propose a novel KGQA paradigm — TAG-Reasoning (TAGR). TAGR first retrieves and assembles reasoning blueprints from TAG, and then guides LLM to efficiently execute on KG according to them. Through this approach, TAGR transforms the computationally expensive online exploration process of LLMs into an offline process of TAG retrieval and assembly. Experimental results on multiple KGQA benchmarks demonstrate that TAGR significantly outperforms state-of-the-art methods across key metrics, while drastically reducing the number of LLM calls and generated tokens. This work opens new avenues for building continual learning, efficient, and faithful KGQA systems.
2025
HiMATE: A Hierarchical Multi-Agent Framework for Machine Translation Evaluation
Shijie Zhang | Renhao Li | Songsheng Wang | Philipp Koehn | Min Yang | Derek F. Wong
Findings of the Association for Computational Linguistics: EMNLP 2025
Shijie Zhang | Renhao Li | Songsheng Wang | Philipp Koehn | Min Yang | Derek F. Wong
Findings of the Association for Computational Linguistics: EMNLP 2025
The advancement of Large Language Models (LLMs) enables flexible and interpretable automatic evaluations. In the field of machine translation evaluation, utilizing LLMs with translation error annotations based on Multidimensional Quality Metrics (MQM) yields more human-aligned judgments. However, current LLM-based evaluation methods still face challenges in accurately identifying error spans and assessing their severity. In this paper, we propose HiMATE, a Hierarchical Multi-Agent Framework for Machine Translation Evaluation. We argue that existing approaches inadequately exploit the fine-grained structural and semantic information within the MQM hierarchy. To address this, we develop a hierarchical multi-agent system grounded in the MQM error typology, enabling granular evaluation of subtype errors. Two key strategies are incorporated to further mitigate systemic hallucinations within the framework: the utilization of the model’s self-reflective capability and the facilitation of agent discussion involving asymmetric information. Empirically, HiMATE outperforms competitive baselines across different datasets in conducting human-aligned evaluations. Further analyses underscore its significant advantage in error span detection and severity assessment, achieving an average F1-score improvement of 89% over the best-performing baseline. We make our code and data publicly available at https://github.com/nlp2ct-shijie/HiMATE.