Xinle Deng
2026
Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency
Haoming Xu | Ningyuan Zhao | Yunzhi Yao | Weihong Xu | Hongru Wang | Xinle Deng | Shumin Deng | Jeff Z. Pan | Huajun Chen | Ningyu Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
As Large Language Models (LLMs) are increasingly deployed in real-world settings, correctness alone is insufficient. Reliable deployment requires maintaining truthful beliefs under contextual perturbations. Existing evaluations largely rely on point-wise confidence measures such as Self-Consistency, which can mask brittle beliefs. We show that even facts answered with perfect self-consistency can rapidly collapse under mild contextual interference. To address this gap, we propose Neighbor-Consistency Belief (NCB), a structural measure of belief robustness that evaluates response coherence across a conceptual neighborhood. To validate the efficiency of NCB, we introduce a new cognitive stress-testing protocol that probes output stability under contextual interference. Experiments across multiple LLMs show that performance on high-NCB data is more resistant to interference. Finally, we present Structure-Aware Training (SAT), which optimizes context-invariant belief structure and reduces long-tail knowledge brittleness by approximately 30%.
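The contrast the abstract draws between point-wise confidence and neighborhood-level coherence can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the abstract does not give the NCB formula, so `neighborhood_consistency` below is a hypothetical stand-in (mean accuracy over related questions), and the questions, sampled answers, and gold labels are invented for illustration.

```python
from collections import Counter


def self_consistency(samples):
    """Fraction of sampled answers that agree with the most common answer."""
    counts = Counter(samples)
    return counts.most_common(1)[0][1] / len(samples)


def neighborhood_consistency(neighbor_samples, gold):
    """Hypothetical neighborhood score (an assumption, not the paper's NCB).

    For each related question, take the fraction of sampled answers that
    match its gold answer, then average over all related questions.
    """
    scores = []
    for question, samples in neighbor_samples.items():
        correct = sum(1 for answer in samples if answer == gold[question])
        scores.append(correct / len(samples))
    return sum(scores) / len(scores)


# Invented data: the target fact is answered identically every time...
target_samples = ["Paris"] * 5

# ...but answers to related questions are less stable.
neighbor_samples = {
    "river through the capital of France": ["Seine", "Seine", "Loire", "Seine"],
    "country whose capital is Paris": ["France", "France", "France", "France"],
    "continent of the capital of France": ["Europe", "Europe", "Asia", "Asia"],
}
gold = {
    "river through the capital of France": "Seine",
    "country whose capital is Paris": "France",
    "continent of the capital of France": "Europe",
}

point_score = self_consistency(target_samples)
neighbor_score = neighborhood_consistency(neighbor_samples, gold)
print(point_score, neighbor_score)
```

In this toy case the target fact scores perfectly on self-consistency while the neighborhood score is lower, which is the kind of gap a point-wise measure alone would not reveal.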
Temp-R1: A Unified Autonomous Agent for Complex Temporal KGQA via Reverse Curriculum Reinforcement Learning
Zhaoyan Gong | Zhiqiang Liu | Songze Li | Xiaoke Guo | Yuanxiang Liu | Xinle Deng | Zhizhen Liu | Lei Liang | Huajun Chen | Wen Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Temporal Knowledge Graph Question Answering (TKGQA) is inherently challenging, as it requires sophisticated reasoning over dynamic facts with multi-hop dependencies and complex temporal constraints. Existing methods rely on fixed workflows and expensive closed-source APIs, limiting flexibility and scalability. We propose Temp-R1, the first autonomous end-to-end agent for TKGQA trained through reinforcement learning. To address cognitive overload in single-action reasoning, we expand the action space with specialized internal actions alongside external actions. To prevent shortcut learning on simple questions, we introduce reverse curriculum learning that trains on difficult questions first, forcing the development of sophisticated reasoning before transferring to easier cases. Our 8B-parameter Temp-R1 achieves state-of-the-art performance on MultiTQ and TimelineKGQA, improving 19.8% over strong baselines on complex questions. Our work establishes a new paradigm for autonomous temporal reasoning agents. The code is available at https://github.com/zjukg/Temp-R1.
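The reverse curriculum idea (hard questions first, easier ones later) can be sketched as a simple scheduling step. This is an illustrative assumption, not the Temp-R1 implementation: the `difficulty` field, the stage splitting, and the example questions are hypothetical, and how difficulty is actually measured is not stated in the abstract.

```python
def reverse_curriculum(examples, num_stages):
    """Split examples into training stages ordered hardest first.

    Each example is a dict with a numeric "difficulty" field (a hypothetical
    label, e.g. number of reasoning hops). Returns a list of stages.
    """
    if not examples:
        return []
    # Sort from most to least difficult, the reverse of a classic curriculum.
    ordered = sorted(examples, key=lambda ex: ex["difficulty"], reverse=True)
    stage_size = -(-len(ordered) // num_stages)  # ceiling division
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]


# Invented example questions with hypothetical difficulty scores.
examples = [
    {"id": "q1", "difficulty": 1},
    {"id": "q2", "difficulty": 4},
    {"id": "q3", "difficulty": 2},
    {"id": "q4", "difficulty": 5},
    {"id": "q5", "difficulty": 3},
]

stages = reverse_curriculum(examples, num_stages=2)
for index, stage in enumerate(stages):
    print(index, [ex["id"] for ex in stage])
```

A training loop would then iterate over `stages` in order, so the policy sees the hardest questions before it has a chance to settle on shortcuts that only work for simple ones.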
2025
EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models
Ziwen Xu | Shuxun Wang | Kewei Xu | Haoming Xu | Mengru Wang | Xinle Deng | Yunzhi Yao | Guozhou Zheng | Huajun Chen | Ningyu Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
In this paper, we introduce EasyEdit2, a framework designed to enable plug-and-play adjustability for controlling Large Language Model (LLM) behaviors. EasyEdit2 supports a wide range of test-time interventions, including safety, sentiment, personality, reasoning patterns, factuality, and language features. Unlike its predecessor, EasyEdit2 features a new architecture specifically designed for seamless model steering. It comprises key modules such as the steering vector generator and the steering vector applier, which enable automatic generation and application of steering vectors to influence the model’s behavior without modifying its parameters. One of the main advantages of EasyEdit2 is its ease of use: users do not need extensive technical knowledge. With just a single example, they can effectively guide and adjust the model’s responses, making precise control both accessible and efficient. Empirically, we report model steering performance across different LLMs, demonstrating the effectiveness of these techniques. We have released the source code at https://github.com/zjunlp/EasyEdit along with a demonstration notebook. In addition, we provide an online system at http://easyedit.zjukg.cn/ for real-time model steering, and a demo video at https://www.youtube.com/watch?v=AkfoiPfp5rQ.
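The generate-then-apply split described in the abstract can be illustrated with a generic activation-steering sketch. It assumes one common way of building a steering vector (the difference between mean activations on contrasting examples) and is not the EasyEdit2 API: the function names, the toy two-dimensional activations, and the `strength` parameter are all hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def steering_vector(positive_acts, negative_acts):
    """Generator step (assumed): mean positive activation minus mean negative."""
    positive_mean = mean_vector(positive_acts)
    negative_mean = mean_vector(negative_acts)
    return [p - n for p, n in zip(positive_mean, negative_mean)]


def apply_steering(hidden, vector, strength=1.0):
    """Applier step (assumed): shift a hidden state along the steering vector.

    Only the activation is changed; no model parameter is touched.
    """
    return [h + strength * v for h, v in zip(hidden, vector)]


# Toy activations standing in for hidden states at one layer.
positive_acts = [[1.0, 2.0], [3.0, 4.0]]  # examples showing the desired behavior
negative_acts = [[0.0, 1.0], [2.0, 1.0]]  # examples showing the opposite behavior

vector = steering_vector(positive_acts, negative_acts)
steered = apply_steering([0.5, 0.5], vector, strength=2.0)
print(vector, steered)
```

With a single contrasting pair, as in the single-example use case the abstract mentions, the two means reduce to the pair itself and the vector is simply their difference.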