Haoming Xu
2026
How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities
Ziwen Xu | Kewei Xu | Haoming Xu | Haiwen Hong | Longtao Huang | Hui Xue | Ningyu Zhang | Yongliang Shen | Guozhou Zheng | Huajun Chen | Shumin Deng
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) are increasingly deployed in socially sensitive domains, yet their unpredictable behaviors, ranging from misaligned intent to inconsistent personality, pose significant risks. We introduce SteerEval, a hierarchical benchmark for evaluating LLM controllability across three domains: language features, sentiment, and personality. Each domain is structured into three specification levels: L1 (what to express), L2 (how to express), and L3 (how to instantiate), connecting high-level behavioral intent to concrete textual output. Using SteerEval, we systematically evaluate contemporary steering methods, revealing that control often degrades at finer-grained levels. Our benchmark offers a principled and interpretable framework for safe and controllable LLM behavior, serving as a foundation for future research.
Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency
Haoming Xu | Ningyuan Zhao | Yunzhi Yao | Weihong Xu | Hongru Wang | Xinle Deng | Shumin Deng | Jeff Z. Pan | Huajun Chen | Ningyu Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
As Large Language Models (LLMs) are increasingly deployed in real-world settings, correctness alone is insufficient. Reliable deployment requires maintaining truthful beliefs under contextual perturbations. Existing evaluations largely rely on point-wise confidence measures such as Self-Consistency, which can mask brittle beliefs. We show that even facts answered with perfect self-consistency can rapidly collapse under mild contextual interference. To address this gap, we propose Neighbor-Consistency Belief (NCB), a structural measure of belief robustness that evaluates response coherence across a conceptual neighborhood. To validate the efficiency of NCB, we introduce a new cognitive stress-testing protocol that probes output stability under contextual interference. Experiments across multiple LLMs show that performance on high-NCB data is relatively more resistant to interference. Finally, we present Structure-Aware Training (SAT), which optimizes context-invariant belief structure and reduces long-tail knowledge brittleness by approximately 30%.
2025
ReLearn: Unlearning via Learning for Large Language Models
Haoming Xu | Ningyuan Zhao | Liming Yang | Sendong Zhao | Shumin Deng | Mengru Wang | Bryan Hooi | Nay Oo | Huajun Chen | Ningyu Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Current unlearning methods for large language models usually rely on reverse optimization to reduce target token probabilities. However, this paradigm disrupts the prediction of subsequent tokens, degrading model performance and linguistic coherence. Moreover, existing evaluation metrics overemphasize contextual forgetting while inadequately assessing response fluency and relevance. To address these challenges, we propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning, along with a comprehensive evaluation framework. This framework introduces the Knowledge Forgetting Ratio (KFR) and Knowledge Retention Ratio (KRR) to measure knowledge-level preservation, and the Linguistic Score (LS) to evaluate generation quality. Our experiments show that ReLearn successfully achieves targeted forgetting while preserving high-quality outputs. Through mechanistic analysis, we further demonstrate how reverse optimization disrupts coherent text generation, while ReLearn preserves this essential capability.
ZJUKLAB at SemEval-2025 Task 4: Unlearning via Model Merging
Haoming Xu | Shuxun Wang | Yanqiu Zhao | Yi Zhong | Ziyan Jiang | Ningyuan Zhao | Shumin Deng | Huajun Chen | Ningyu Zhang
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
This paper presents the ZJUKLAB team’s submission for SemEval-2025 Task 4: Unlearning Sensitive Content from Large Language Models. This task aims to selectively erase sensitive knowledge from large language models, avoiding both over-forgetting and under-forgetting issues. We propose an unlearning system that leverages Model Merging (specifically TIES-Merging), combining two specialized models into a more balanced unlearned model. Our system achieves competitive results, ranking second among 26 teams, with an online score of 0.944 for Task Aggregate and 0.487 for overall Aggregate. In this paper, we also conduct local experiments and perform a comprehensive analysis of the unlearning process, examining performance trajectories, loss dynamics, and weight perspectives, along with several supplementary experiments, to understand the effectiveness of our method. Furthermore, we analyze the shortcomings of our method and evaluation metrics, emphasizing that MIA scores and ROUGE-based metrics alone are insufficient to fully evaluate successful unlearning. Finally, we emphasize the need for more comprehensive evaluation methodologies and rethinking of unlearning objectives in future research.
EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models
Ziwen Xu | Shuxun Wang | Kewei Xu | Haoming Xu | Mengru Wang | Xinle Deng | Yunzhi Yao | Guozhou Zheng | Huajun Chen | Ningyu Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
In this paper, we introduce EasyEdit2, a framework designed to enable plug-and-play adjustability for controlling Large Language Model (LLM) behaviors. EasyEdit2 supports a wide range of test-time interventions, including safety, sentiment, personality, reasoning patterns, factuality, and language features. Unlike its predecessor, EasyEdit2 features a new architecture specifically designed for seamless model steering. It comprises key modules such as the steering vector generator and the steering vector applier, which enable automatic generation and application of steering vectors to influence the model’s behavior without modifying its parameters. One of the main advantages of EasyEdit2 is its ease of use: users do not need extensive technical knowledge. With just a single example, they can effectively guide and adjust the model’s responses, making precise control both accessible and efficient. Empirically, we report model steering performance across different LLMs, demonstrating the effectiveness of these techniques. We have released the source code at https://github.com/zjunlp/EasyEdit along with a demonstration notebook. In addition, we provide an online system at http://easyedit.zjukg.cn/ for real-time model steering, and a demo video at https://www.youtube.com/watch?v=AkfoiPfp5rQ.