Haitian Zhong
2026
MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning
Xukai Wang | Xuanbo Liu | Mingrui Chen | Haitian Zhong | Xuanlin Yang | Bohan Zeng | Jinbo Hu | Hao Liang | Junbo Niu | Xuchen Li | Ruitao Wu | Ruichuan An | Yang Shi | Liu Liu | Qiang Liu | Zhouchen Lin | Xu-Yao Zhang | Wentao Zhang | Bin Dong
Findings of the Association for Computational Linguistics: ACL 2026
With the advancement of powerful large-scale reasoning models, effectively evaluating the reasoning capabilities of these models has become increasingly important. However, existing benchmarks designed to assess the reasoning abilities of large models tend to be limited in scope and lack the flexibility to adapt their difficulty according to the evolving reasoning capacities of the models. To address this, we propose MorphoBench, a benchmark that incorporates multidisciplinary questions to evaluate the reasoning capabilities of large models and can adjust and update question difficulty based on the reasoning abilities of advanced models. Specifically, we curate the benchmark by selecting and collecting complex reasoning questions from existing benchmarks and sources such as Olympiad-level competitions. Additionally, MorphoBench adaptively modifies the analytical challenge of questions by leveraging key statements generated during the model’s reasoning process. Furthermore, it includes questions generated using simulation software, enabling dynamic adjustment of benchmark difficulty with minimal resource consumption. We have gathered over 1,300 test questions and iteratively adjusted the difficulty of MorphoBench based on the reasoning capabilities of models such as GPT-5 and Gemini-3-Pro. MorphoBench enhances the comprehensiveness and validity of model reasoning evaluation, providing reliable guidance for improving both the reasoning abilities and scientific robustness of large models.
2025
REACT: Representation Extraction And Controllable Tuning to Overcome Overfitting in LLM Knowledge Editing
Haitian Zhong | Yuhuan Liu | Ziyang Xu | Guofan Liu | Qiang Liu | Shu Wu | Zhe Zhao | Liang Wang | Tieniu Tan
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language model editing methods frequently suffer from overfitting, wherein factual updates can propagate beyond their intended scope, overemphasizing the edited target even when it’s contextually inappropriate. To address this challenge, we introduce REACT (Representation Extraction And Controllable Tuning), a unified two-phase framework designed for precise and controllable knowledge editing. In the initial phase, we utilize tailored stimuli to extract latent factual representations and apply Principal Component Analysis with a simple learnable linear transformation to compute a directional “belief shift” vector for each instance. In the second phase, we apply controllable perturbations to hidden states using the obtained vector with a magnitude scalar, gated by a pre-trained classifier that permits edits only when contextually necessary. Experiments on the EVOKE benchmarks demonstrate that REACT significantly reduces overfitting across nearly all evaluation metrics, and experiments on COUNTERFACT and MQuAKE show that our method preserves balanced basic editing performance (reliability, locality, and generality) under diverse editing scenarios.
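The two phases described in the REACT abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the use of paired "positive" and "negative" stimulus representations, the uncentered SVD used to obtain the principal direction, and the fixed gate threshold are all assumptions made for illustration, and the learnable linear transformation mentioned in the abstract is omitted.

```python
import numpy as np


def belief_shift_direction(pos_reps: np.ndarray, neg_reps: np.ndarray) -> np.ndarray:
    """Estimate a unit "belief shift" direction from paired representations.

    Hypothetical helper: takes the top right-singular vector of the
    (uncentered) matrix of representation differences, i.e. the principal
    direction of the differences. Inputs have shape (n_stimuli, hidden_dim).
    """
    diffs = pos_reps - neg_reps
    # Rows of vt are right-singular vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    direction = vt[0]
    # The sign of a singular vector is arbitrary; orient it along the mean shift.
    if direction @ diffs.mean(axis=0) < 0:
        direction = -direction
    return direction / np.linalg.norm(direction)


def gated_edit(hidden: np.ndarray, direction: np.ndarray, alpha: float,
               gate_prob: float, threshold: float = 0.5) -> np.ndarray:
    """Shift a hidden state along `direction` only when the gate permits it.

    `gate_prob` stands in for the output of a pre-trained classifier that
    decides whether the edit is contextually relevant (assumed to be a
    probability); `alpha` is the magnitude scalar.
    """
    if gate_prob < threshold:
        return hidden  # edit judged irrelevant in this context: leave state untouched
    return hidden + alpha * direction


# Toy usage with synthetic data: the true shift lies along the first axis.
rng = np.random.default_rng(0)
neg = rng.normal(size=(50, 8))
true_shift = np.zeros(8)
true_shift[0] = 1.0
pos = neg + 3.0 * true_shift + 0.01 * rng.normal(size=(50, 8))

direction = belief_shift_direction(pos, neg)
edited = gated_edit(np.zeros(8), direction, alpha=2.0, gate_prob=0.9)
skipped = gated_edit(np.zeros(8), direction, alpha=2.0, gate_prob=0.1)
```

In this toy setup the gate is what limits overfitting: a low classifier score leaves the hidden state unchanged, so the edit is not applied in contexts where it does not belong.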