Haiwen Hong
2026
Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics
Ziwen Xu | Chenyan WU | Hengyu Sun | Haiwen Hong | Mengru Wang | Yunzhi Yao | Longtao Huang | Hui Xue | Shumin Deng | Zhixuan Chu | Huajun Chen | Ningyu Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Methods for controlling large language models (LLMs), including local weight fine-tuning, LoRA-based adaptation, and activation-based interventions, are often studied in isolation, obscuring their connections and making comparison difficult. In this work, we present a unified view that frames these interventions as dynamic weight updates induced by a control signal, placing them within a single conceptual framework. Building on this view, we propose a unified preference-utility analysis that separates control effects into preference, defined as the tendency toward a target concept, and utility, defined as coherent and task-valid generation, and measures both on a shared log-odds scale using polarity-paired contrastive examples. Across methods, we observe a consistent trade-off between preference and utility: stronger control increases preference while predictably reducing utility. We further explain this behavior through an activation manifold perspective, in which control shifts representations along target-concept directions to enhance preference, while utility declines primarily when interventions push representations off the model’s valid-generation manifold. Finally, we introduce a new steering approach guided by this analysis that improves preference while better preserving utility.
How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities
Ziwen Xu | Kewei Xu | Haoming Xu | Haiwen Hong | Longtao Huang | Hui Xue | Ningyu Zhang | Yongliang Shen | Guozhou Zheng | Huajun Chen | Shumin Deng
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) are increasingly deployed in socially sensitive domains, yet their unpredictable behaviors, ranging from misaligned intent to inconsistent personality, pose significant risks. We introduce SteerEval, a hierarchical benchmark for evaluating LLM controllability across three domains: language features, sentiment, and personality. Each domain is structured into three specification levels: L1 (what to express), L2 (how to express), and L3 (how to instantiate), connecting high-level behavioral intent to concrete textual output. Using SteerEval, we systematically evaluate contemporary steering methods, revealing that control often degrades at finer-grained levels. Our benchmark offers a principled and interpretable framework for safe and controllable LLM behavior, serving as a foundation for future research.
Seeing but Not Thinking: Routing Distraction in Multimodal Mixture-of-Experts
Haolei Xu | Haiwen Hong | Hongxing Li | Rui Zhou | Yang Zhang | Longtao Huang | Hui Xue | Yongliang Shen | Weiming Lu | Yueting Zhuang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multimodal Mixture-of-Experts (MoE) models have achieved remarkable performance on vision-language tasks. However, we identify a puzzling phenomenon termed Seeing but Not Thinking: models accurately perceive image content yet fail in subsequent reasoning, while correctly solving identical problems presented as pure text. Through systematic analysis, we first verify that cross-modal semantic sharing exists in MoE architectures, ruling out semantic alignment failure as the sole explanation. We then reveal that visual experts and domain experts exhibit layer-wise separation, with image inputs inducing significant routing divergence from text inputs in middle layers where domain experts concentrate. Based on these findings, we propose the Routing Distraction hypothesis: when processing visual inputs, the routing mechanism fails to adequately activate task-relevant reasoning experts. To validate this hypothesis, we design a routing-guided intervention method that enhances domain expert activation. Experiments on three multimodal MoE models across six benchmarks demonstrate consistent improvements, with gains of up to 3.17% on complex visual reasoning tasks. Our analysis further reveals that domain expert identification locates cognitive functions rather than sample-specific solutions, enabling effective transfer across tasks with different information structures.
2025
The Right Time Matters: Data Arrangement Affects Zero-Shot Generalization in Instruction Tuning
Bingxiang He | Ning Ding | Cheng Qian | Jia Deng | Ganqu Cui | Lifan Yuan | Haiwen Hong | Huan-ang Gao | Longtao Huang | Hui Xue | Huimin Chen | Zhiyuan Liu | Maosong Sun
Findings of the Association for Computational Linguistics: ACL 2025
Understanding alignment techniques begins with comprehending zero-shot generalization brought by instruction tuning, but little of the mechanism has been understood. Existing work has largely been confined to the task level, without considering that tasks are artificially defined and, to LLMs, merely consist of tokens and representations. To bridge this gap, we investigate zero-shot generalization from the perspective of the data itself. We first demonstrate that zero-shot generalization happens very early during instruction tuning, with loss serving as a stable indicator. Next, we investigate training data arrangement through similarity and granularity perspectives, confirming that the timing of exposure to certain training examples may greatly facilitate generalization on unseen tasks. Finally, we propose a more grounded training data arrangement framework, Test-centric Multi-turn Arrangement, and show its effectiveness in promoting continual learning and further loss reduction. For the first time, we show that zero-shot generalization during instruction tuning is a form of similarity-based generalization between training and test data at the instance level.
2021
Disentangled Code Representation Learning for Multiple Programming Languages
Jingfeng Zhang | Haiwen Hong | Yin Zhang | Yao Wan | Ye Liu | Yulei Sui
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
Fix-Filter-Fix: Intuitively Connect Any Models for Effective Bug Fixing
Haiwen Hong | Jingfeng Zhang | Yin Zhang | Yao Wan | Yulei Sui
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Locating and fixing bugs is a time-consuming task. Most neural machine translation (NMT) based approaches for automatic bug fixing lack generality and do not make full use of the rich information in the source code. In NMT-based bug fixing, we find that some predicted code is identical to the input buggy code (called an unchanged fix) because of the high similarity between buggy and fixed code (e.g., the difference may appear in only one particular line). An unchanged fix cannot be the correct fix, because it is the same as the buggy code that needs to be fixed. Based on these observations, we propose an intuitive yet effective general framework (called Fix-Filter-Fix or F^3) for bug fixing. F^3 connects models through a filter mechanism that passes the inputs on which the previous model produced an unchanged fix to the next model. We propose an F^3 theory that can quantitatively and accurately calculate the F^3 lifting effect. To evaluate, we implement the Seq2Seq Transformer (ST) and the AST2Seq Transformer (AT) to form some basic F^3 instances, called F^3_ST+AT and F^3_AT+ST. Comparing them with single-model approaches and many model connection baselines across four datasets validates the effectiveness and generality of F^3 and corroborates our findings and methodology.
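The filter mechanism described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `fix_filter_fix`, the toy models, and the whitespace-insensitive test for an "unchanged fix" are all assumptions made for illustration. Real F^3 instances would plug in trained models such as ST and AT.

```python
from typing import Callable, Sequence

# A "model" here is any function mapping buggy code to a predicted fix.
# (Hypothetical type alias; real models would be trained networks.)
FixModel = Callable[[str], str]


def fix_filter_fix(buggy_code: str, models: Sequence[FixModel]) -> str:
    """Try each model in order and return the first changed prediction.

    Assumption: a prediction counts as an "unchanged fix" when it equals
    the input after stripping surrounding whitespace.
    """
    for model in models:
        prediction = model(buggy_code)
        if prediction.strip() != buggy_code.strip():
            return prediction  # changed prediction: accept it
        # Unchanged fix: filter it out and hand the input to the next model.
    return buggy_code  # every model produced an unchanged fix


# Toy stand-ins for trained models (hypothetical).
def copy_model(code: str) -> str:
    return code  # always yields an unchanged fix


def operator_model(code: str) -> str:
    return code.replace("=+", "+=")  # repairs one specific typo


result = fix_filter_fix("x =+ 1", [copy_model, operator_model])
```

Here the first model returns its input unchanged, so the input is forwarded to the second model, whose changed prediction is accepted. Whether the accepted prediction is actually correct is a separate question that this sketch does not address.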
Co-authors
- Longtao Huang 4
- Hui Xue 4
- Huajun Chen 2
- Shumin Deng 2
- Yongliang Shen 2
- Yulei Sui 2
- Yao Wan 2
- Ziwen Xu 2
- Jingfeng Zhang 2
- Ningyu Zhang 2
- Yin Zhang 2
- Huimin Chen 1
- Zhixuan Chu 1
- Ganqu Cui 1
- Jia Deng 1
- Ning Ding 1
- Huan-ang Gao 1
- Bingxiang He 1
- Hongxing Li 1
- Ye Liu 1
- Zhiyuan Liu 1
- Weiming Lu 1
- Cheng Qian 1
- Hengyu Sun 1
- Maosong Sun (孙茂松) 1
- Chenyan WU 1
- Mengru Wang 1
- Haolei Xu 1
- Haoming Xu 1
- Kewei Xu 1
- Yunzhi Yao 1
- Lifan Yuan 1
- Yang Zhang 1
- Guozhou Zheng 1
- Rui Zhou 1
- Yueting Zhuang 1