Haonan Zhang
2026
Act-Adaptive Margin: Dynamically Calibrating Reward Models for Subjective Ambiguity
Feiteng Fang | Dingwei Chen | Xiang Huang | Ting-En Lin | Yuchuan Wu | Xiong Liu | Jing Ye | Ziqiang Liu | Haonan Zhang | Liang Zhu | Hamid Alinejad-Rokny | Min Yang | Yongbin Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Currently, most reinforcement learning tasks focus on domains like mathematics and programming, where verification is relatively straightforward. However, in subjective tasks such as role-playing, alignment techniques struggle to make progress, primarily because subjective reward modeling using the Bradley-Terry model faces significant challenges when dealing with ambiguous preferences. To improve reward modeling in subjective tasks, this paper proposes AAM (Act-Adaptive Margin), which enhances reward modeling by dynamically calibrating preference margins using the model’s internal parameter knowledge. We design two versions of AAM that efficiently generate contextually appropriate preference gaps without additional human annotation. This approach fundamentally improves how reward models handle subjective rewards by better integrating generative understanding with preference scoring. To validate AAM’s effectiveness in subjective reward modeling, we conduct evaluations on RewardBench, JudgeBench, and challenging role-playing tasks. Results show that AAM significantly improves subjective reward modeling performance, enhancing Bradley-Terry reward models by 2.95% in general tasks and 4.85% in subjective role-playing tasks. Furthermore, reward models trained with AAM can help downstream alignment tasks achieve better results. Our test results show that applying rewards generated by an AAM-augmented RM to preference learning techniques (e.g., GRPO) achieves state-of-the-art results on CharacterEval and Charm. The code and dataset will be released upon acceptance.
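The abstract above does not give AAM's loss, so the following is only a minimal sketch of the general idea it builds on: a Bradley-Terry pairwise loss with an additive preference margin, -log σ(r_chosen - r_rejected - m). The function name `bt_margin_loss` and the margin values are hypothetical; how AAM actually derives its margins from the model is not shown here.

```python
import math


def bt_margin_loss(r_chosen: float, r_rejected: float, margin: float = 0.0) -> float:
    """Bradley-Terry negative log-likelihood with an additive margin.

    Computes -log(sigmoid(r_chosen - r_rejected - margin)).
    The margin is an input here; this sketch does not model how a
    method such as AAM would choose it (that part is an assumption).
    """
    z = r_chosen - r_rejected - margin
    # -log(sigmoid(z)) equals log(1 + exp(-z)); branch on the sign of z
    # so that exp() is never called with a large positive argument.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Hypothetical usage: the same reward gap is penalized more when the
# pair is treated as clear-cut (large margin) than when it is treated
# as ambiguous (small margin).
clear_pair = bt_margin_loss(r_chosen=1.0, r_rejected=0.0, margin=1.0)
ambiguous_pair = bt_margin_loss(r_chosen=1.0, r_rejected=0.0, margin=0.1)
print(clear_pair > ambiguous_pair)
```

With a fixed margin of zero this reduces to the ordinary Bradley-Terry loss; a per-pair margin lets ambiguous preferences demand a smaller reward gap than unambiguous ones.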
2025
OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction
Haonan Zhang | Run Luo | Xiong Liu | Yuchuan Wu | Ting-En Lin | Pengpeng Zeng | Qiang Qu | Feiteng Fang | Min Yang | Lianli Gao | Jingkuan Song | Fei Huang | Yongbin Li
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Role-Playing Agents (RPAs), benefiting from large language models, are an emerging class of interactive AI systems that simulate roles or characters with diverse personalities. However, existing methods primarily focus on mimicking dialogues among roles in textual form, neglecting the role’s voice traits (e.g., voice style and emotions), which play a crucial part in interaction and make for more immersive experiences in realistic scenarios. To address this, we propose OmniCharacter, a first seamless speech-language personality interaction model to achieve immersive RPAs with low latency. Specifically, OmniCharacter enables agents to consistently exhibit role-specific personality traits and vocal traits throughout the interaction, producing a mixture of speech and language responses. To align the model with speech-language scenarios, we construct a dataset named OmniCharacter-10K, which involves more distinctive characters (20), richly contextualized multi-round dialogue (10K), and dynamic speech response (135K). Experimental results show that our method yields better responses in terms of both content and style compared to existing RPAs and mainstream speech-language models, with a response latency as low as 289ms.
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct
Run Luo | Haonan Zhang | Longze Chen | Ting-En Lin | Xiong Liu | Yuchuan Wu | Min Yang | Yongbin Li | Minzheng Wang | Pengpeng Zeng | Lianli Gao | Heng Tao Shen | Yunshui Li | Hamid Alinejad-Rokny | Xiaobo Xia | Jingkuan Song | Fei Huang
Findings of the Association for Computational Linguistics: ACL 2025
The development of Multimodal Large Language Models (MLLMs) has seen significant progress, driven by increasing demands across various fields (e.g., multimodal agents, embodied intelligence). While model-driven approaches aim to enhance MLLM capabilities through diverse architectures, their performance gains have become increasingly marginal. In contrast, data-driven methods, which scale up image-text instruction datasets, have proven more effective but face challenges related to limited data diversity and complexity. The absence of high-quality instruction data remains a major bottleneck in MLLM development. To address this issue, we propose MMEvol, a novel multimodal instruction data evolution framework. This framework iteratively enhances data quality through a refined combination of fine-grained perception, cognitive reasoning, and interaction evolution, generating a more complex and diverse image-text instruction dataset that significantly improves MLLM capabilities. Starting with an initial dataset, SEED-163K, we employ MMEvol to systematically expand instruction diversity, extend visual reasoning steps to improve cognitive abilities, and extract fine-grained visual details to enhance understanding and robustness. To rigorously evaluate our approach, we conduct extensive qualitative analysis and quantitative experiments across 13 vision-language tasks. Compared to baseline models trained on the original seed dataset, our method achieves an average accuracy improvement of 3.1 percentage points. Moreover, our approach attains state-of-the-art (SOTA) performance in nine tasks while using significantly less data than existing state-of-the-art models.