Zhengyu Hu
2026
CoAct: Co-Active LLM Preference Learning with Human-AI Synergy
Ruiyao Xu | Mihir Parmar | Tiankai Yang | Zhengyu Hu | Yue Zhao | Kaize Ding
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Learning from preference-based feedback has become an effective approach for aligning LLMs across diverse tasks. However, high-quality human-annotated preference data remains expensive and scarce. Existing methods address this challenge through either self-rewarding, which scales by using purely AI-generated labels but risks unreliability, or active learning, which ensures quality through oracle annotation but cannot fully leverage unlabeled data. In this paper, we present CoAct, a novel framework that synergistically combines self-rewarding and active learning through strategic human-AI collaboration. CoAct leverages self-consistency to identify both reliable self-labeled data and samples requiring oracle verification. Additionally, oracle feedback guides the model to generate new instructions within its solvable capability. Evaluated on three reasoning benchmarks across two model families, CoAct achieves average improvements of +13.25% on GSM8K, +8.19% on MATH, and +13.16% on WebInstruct, consistently outperforming all baselines.
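The self-consistency step described in the abstract can be illustrated with a minimal sketch. It assumes that agreement among several sampled answers to the same prompt is used as the reliability signal; the function name `route_by_self_consistency`, the `threshold` value, and the route labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def route_by_self_consistency(answers, threshold=0.7):
    """Route one prompt based on how often its sampled answers agree.

    answers: final answers sampled from the model for a single prompt.
    threshold: assumed agreement cutoff (hypothetical, not from the paper).
    Returns (majority_answer, agreement, route).
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    # The most frequent answer and the share of samples that produced it.
    majority, count = Counter(answers).most_common(1)[0]
    agreement = count / len(answers)
    # High agreement: keep the model's own label.
    # Low agreement: send the sample to the oracle for verification.
    route = "self_label" if agreement >= threshold else "oracle"
    return majority, agreement, route


print(route_by_self_consistency(["42", "42", "42", "41"]))
print(route_by_self_consistency(["1", "2", "3", "1"]))
```

In this sketch a prompt whose samples mostly agree is self-labeled, while a prompt with scattered answers is flagged for oracle annotation; the criterion CoAct actually uses may differ.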
2025
Explaining Length Bias in LLM-Based Preference Evaluations
Zhengyu Hu | Linxin Song | Jieyu Zhang | Zheyuan Xiao | Tianfu Wang | Zhengyu Chen | Nicholas Jing Yuan | Jianxun Lian | Kaize Ding | Hui Xiong
Findings of the Association for Computational Linguistics: EMNLP 2025
The use of large language models (LLMs) as judges, particularly in preference comparisons, has become widespread, but these judges show a notable bias towards longer responses, which undermines the reliability of such evaluations. To better understand this bias, we propose to decompose the preference evaluation metric, specifically the win rate, into two key components: desirability and information mass. The former is length-independent and related to trustworthiness, such as correctness, toxicity, and consistency; the latter is length-dependent and represents the amount of information in the response. We empirically demonstrate the decomposition through controlled experiments and find that response length impacts evaluations by influencing information mass. To derive a reliable evaluation metric that assesses content quality without being confounded by response length, we propose AdapAlpaca, a simple yet effective adjustment to win rate measurement. Specifically, AdapAlpaca ensures a fair comparison of response quality by aligning the lengths of reference and test model responses under equivalent length intervals.
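The idea of comparing responses under equivalent length intervals can be sketched as follows. This is a simplified illustration, not the AdapAlpaca procedure itself: it assumes fixed-width length buckets and simply drops pairs whose lengths fall in different buckets. The function name `length_matched_win_rate`, the `interval` width, and the tuple layout are hypothetical.

```python
def length_matched_win_rate(pairs, interval=200):
    """Win rate over pairs whose two responses share a length interval.

    pairs: iterable of (test_len, ref_len, test_wins) tuples, where the
           lengths are in words or tokens and test_wins is a bool.
    interval: assumed bucket width (hypothetical, not from the paper).
    Returns the win rate as a float, or None if no pair is comparable.
    """
    # Keep a judgment only when both responses land in the same bucket,
    # so that length differences cannot drive the outcome.
    kept = [wins for test_len, ref_len, wins in pairs
            if test_len // interval == ref_len // interval]
    if not kept:
        return None
    return sum(kept) / len(kept)


judgments = [
    (150, 180, True),   # same bucket, test response wins
    (450, 120, True),   # different buckets, ignored
    (210, 390, False),  # same bucket, test response loses
    (90, 700, True),    # different buckets, ignored
]
print(length_matched_win_rate(judgments))
```

Filtering is the crudest way to hold length roughly constant; the abstract describes aligning reference and test responses by length, which may instead involve selecting a reference of matching length for each test response.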
2024
Let’s Ask GNN: Empowering Large Language Model for Graph In-Context Learning
Zhengyu Hu | Yichuan Li | Zhengyu Chen | Jingang Wang | Han Liu | Kyumin Lee | Kaize Ding
Findings of the Association for Computational Linguistics: EMNLP 2024
Textual Attributed Graphs (TAGs) are crucial for modeling complex real-world systems, yet leveraging large language models (LLMs) for TAGs presents unique challenges due to the gap between sequential text processing and graph-structured data. We introduce AskGNN, a novel approach that bridges this gap by leveraging In-Context Learning (ICL) to integrate graph data and task-specific information into LLMs. AskGNN employs a Graph Neural Network (GNN)-powered structure-enhanced retriever to select labeled nodes across graphs, incorporating complex graph structures and their supervision signals. Our learning-to-retrieve algorithm optimizes the retriever to select example nodes that maximize LLM performance on graph tasks. Experiments across three tasks and seven LLMs demonstrate AskGNN’s superior effectiveness in graph task performance, opening new avenues for applying LLMs to graph-structured data without extensive fine-tuning.