Zeyu Cui


2025

pdf bib
Qwen2.5-xCoder: Multi-Agent Collaboration for Multilingual Code Instruction Tuning
Jian Yang | Wei Zhang | Yibo Miao | Shanghaoran Quan | Zhenhe Wu | Qiyao Peng | Liqun Yang | Tianyu Liu | Zeyu Cui | Binyuan Hui | Junyang Lin
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent advancement in code understanding and generation demonstrates that code LLMs fine-tuned on a high-quality instruction dataset can gain powerful capabilities to address wide-ranging code-related tasks. However, most previous existing methods mainly view each programming language in isolation and ignore the knowledge transfer among different programming languages. To bridge the gap among different programming languages, we introduce a novel multi-agent collaboration framework to enhance multilingual instruction tuning for code LLMs, where multiple language-specific intelligent agent components with generation memory work together to transfer knowledge from one language to another efficiently and effectively. Specifically, we first generate the language-specific instruction data from the code snippets and then provide the generated data as the seed data for language-specific agents. Multiple language-specific agents discuss and collaborate to formulate a new instruction and its corresponding solution (A new programming language or existing programming language), To further encourage the cross-lingual transfer, each agent stores its generation history as memory and then summarizes its merits and faults. Finally, the high-quality multilingual instruction data is used to encourage knowledge transfer among different programming languages to train Qwen2.5-xCoder. Experimental results on multilingual programming benchmarks demonstrate the superior performance of Qwen2.5-xCoder in sharing common knowledge, highlighting its potential to reduce the cross-lingual gap.

pdf bib
CodeArena: Evaluating and Aligning CodeLLMs on Human Preference
Jian Yang | Jiaxi Yang | Wei Zhang | Jin Ke | Yibo Miao | Lei Zhang | Liqun Yang | Zeyu Cui | Yichang Zhang | Zhoujun Li | Binyuan Hui | Junyang Lin
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

We present CodeArena to emulate the complexity/diversity of real-world coding tasks, spanning 40 categories and 44 PLs. A 20B diverse synthetic instruction corpus is created by scaling instructions to help Qwen2.5-SynCoder achieve SOTA performance. Abstract: Code large language models (codeLLMs) have made significant strides in code generation. Most previous code-related benchmarks, which consist of various programming exercises along with the corresponding test cases, are used as a common measure to evaluate the performance and capabilities of code LLMs. However, the current code LLMs focus on synthesizing the correct code snippet, ignoring the alignment with human preferences, where the query should be sampled from the practical application scenarios and the model-generated responses should satisfy the human preference. To bridge the gap between the model-generated response and human preference, we present a rigorous human-curated benchmark CodeArena to emulate the complexity and diversity of real-world coding tasks, where 397 high-quality samples spanning 40 categories and 44 programming languages, carefully curated from user queries. Further, we propose a diverse synthetic instruction corpus SynCode-Instruct (nearly 20B tokens) by scaling instructions from the website to verify the effectiveness of the large-scale synthetic instruction fine-tuning, where Qwen2.5-SynCoder totally trained on synthetic instruction data can achieve top-tier performance of open-source code LLMs. The results find performance differences between execution-based benchmarks and CodeArena. Our systematic experiments of CodeArena on 40+ LLMs reveal a notable performance gap between open SOTA code LLMs (e.g. Qwen2.5-Coder) and proprietary LLMs (e.g., OpenAI o1), underscoring the importance of the human preference alignment.

2020

pdf bib
Every Document Owns Its Structure: Inductive Text Classification via Graph Neural Networks
Yufeng Zhang | Xueli Yu | Zeyu Cui | Shu Wu | Zhongzhen Wen | Liang Wang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Text classification is fundamental in natural language processing (NLP) and Graph Neural Networks (GNN) are recently applied in this task. However, the existing graph-based works can neither capture the contextual word relationships within each document nor fulfil the inductive learning of new words. Therefore in this work, to overcome such problems, we propose TextING for inductive text classification via GNN. We first build individual graphs for each document and then use GNN to learn the fine-grained word representations based on their local structure, which can also effectively produce embeddings for unseen words in the new document. Finally, the word nodes are aggregated as the document embedding. Extensive experiments on four benchmark datasets show that our method outperforms state-of-the-art text classification methods.