Haihong Wu
2025
COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning
Yuelin Bai, Xeron Du, Yiming Liang, Leo Jin, Junting Zhou, Ziqiang Liu, Feiteng Fang, Mingshan Chang, Tianyu Zheng, Xincheng Zhang, Nuo Ma, Zekun Moore Wang, Ruibin Yuan, Haihong Wu, Hongquan Lin, Wenhao Huang, Jiajun Zhang, Chenghua Lin, Jie Fu, Min Yang, Shiwen Ni, Ge Zhang
Findings of the Association for Computational Linguistics: NAACL 2025
Remarkable progress on large language models (LLMs), particularly in English, has facilitated impressive capabilities in following human instructions. However, there remains a noticeable gap in instruction fine-tuning for Chinese, where complex linguistic features pose significant challenges. Existing datasets, generally distilled from English-centric LLMs, are not well-aligned with Chinese users’ interaction patterns. To bridge this gap, we introduce COIG-CQIA, a new Chinese instruction tuning dataset derived from various real-world data resources and subjected to comprehensive human verification. We conduct extensive experiments on COIG-CQIA and compare models trained on it with strong baseline models and datasets. The experimental results show that models trained on COIG-CQIA achieve highly competitive performance across diverse benchmarks. Additionally, our findings offer several insights for designing effective Chinese instruction-tuning datasets and data mixing strategies. Our dataset is available at https://huggingface.co/datasets/m-a-p/COIG-CQIA.
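Since the abstract points to the dataset on Hugging Face, here is a minimal sketch of loading and inspecting it with the `datasets` library. The subset name `"ruozhiba"` and the `instruction`/`output` field names are assumptions based on common instruction-tuning schemas; confirm both against the dataset card before use.

```python
# Minimal sketch: inspect a COIG-CQIA subset with the Hugging Face
# `datasets` library. The subset name "ruozhiba" and the field names
# below are assumptions -- check the dataset card for the actual
# configurations and schema.
from datasets import load_dataset

ds = load_dataset("m-a-p/COIG-CQIA", "ruozhiba", split="train")

# Each record is expected to pair an instruction with a
# human-verified Chinese response.
for example in ds.select(range(3)):
    print(example["instruction"])
    print(example["output"])
    print("-" * 40)
```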
2024
E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models
Jinchang Hou, Chang Ao, Haihong Wu, Xiangtao Kong, Zhigang Zheng, Daijia Tang, Chengming Li, Xiping Hu, Ruifeng Xu, Shiwen Ni, Min Yang
Findings of the Association for Computational Linguistics: ACL 2024
The rapid development of Large Language Models (LLMs) has led to their increasing utilization in Chinese K-12 education. Despite the growing integration of LLMs and education, the absence of a dedicated benchmark for evaluating LLMs within this domain presents a pressing concern. Consequently, there is an urgent need for a comprehensive natural language processing benchmark to precisely assess the capabilities of various LLMs in Chinese K-12 education. In response, we introduce E-EVAL, the first comprehensive evaluation benchmark specifically tailored for Chinese K-12 education. E-EVAL comprises 4,351 multiple-choice questions spanning primary, middle, and high school levels, covering a diverse array of subjects. Through meticulous evaluation, we find that Chinese-dominant models often outperform English-dominant ones, with many exceeding GPT-4.0. However, most struggle with complex subjects like mathematics. Additionally, our analysis indicates that most Chinese-dominant LLMs do not achieve higher scores at the primary school level than at the middle school level, highlighting the nuanced relationship between proficiency in higher-order and lower-order knowledge domains. Furthermore, experimental results highlight the effectiveness of the Chain of Thought (CoT) technique in scientific subjects and few-shot prompting in liberal arts. Through E-EVAL, we aim to conduct a rigorous analysis delineating the strengths and limitations of LLMs in educational applications, thereby contributing significantly to the advancement of Chinese K-12 education and LLMs.
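The abstract contrasts few-shot prompting and Chain of Thought on multiple-choice questions. The sketch below illustrates how such prompts can be assembled and scored; the prompt wording, the `build_prompt`/`extract_choice` helpers, and the letter-extraction rule are illustrative assumptions, not the E-EVAL evaluation harness itself.

```python
# Illustrative sketch of the two prompting styles compared in the
# abstract: few-shot (prepend worked examples) and Chain of Thought
# (ask for step-by-step reasoning before the answer). All names and
# prompt wording here are assumptions, not the E-EVAL harness.
import re

def build_prompt(question: str, choices: dict[str, str],
                 shots: list[str] | None = None,
                 use_cot: bool = False) -> str:
    lines = list(shots or [])  # few-shot: worked examples come first
    lines.append(question)
    lines += [f"{label}. {text}" for label, text in choices.items()]
    if use_cot:
        lines.append("Let's think step by step, then answer with a single letter.")
    else:
        lines.append("Answer with a single letter.")
    return "\n".join(lines)

def extract_choice(completion: str) -> str | None:
    # Take the last standalone A-D letter in the model's output.
    matches = re.findall(r"\b([ABCD])\b", completion)
    return matches[-1] if matches else None

def accuracy(preds: list[str | None], golds: list[str]) -> float:
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)
```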
Co-authors (number of shared papers)
- Shiwen Ni 2
- Min Yang 2
- Chang Ao 1
- Yuelin Bai 1
- Mingshan Chang 1
- Xeron Du 1
- Feiteng Fang 1
- Jie Fu 1
- Jinchang Hou 1
- Xiping Hu 1
- Wenhao Huang 1
- Leo Jin 1
- Xiangtao Kong 1
- Chengming Li 1
- Yiming Liang 1
- Hongquan Lin 1
- Chenghua Lin 1
- Ziqiang Liu 1
- Nuo Ma 1
- Daijia Tang 1
- Zekun Moore Wang 1
- Ruifeng Xu (徐睿峰) 1
- Ruibin Yuan 1
- Xincheng Zhang 1
- Jiajun Zhang 1
- Ge Zhang 1
- Zhigang Zheng 1
- Tianyu Zheng 1
- Junting Zhou 1