Minghao Liu
2026
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values
Siwei Wu | JinCheng Ren | Xeron Du | Shuyue Guo | Xingwei Qu | Yiming Liang | Jie Liu | Yunwen Li | Tyler Loakman | Tianyu Zheng | Boyu Feng | Huaqing Yuan | Zili Wang | Jiaheng Liu | Wenhao Huang | Chenglin Cai | Haoran Que | Jian Yang | Yuelin Bai | Zekun Moore Wang | Zhouliang Yu | Qunshu Lin | Ding Pan | Yuchen Eleanor Jiang | Tiannan Wang | Wangchunshu Zhou | Shenzhi Wang | Xingyuan Bu | Minghao Liu | Guoyin Wang | Ge Zhang | Chenghua Lin
Findings of the Association for Computational Linguistics: EACL 2026
Existing Chinese preference datasets suffer from limited scale, restricted domain coverage, and insufficiently rigorous data validation. Human annotation significantly limits the scalability of human preference datasets. As a result, Chinese Alignment and Chinese Reward Models (CRM) have not yet been thoroughly explored. To address these challenges, we design an LLM-based data annotation pipeline with no human intervention. Based on this pipeline, we curate COIG-P (Chinese Open Instruction Generalist - Preference), a high-quality, large-scale Chinese preference dataset consisting of 1M Chinese preference pairs and 92k carefully curated Chinese queries across diverse domains, including Chat, Coding, Maths, and others. We conduct experiments to verify the quality of COIG-P from two perspectives. (1) COIG-P brings significant performance improvements for the Qwen2/2.5 and Infinity-Instruct model series on AlignBench through DPO, with gains ranging from 2% to 12%. Furthermore, it significantly outperforms other existing Chinese preference datasets. (2) We train an 8B-sized CRM and manually annotate a Chinese Reward Benchmark (CRBench). Our CRM demonstrates robust scoring ability on CRBench. In addition, in practical data construction experiments, the quality of the data constructed by our CRM is comparable to that produced by GPT-4o.
2025
ConstraintLLM: A Neuro-Symbolic Framework for Industrial-Level Constraint Programming
Weichun Shi | Minghao Liu | Wanting Zhang | Langchen Shi | Fuqi Jia | Feifei Ma | Jian Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Constraint programming (CP) is a crucial technology for solving real-world constraint optimization problems (COPs), with the advantages of rich modeling semantics and high solving efficiency. Using large language models (LLMs) to generate formal modeling automatically for COPs is becoming a promising approach, which aims to build trustworthy neuro-symbolic AI with the help of symbolic solvers. However, CP has received less attention compared to works based on operations research (OR) models. We introduce ConstraintLLM, the first LLM specifically designed for CP modeling, which is trained on an open-source LLM with multi-instruction supervised fine-tuning. We propose the Constraint-Aware Retrieval Module (CARM) to increase the in-context learning capabilities, which is integrated in a Tree-of-Thoughts (ToT) framework with guided self-correction mechanism. Moreover, we construct and release IndusCP, the first industrial-level benchmark for CP modeling, which contains 140 challenging tasks from various domains. Our experiments demonstrate that ConstraintLLM achieves state-of-the-art solving accuracy across multiple benchmarks and outperforms the baselines by 2x on the new IndusCP benchmark. Code and data are available at: https://github.com/william4s/ConstraintLLM.
MedEBench: Diagnosing Reliability in Text-Guided Medical Image Editing
Minghao Liu | Zhitao He | Zhiyuan Fan | Qingyun Wang | Yi R. Fung
Findings of the Association for Computational Linguistics: EMNLP 2025
Text-guided image editing has seen significant progress in natural image domains, but its application in medical imaging remains limited and lacks standardized evaluation frameworks. Such editing could revolutionize clinical practices by enabling personalized surgical planning, enhancing medical education, and improving patient communication. To bridge this gap, we introduce MedEBench, a robust benchmark designed to diagnose reliability in text-guided medical image editing. MedEBench consists of 1,182 clinically curated image-prompt pairs covering 70 distinct editing tasks and 13 anatomical regions. It contributes in three key areas: (1) a clinically grounded evaluation framework that measures Editing Accuracy, Context Preservation, and Visual Quality, complemented by detailed descriptions of intended edits and corresponding Region-of-Interest (ROI) masks; (2) a comprehensive comparison of seven state-of-the-art models, revealing consistent patterns of failure; and (3) a diagnostic error analysis technique that leverages attention alignment, using Intersection-over-Union (IoU) between model attention maps and ROI masks to identify mislocalization issues, where models erroneously focus on incorrect anatomical regions. MedEBench sets the stage for developing more reliable and clinically effective text-guided medical image editing tools.
2024
IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce
Wenxuan Ding | Weiqi Wang | Sze Heng Douglas Kwok | Minghao Liu | Tianqing Fang | Jiaxin Bai | Xin Liu | Changlong Yu | Zheng Li | Chen Luo | Qingyu Yin | Bing Yin | Junxian He | Yangqiu Song
Findings of the Association for Computational Linguistics: EMNLP 2024
Enhancing Language Models’ (LMs) ability to understand purchase intentions in E-commerce scenarios is crucial for their effective assistance in various downstream tasks. However, previous approaches that distill intentions from LMs often fail to generate meaningful and human-centric intentions applicable in real-world E-commerce contexts. This raises concerns about the true comprehension and utilization of purchase intentions by LMs. In this paper, we present IntentionQA, a double-task multiple-choice question answering benchmark to evaluate LMs’ comprehension of purchase intentions in E-commerce. Specifically, LMs are tasked to infer intentions based on purchased products and utilize them to predict additional purchases. IntentionQA consists of 4,360 carefully curated problems across three difficulty levels, constructed using an automated pipeline to ensure scalability on large E-commerce platforms. Human evaluations demonstrate the high quality and low false-negative rate of our benchmark. Extensive experiments across 19 language models show that they still struggle with certain scenarios, such as understanding products and intentions accurately, jointly reasoning with products and intentions, and more, in which they fall far behind human performances.