Zhang Li
2026
The GaoYao Benchmark: A Comprehensive Framework for Evaluating Multilingual and Multicultural Abilities of Large Language Models
Yilun Liu | Chunguang Zhao | Mengyao Piao | Lingqi Miao | Shimin Tao | Minggui HE | Chenxin Liu | Zhang Li | Mahongxia | Jiaxin Guo | Chen Liu | Liqun Deng | Jiansheng Wei | Xiaojun Meng | Fanyi Du | Daimeng Wei | Yanghua Xiao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Evaluating the multilingual and multicultural capabilities of Large Language Models (LLMs) is essential for their global utility. However, current benchmarks face three critical limitations: (1) fragmented evaluation dimensions that often neglect deep cultural nuances; (2) insufficient language coverage in subjective tasks relying on low-quality machine translation; and (3) shallow analysis that lacks diagnostic depth beyond simple rankings. To address these, we introduce GaoYao, a comprehensive benchmark with 182.3k samples, 26 languages and 51 nations/areas. First, GaoYao proposes a unified framework categorizing evaluation tasks into three cultural layers (General Multilingual, Cross-cultural, Monocultural) and nine cognitive sub-layers. Second, we achieve native-quality expansion by leveraging experts to rigorously localize subjective benchmarks into 19 languages and synthesizing cross-cultural test sets for 34 cultures, surpassing prior coverage by up to 111%. Third, we conduct an in-depth diagnostic analysis on 20+ flagship and compact LLMs. Our findings reveal significant geographical performance disparities and distinct gaps between tasks, offering a reliable map for future work. We release the benchmark.
2025
Taming Text-to-Image Synthesis for Novices: User-centric Prompt Generation via Multi-turn Guidance
Yilun Liu | Minggui He | Feiyu Yao | Yuhe Ji | Shimin Tao | Jingzhou Du | Justin Li | Jian Gao | Zhang Li | Hao Yang | Boxing Chen | Osamu Yoshie
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
The emergence of text-to-image synthesis (TIS) models has significantly influenced digital image creation by producing high-quality visuals from written descriptions. Yet these models are sensitive to textual prompts, which poses a challenge for novice users who may not be familiar with TIS prompt writing. Existing solutions alleviate this via automatic prompt expansion or generation from a user query. However, this single-turn approach offers limited user-centricity in terms of result interpretability and user interactivity. We therefore propose DialPrompt, a dialogue-based TIS prompt generation model that emphasizes the experience of novice users. DialPrompt follows a multi-turn workflow: in each round of dialogue, the model guides the user to express their preferences on possible optimization dimensions before generating the final TIS prompt. To achieve this, we mined 15 essential dimensions for high-quality prompts from advanced users and curated a multi-turn dataset. Through training on this dataset, DialPrompt improves user-centricity by allowing users to perceive and control the creation process of TIS prompts. Experiments indicate that DialPrompt achieves a significantly higher user-centricity score than existing approaches while maintaining competitive quality of synthesized images. In our user evaluation, DialPrompt is highly rated by 19 human reviewers (especially novices).
2024
Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation
Yuan Ge | Yilun Liu | Chi Hu | Weibin Meng | Shimin Tao | Xiaofeng Zhao | Mahong Xia | Zhang Li | Boxing Chen | Hao Yang | Bei Li | Tong Xiao | JingBo Zhu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
With contributions from the open-source community, a vast amount of instruction tuning (IT) data has emerged. Given the significant resources required to train and evaluate models, an efficient method for selecting high-quality IT data is advantageous. However, existing methods for instruction data selection have limitations such as relying on fragile external APIs, being affected by biases in GPT models, or reducing the diversity of the selected instruction dataset. In this paper, we propose an industry-friendly, expert-aligned and diversity-preserved instruction data selection method: Clustering and Ranking (CaR). CaR consists of two steps. The first step ranks instruction pairs using a scoring model that is well aligned with expert preferences (achieving an accuracy of 84.25%). The second step preserves dataset diversity through a clustering process. In our experiment, CaR selected a subset containing only 1.96% of Alpaca's IT data, yet the AlpaCaR model trained on this subset outperforms Alpaca by an average of 32.1% in GPT-4 evaluations. Furthermore, our method uses small models (550M parameters) and requires only 11.2% of the monetary cost of existing methods, making it easily deployable in industrial scenarios.
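The two CaR steps described above (rank by quality, then cluster to keep diversity) can be illustrated with a minimal sketch. It assumes precomputed quality scores standing in for the paper's 550M-parameter scoring model, toy 2-D embeddings, and plain k-means; the rule for combining the two steps (a global top-n plus the best few from each cluster) and all function names are hypothetical illustrations, not necessarily the authors' exact procedure.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Vector], k: int, iters: int = 20) -> List[int]:
    """Tiny k-means; initial centroids are the first k points (deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def select_car_style(scores: Sequence[float], embeddings: Sequence[Vector],
                     k: int, top_n: int, per_cluster: int) -> List[int]:
    """Hypothetical CaR-style selection: global top_n plus top per_cluster per cluster."""
    # Step 1 (ranking): order all instruction pairs by quality score, best first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:top_n])
    # Step 2 (diversity): cluster the embeddings and keep the best few per cluster.
    labels = kmeans(embeddings, k)
    by_cluster: Dict[int, List[int]] = defaultdict(list)
    for i in ranked:  # ranked order keeps each bucket sorted by score
        by_cluster[labels[i]].append(i)
    for members in by_cluster.values():
        selected.update(members[:per_cluster])
    return sorted(selected)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3]
    embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
                  (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    print(select_car_style(scores, embeddings, k=2, top_n=2, per_cluster=1))
    # prints [0, 1, 5]
```

In the toy run, the two highest-scoring pairs both sit in one cluster, so ranking alone would ignore the second cluster entirely; the per-cluster step pulls in its best member (index 5), which is the diversity-preserving effect the abstract describes.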
2007
Co-authors
- Yilun Liu 3
- Shimin Tao 3
- Boxing Chen 2
- Minggui He 2
- Hao Yang 2
- Liqun Deng 1
- Fanyi Du 1
- Jingzhou Du 1
- Ren Feiliang 1
- Jian Gao 1
- Yuan Ge 1
- Jiaxin Guo 1
- Chi Hu 1
- Yuhe Ji 1
- Bei Li 1
- Justin Li 1
- Chen Liu 1
- Chenxin Liu 1
- Mahongxia 1
- Weibin Meng 1
- Xiaojun Meng 1
- Lingqi Miao 1
- Hu Minghan 1
- Mengyao Piao 1
- Yao Tianshun 1
- Daimeng Wei 1
- Jiansheng Wei 1
- Mahong Xia 1
- Tong Xiao (肖桐) 1
- Yanghua Xiao 1
- Feiyu Yao 1
- Osamu Yoshie 1
- Chunguang Zhao 1
- Xiaofeng Zhao 1
- JingBo Zhu (朱靖波) 1