Ziming Wang
2025
Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning
Shaobo Wang | Xiangqi Jin | Ziming Wang | Jize Wang | Jiajun Zhang | Kaixin Li | Zichen Wen | Zhong Li | Conghui He | Xuming Hu | Linfeng Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Fine-tuning large language models (LLMs) on task-specific data is essential for their effective deployment. As dataset sizes grow, efficiently selecting optimal training subsets becomes crucial for balancing performance and computational cost. Traditional data selection methods often require fine-tuning a scoring model on the target dataset, which is time-consuming and resource-intensive, or rely on heuristics that fail to fully leverage the model’s predictive capabilities. To address these challenges, we propose Data Whisperer, an efficient, training-free, attention-based method that leverages few-shot in-context learning with the model to be fine-tuned. Comprehensive evaluations were conducted on both raw and synthetic datasets across diverse tasks and models. Notably, on the Llama-3-8B-Instruct model, Data Whisperer achieves better performance with just 10% of the GSM8K data than with the full dataset, and it outperforms existing methods by 3.1 points with a 7.4× speedup.
FanChuan: A Multilingual and Graph-Structured Benchmark For Parody Detection and Analysis
Yilun Zheng | Sha Li | Fangkun Wu | Yang Ziyi | Lin Hongchao | Zhichao Hu | Cai Xinjun | Ziming Wang | Jinxuan Chen | Sitao Luan | Jiahao Xu | Lihui Chen
Findings of the Association for Computational Linguistics: ACL 2025
Parody is an emerging phenomenon on social media, where individuals imitate a role or position opposite to their own, often for humor, provocation, or controversy. Detecting and analyzing parody is challenging and often context-dependent, yet it plays a crucial role in understanding cultural values, promoting subcultures, and enhancing self-expression. However, the study of parody is hindered by scarce data and the limited diversity of existing datasets. To bridge this gap, we built seven parody datasets from both English and Chinese corpora, with 14,755 annotated users and 21,210 annotated comments in total. To provide the richer contextual information that existing datasets lack, we also collected replies and constructed user-interaction graphs. With these datasets, we tested traditional methods and Large Language Models (LLMs) on three key tasks: (1) parody detection, (2) comment sentiment analysis with parody, and (3) user sentiment analysis with parody. Our extensive experiments reveal that parody-related tasks remain challenging for all models and that contextual information plays a critical role. Interestingly, we find that in certain scenarios, traditional sentence embedding methods combined with simple classifiers can outperform advanced LLMs, namely DeepSeek-R1 and GPT-o3, highlighting parody as a significant challenge for LLMs.