Model Performance-Guided Evaluation Data Selection for Effective Prompt Optimization

Ximing Dong, Shaowei Wang, Dayi Lin, Ahmed Hassan


Abstract
Optimizing Large Language Model (LLM) performance requires well-crafted prompts, but manual prompt engineering is labor-intensive and often ineffective. Automated prompt optimization techniques address this challenge, but the majority of them rely on randomly selected evaluation subsets, which fail to represent the full dataset, leading to unreliable evaluations and suboptimal prompts. Existing coreset selection methods, designed for LLM benchmarking, are unsuitable for prompt optimization due to challenges in clustering similar samples, high data collection costs, and the unavailability of performance data for new or private datasets. To overcome these issues, we propose IPOMP, an Iterative evaluation data selection approach for effective Prompt Optimization using real-time Model Performance. IPOMP is a two-stage approach that selects representative and diverse samples using semantic clustering and boundary analysis, followed by iterative refinement with real-time model performance data to replace redundant samples. Evaluations on two datasets (BIG-bench and LIAR) and two models (GPT-3.5 and GPT-4o-mini) show that IPOMP improves effectiveness by at least 1.6% to 3.1% and stability by at least 50% to 55.5% compared with the best baseline across the studied datasets and models, with minimal computational overhead (below 1%). Furthermore, the results demonstrate that our real-time performance-guided refinement approach can be universally applied to enhance existing coreset selection methods.
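The first stage described in the abstract (selecting representative and diverse samples via semantic clustering and boundary analysis) can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's exact algorithm: the k-means clustering, the choice of centroid-nearest samples as "representative", and the top-2-centroid margin as the "boundary" criterion are all assumptions, and the function name `select_eval_subset` is hypothetical.

```python
import numpy as np

def select_eval_subset(embeddings, k, n_boundary, seed=0):
    """Hypothetical sketch of clustering-plus-boundary selection.

    Clusters the sample embeddings with k-means, keeps the sample nearest
    each centroid (representative), then adds the samples with the smallest
    margin between their two closest centroids (boundary samples)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(embeddings, dtype=float)

    # --- plain k-means on the semantic embeddings ---
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(50):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)  # (n, k) distances
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new

    d = np.linalg.norm(X[:, None] - centers[None], axis=2)
    # Representative samples: the one nearest each cluster centroid.
    selected = {int(d[:, j].argmin()) for j in range(k)}
    # Boundary samples: smallest gap between the two closest centroids,
    # i.e. points sitting near a cluster boundary.
    sorted_d = np.sort(d, axis=1)
    margin = sorted_d[:, 1] - sorted_d[:, 0]
    for i in np.argsort(margin):
        if len(selected) >= k + n_boundary:
            break
        selected.add(int(i))
    return sorted(selected)
```

The abstract's second stage would then iteratively swap out samples from this subset that behave redundantly under real-time model performance signals; that refinement step is not sketched here because its details are not given in this record.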
Anthology ID:
2025.findings-acl.147
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2844–2859
URL:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.147/
DOI:
10.18653/v1/2025.findings-acl.147
Cite (ACL):
Ximing Dong, Shaowei Wang, Dayi Lin, and Ahmed Hassan. 2025. Model Performance-Guided Evaluation Data Selection for Effective Prompt Optimization. In Findings of the Association for Computational Linguistics: ACL 2025, pages 2844–2859, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Model Performance-Guided Evaluation Data Selection for Effective Prompt Optimization (Dong et al., Findings 2025)
PDF:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.147.pdf