Towards Optimal Evaluation Efficiency for Large Language Models

Guohong Li, Deyi Xiong


Abstract
Comprehensive evaluation of large language models (LLMs) typically requires large-scale benchmarks, which are costly in terms of both data annotation and the computational resources needed for evaluation. To mitigate these challenges, we propose an efficient evaluation framework that selects a question subset based on pre-tested results, thereby reducing these costs. We formulate subset selection as an optimization task, solved using optimal random sampling and simulated annealing algorithms. We compare our approach with prior clustering-based methods and assess their reliability in terms of score accuracy. Additionally, we perform semantic analysis and evaluate, using Wasserstein distance, whether the selected subsets preserve the semantic information of the original benchmark. Experimental results show that our method outperforms previous approaches in reliability, as measured by L2 norm. Our study provides an optimized perspective for balancing evaluation efficiency and reliability in LLM assessment, while revealing the relationship between optimization methods and semantic retention.
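The subset-selection objective the abstract describes can be illustrated with a simulated-annealing loop that swaps questions in and out of a candidate subset to minimize the L2 norm between full-benchmark and subset-estimated model scores. This is a minimal sketch under assumptions, not the paper's implementation: the pre-tested results matrix `scores`, the single-swap neighbourhood move, and the geometric cooling schedule are all illustrative choices.

```python
import math
import random


def l2_error(scores, subset):
    """L2 norm, across models, between full-benchmark and subset mean scores."""
    total = 0.0
    for row in scores:
        full = sum(row) / len(row)
        sub = sum(row[j] for j in subset) / len(subset)
        total += (full - sub) ** 2
    return math.sqrt(total)


def anneal_subset(scores, k, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Select k question indices whose per-model mean scores track the full set."""
    rng = random.Random(seed)
    n = len(scores[0])
    current = rng.sample(range(n), k)
    best = list(current)
    cur_err = best_err = l2_error(scores, current)
    temp = t0
    for _ in range(steps):
        # Neighbour move: replace one selected question with an unselected one.
        new = list(current)
        outside = sorted(set(range(n)) - set(current))
        new[rng.randrange(k)] = rng.choice(outside)
        new_err = l2_error(scores, new)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new_err < cur_err or rng.random() < math.exp(
            (cur_err - new_err) / max(temp, 1e-12)
        ):
            current, cur_err = new, new_err
            if cur_err < best_err:
                best, best_err = list(current), cur_err
        temp *= cooling
    return best, best_err
```

The same `l2_error` objective can also score repeated random draws (the "optimal random sampling" baseline), keeping whichever sampled subset attains the lowest error.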
Anthology ID:
2025.emnlp-main.716
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
14187–14194
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.716/
Cite (ACL):
Guohong Li and Deyi Xiong. 2025. Towards Optimal Evaluation Efficiency for Large Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 14187–14194, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Towards Optimal Evaluation Efficiency for Large Language Models (Li & Xiong, EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.716.pdf
Checklist:
 2025.emnlp-main.716.checklist.pdf