SubLIME: Subset Selection via Rank Correlation Prediction for Data-Efficient LLM Evaluation
Gayathri Saranathan | Cong Xu | Mahammad Parwez Alam | Tarun Kumar | Martin Foltin | Soon Yee Wong | Suparna Bhattacharya
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The rapid expansion of Large Language Models (LLMs) and natural language processing datasets has made exhaustive benchmark evaluations computationally prohibitive. Inspired by high-stakes competitions like the International Mathematical Olympiad, where a few well-chosen problems suffice to differentiate top performers, we present SubLIME, which reduces evaluation costs by 80% to 99% while preserving ranking fidelity. It trains a Rank Correlation Prediction (RCP) model that combines limited performance data from only 5-20 anchor LLMs with intrinsic dataset metrics (Difficulty, Quality, and Distributional Dispersion) to predict how closely a candidate subset reflects full-benchmark rankings. Guided by these predictions, SubLIME selects a “winning” subset (1-20% of the full dataset) for evaluating new LLMs, preserving global rankings significantly better than other data-efficient methods across ten diverse benchmarks.
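The abstract describes the RCP objective only in prose. Below is a minimal, self-contained Python sketch of the quantity the RCP model is trained to predict: the Spearman rank correlation between anchor-LLM rankings computed on a candidate subset and on the full benchmark. The function name, score-matrix layout, and toy data are illustrative assumptions for exposition, not the authors' released code.

```python
# Sketch of the rank-fidelity signal behind SubLIME: for a candidate subset,
# compare the anchor LLMs' rankings on the subset against their rankings on
# the full benchmark via Spearman correlation. (Illustrative assumption, not
# the paper's implementation.)
import numpy as np
from scipy.stats import spearmanr

def subset_rank_correlation(scores: np.ndarray, subset_idx: np.ndarray) -> float:
    """Spearman correlation between subset-based and full-benchmark rankings.

    scores: (n_anchor_llms, n_examples) matrix of per-example correctness
            (e.g., 0/1 accuracy) for the 5-20 anchor models.
    subset_idx: indices of the candidate subset (1-20% of the examples).
    """
    full_ranking = scores.mean(axis=1)                   # per-LLM score, full set
    subset_ranking = scores[:, subset_idx].mean(axis=1)  # per-LLM score, subset only
    rho, _ = spearmanr(full_ranking, subset_ranking)
    return rho

# Toy usage: 10 anchor LLMs, 1,000 benchmark examples, a random 5% subset.
rng = np.random.default_rng(0)
# Stronger models (higher success probability) score higher on average.
scores = rng.random((10, 1000)) < np.linspace(0.3, 0.8, 10)[:, None]
subset = rng.choice(1000, size=50, replace=False)
print(f"Spearman rho vs. full benchmark: {subset_rank_correlation(scores, subset):.3f}")
```

In the paper's framing, this correlation is expensive to measure for every candidate subset and every new model, so the RCP model predicts it from the anchor-LLM data plus the intrinsic metrics, and the subset with the highest predicted correlation is kept for evaluating new LLMs.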