@inproceedings{larionov-eger-2025-promptoptme,
    title = "{P}rompt{O}pt{M}e: Error-Aware Prompt Compression for {LLM}-based {MT} Evaluation Metrics",
    author = "Larionov, Daniil and
      Eger, Steffen",
    editor = "Chiruzzo, Luis and
      Ritter, Alan and
      Wang, Lu",
    booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.naacl-long.592/",
    pages = "11807--11820",
    isbn = "979-8-89176-189-6",
    abstract = "Evaluating the quality of machine-generated natural language content is a challenging task in Natural Language Processing (NLP). Recently, large language models (LLMs) like GPT-4 have been employed for this purpose, but they are computationally expensive due to the extensive token usage required by complex evaluation prompts. In this paper, we propose a prompt optimization approach that uses a smaller, fine-tuned language model to compress input data for evaluation prompt, thus reducing token usage and computational cost when using larger LLMs for downstream evaluation. Our method involves a two-stage fine-tuning process: supervised fine-tuning followed by preference optimization to refine the model's outputs based on human preferences. We focus on Machine Translation (MT) evaluation and utilize the GEMBA-MQM metric as a starting point. Our results show a $2.37\times$ reduction in token usage without any loss in evaluation quality. This work makes state-of-the-art LLM-based metrics like GEMBA-MQM more cost-effective and efficient, enhancing their accessibility for broader use."
}
Markdown (Informal)
[PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation Metrics](https://aclanthology.org/2025.naacl-long.592/) (Larionov & Eger, NAACL 2025)
ACL