Mind the Style Gap: Meta-Evaluation of Style and Attribute Transfer Metrics

Amalie Brogaard Pauli, Isabelle Augenstein, Ira Assent


Abstract
Large language models (LLMs) make it easy to rewrite a text in any style – e.g. to make it more polite, persuasive, or more positive – but evaluating such rewrites is not straightforward. A challenge lies in measuring content preservation: that content not attributable to the style change is retained. This paper presents a large meta-evaluation of metrics for evaluating style and attribute transfer, focusing on content preservation. We find that meta-evaluation studies on existing datasets lead to misleading conclusions about the suitability of metrics for content preservation. Widely used metrics show a high correlation with human judgments despite being deemed unsuitable for the task, because they do not abstract from style changes when evaluating content preservation. We show that the overly high correlations with human judgments stem from the nature of the test data. To address this issue, we introduce a new, challenging test set specifically designed for evaluating content preservation metrics for style transfer. We construct the data by creating high variation in content preservation. Using this dataset, we demonstrate that suitable metrics for content preservation in style transfer are indeed style-aware. To support efficient evaluation, we propose a new style-aware method that utilises small language models, obtaining a higher alignment with human judgments than prompting a model of a similar size as an autorater.
Anthology ID:
2025.findings-emnlp.1175
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
21550–21564
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1175/
DOI:
10.18653/v1/2025.findings-emnlp.1175
Cite (ACL):
Amalie Brogaard Pauli, Isabelle Augenstein, and Ira Assent. 2025. Mind the Style Gap: Meta-Evaluation of Style and Attribute Transfer Metrics. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 21550–21564, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Mind the Style Gap: Meta-Evaluation of Style and Attribute Transfer Metrics (Pauli et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1175.pdf
Checklist:
2025.findings-emnlp.1175.checklist.pdf