Optimising Factual Consistency in Summarisation via Preference Learning from Multiple Imperfect Metrics

Yuxuan Ye, Raul Santos-Rodriguez, Edwin Simpson


Abstract
Reinforcement learning with evaluation metrics as rewards is widely used to enhance specific capabilities of language models. However, for tasks such as factually consistent summarisation, existing metrics remain underdeveloped, limiting their effectiveness as signals for shaping model behaviour. While individual factuality metrics are unreliable, their combination can capture a wider range of factual errors. We leverage this insight to introduce an automated training pipeline that improves factual consistency in summaries by aggregating scores from multiple weak metrics. Our approach avoids the need for complex reward shaping by mapping scores to preferences and filtering out cases with high disagreement between metrics. For each source document, we generate lexically similar summary pairs by varying decoding strategies, enabling the model to learn from factual differences caused by subtle lexical differences. This approach constructs a high-quality preference dataset using only source documents. Experiments demonstrate consistent factuality gains across models, ranging from early encoder-decoder architectures to modern large language models, with smaller models reaching factuality comparable to larger ones.
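A minimal sketch of the preference-construction step described in the abstract, assuming hypothetical metric wrappers (metric_fns) that each score a (source, summary) pair for factual consistency; the names, signatures, and agreement threshold are illustrative placeholders, not the authors' implementation:

from typing import Callable, List, Optional, Tuple

# A factuality metric maps (source, summary) to a score (higher = more factual).
MetricFn = Callable[[str, str], float]

def build_preference(
    source: str,
    summary_a: str,               # e.g. decoded with beam search
    summary_b: str,               # e.g. decoded with nucleus sampling
    metric_fns: List[MetricFn],   # ensemble of weak factuality metrics
    min_agreement: float = 0.8,   # fraction of deciding metrics that must agree
) -> Optional[Tuple[str, str]]:
    """Return (preferred, rejected) when the metrics largely agree, else None."""
    votes_a = votes_b = 0
    for fn in metric_fns:
        score_a = fn(source, summary_a)
        score_b = fn(source, summary_b)
        if score_a > score_b:
            votes_a += 1
        elif score_b > score_a:
            votes_b += 1
    total = votes_a + votes_b
    if total == 0:
        return None  # every metric tied: no usable preference signal
    if max(votes_a, votes_b) / total < min_agreement:
        return None  # metrics disagree too much: filter the pair out
    return (summary_a, summary_b) if votes_a > votes_b else (summary_b, summary_a)

Voting over pairwise rankings rather than averaging raw scores sidesteps calibrating heterogeneous metrics onto a common scale, which is the motivation for mapping scores to preferences instead of shaping a scalar reward.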
Anthology ID:
2025.findings-emnlp.940
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
17342–17355
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.940/
DOI:
10.18653/v1/2025.findings-emnlp.940
Cite (ACL):
Yuxuan Ye, Raul Santos-Rodriguez, and Edwin Simpson. 2025. Optimising Factual Consistency in Summarisation via Preference Learning from Multiple Imperfect Metrics. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 17342–17355, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Optimising Factual Consistency in Summarisation via Preference Learning from Multiple Imperfect Metrics (Ye et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.940.pdf
Checklist:
2025.findings-emnlp.940.checklist.pdf