PoSum-Bench: Benchmarking Position Bias in LLM-based Conversational Summarization
Xu Sun, Lionel Delphin-Poulat, Christèle Tarnec, Anastasia Shimorina
Abstract
Large language models (LLMs) are increasingly used for zero-shot conversation summarization, but often exhibit positional bias, tending to overemphasize content from the beginning or end of a conversation while neglecting the middle. To address this issue, we introduce PoSum-Bench, a comprehensive benchmark for evaluating positional bias in conversational summarization, featuring diverse English and French conversational datasets spanning formal meetings, casual conversations, and customer service interactions. We propose a novel semantic similarity-based sentence-level metric to quantify the direction and magnitude of positional bias in model-generated summaries, enabling systematic and reference-free evaluation across conversation positions, languages, and conversational contexts. Our benchmark and methodology thus provide the first systematic, cross-lingual framework for reference-free evaluation of positional bias in conversational summarization, laying the groundwork for developing more balanced and unbiased summarization models.
- Anthology ID:
- 2025.emnlp-main.404
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 7996–8020
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.404/
- Cite (ACL):
- Xu Sun, Lionel Delphin-Poulat, Christèle Tarnec, and Anastasia Shimorina. 2025. PoSum-Bench: Benchmarking Position Bias in LLM-based Conversational Summarization. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 7996–8020, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- PoSum-Bench: Benchmarking Position Bias in LLM-based Conversational Summarization (Sun et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.404.pdf