Lionel Delphin-Poulat


2025

PoSum-Bench: Benchmarking Position Bias in LLM-based Conversational Summarization
Xu Sun | Lionel Delphin-Poulat | Christèle Tarnec | Anastasia Shimorina
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) are increasingly used for zero-shot conversation summarization, but they often exhibit positional bias, tending to overemphasize content from the beginning or end of a conversation while neglecting the middle. To address this issue, we introduce PoSum-Bench, a comprehensive benchmark for evaluating positional bias in conversational summarization, featuring diverse English and French conversational datasets spanning formal meetings, casual conversations, and customer service interactions. We propose a novel semantic similarity-based sentence-level metric to quantify the direction and magnitude of positional bias in model-generated summaries, enabling systematic and reference-free evaluation across conversation positions, languages, and conversational contexts. Our benchmark and methodology thus provide the first systematic, cross-lingual framework for reference-free evaluation of positional bias in conversational summarization, laying the groundwork for developing more balanced and unbiased summarization models.
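
The abstract describes the metric only at a conceptual level, so the following Python sketch merely illustrates the general idea of a semantic similarity-based, sentence-level positional measure: align each summary sentence to its most similar source sentence via embedding cosine similarity, then summarize where those alignments fall in the conversation. The encoder choice (all-MiniLM-L6-v2), the function name positional_bias_score, and the centering scheme are illustrative assumptions, not the PoSum-Bench metric itself.

```python
# Hypothetical sketch of a sentence-level positional-bias score.
# NOT the PoSum-Bench metric; it only illustrates aligning summary
# sentences to source positions via embedding similarity.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed dependency

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder would do


def positional_bias_score(source_sentences, summary_sentences):
    """Return a score in [-1, 1]: negative suggests bias toward the beginning
    of the conversation, positive toward the end, near 0 balanced coverage."""
    src_emb = model.encode(source_sentences, normalize_embeddings=True)
    sum_emb = model.encode(summary_sentences, normalize_embeddings=True)
    # Cosine similarity between every summary sentence and every source sentence
    # (embeddings are normalized, so the dot product is the cosine similarity).
    sims = sum_emb @ src_emb.T
    # For each summary sentence, the normalized position (0..1) of its
    # most similar source sentence.
    positions = sims.argmax(axis=1) / max(len(source_sentences) - 1, 1)
    # Center the mean covered position so that 0 corresponds to the middle.
    return float(2 * (positions.mean() - 0.5))


if __name__ == "__main__":
    conversation = [
        "Hi, I have a problem with my invoice.",
        "Sure, can you give me your account number?",
        "It's 12345, the charge from last week looks wrong.",
        "I see it, I'll issue a refund right away.",
        "Great, thank you for your help.",
    ]
    summary = ["The agent issued a refund for an incorrect charge."]
    print(positional_bias_score(conversation, summary))
```

Under these assumptions, a summary that draws only on early turns yields a negative score and one that draws only on late turns a positive score; the magnitude reflects how far coverage drifts from the middle of the conversation.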