Coverage-based Fairness in Multi-document Summarization

Haoyuan Li, Yusen Zhang, Rui Zhang, Snigdha Chaturvedi


Abstract
Fairness in multi-document summarization (MDS) measures whether a system can generate a summary fairly representing information from documents with different social attribute values. Fairness in MDS is crucial since a fair summary can offer readers a comprehensive view. Previous works focus on quantifying summary-level fairness using Proportional Representation, a fairness measure based on Statistical Parity. However, Proportional Representation does not consider redundancy in input documents and overlooks corpus-level unfairness. In this work, we propose a new summary-level fairness measure, Equal Coverage, which is based on coverage of documents with different social attribute values and considers the redundancy within documents. To detect the corpus-level unfairness, we propose a new corpus-level measure, Coverage Parity. Our human evaluations show that our measures align more with our definition of fairness. Using our measures, we evaluate the fairness of thirteen different LLMs. We find that Claude3-sonnet is the fairest among all evaluated LLMs. We also find that almost all LLMs overrepresent different social attribute values. The code is available at https://github.com/leehaoyuan/coverage_fairness
Anthology ID:
2025.naacl-long.494
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
NAACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
9801–9819
Language:
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.494/
DOI:
Bibkey:
Cite (ACL):
Haoyuan Li, Yusen Zhang, Rui Zhang, and Snigdha Chaturvedi. 2025. Coverage-based Fairness in Multi-document Summarization. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 9801–9819, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Coverage-based Fairness in Multi-document Summarization (Li et al., NAACL 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.494.pdf