FactPEGASUS: Factuality-Aware Pre-training and Fine-tuning for Abstractive Summarization

David Wan, Mohit Bansal


Abstract
We present FactPEGASUS, an abstractive summarization model that addresses the problem of factuality during pre-training and fine-tuning: (1) We augment the sentence selection strategy of PEGASUS’s (Zhang et al., 2019) pre-training objective to create pseudo-summaries that are both important and factual; (2) We introduce three complementary components for fine-tuning. The corrector removes hallucinations present in the reference summary, the contrastor uses contrastive learning to better differentiate nonfactual summaries from factual ones, and the connector bridges the gap between the pre-training and fine-tuning for better transfer of knowledge. Experiments on three downstream tasks demonstrate that FactPEGASUS substantially improves factuality evaluated by multiple automatic metrics and humans. Our thorough analysis suggests that FactPEGASUS is more factual than using the original pre-training objective in zero-shot and few-shot settings, retains factual behavior more robustly than strong baselines, and does not rely entirely on becoming more extractive to improve factuality.
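The pre-training change described above can be pictured with a small sketch: PEGASUS selects salient sentences as pseudo-summaries, and FactPEGASUS additionally requires those sentences to be factual with respect to the rest of the document. The code below is a minimal illustration of that idea, not the authors' exact recipe; the score combination, the `factuality` stub (which stands in for a learned factuality metric), and all function names are assumptions for demonstration only.

```python
# Illustrative sketch of factuality-aware pseudo-summary selection.
# Importance follows the PEGASUS-style ROUGE criterion; the factuality
# score here is a simple token-overlap placeholder (assumption).

from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)

def importance(sentence: str, rest_of_doc: str) -> float:
    """ROUGE-1 F1 of a sentence against the remaining document."""
    return scorer.score(rest_of_doc, sentence)["rouge1"].fmeasure

def factuality(sentence: str, rest_of_doc: str) -> float:
    """Placeholder factuality score in [0, 1]; a learned factuality
    model would be used instead of raw token overlap."""
    sent_tokens = set(sentence.lower().split())
    doc_tokens = set(rest_of_doc.lower().split())
    return len(sent_tokens & doc_tokens) / max(len(sent_tokens), 1)

def select_pseudo_summary(sentences: list[str], top_k: int = 1) -> list[str]:
    """Rank sentences by combined importance and factuality, keep the top-k."""
    scored = []
    for i, sent in enumerate(sentences):
        rest = " ".join(sentences[:i] + sentences[i + 1:])
        scored.append((importance(sent, rest) * factuality(sent, rest), sent))
    scored.sort(reverse=True)
    return [sent for _, sent in scored[:top_k]]

if __name__ == "__main__":
    doc = [
        "The company reported record profits this quarter.",
        "Analysts attribute the growth to strong overseas sales.",
        "Its headquarters are located downtown.",
    ]
    print(select_pseudo_summary(doc))
```

The selected sentences would then serve as the pre-training targets, so that the model learns to generate summaries that are both salient and supported by the input document.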
Anthology ID:
2022.naacl-main.74
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
1010–1028
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/2022.naacl-main.74/
DOI:
10.18653/v1/2022.naacl-main.74
Cite (ACL):
David Wan and Mohit Bansal. 2022. FactPEGASUS: Factuality-Aware Pre-training and Fine-tuning for Abstractive Summarization. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1010–1028, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
FactPEGASUS: Factuality-Aware Pre-training and Fine-tuning for Abstractive Summarization (Wan & Bansal, NAACL 2022)
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/2022.naacl-main.74.pdf
Video:
https://preview.aclanthology.org/build-pipeline-with-new-library/2022.naacl-main.74.mp4
Code:
meetdavidwan/factpegasus
Data:
WikiHow