Improving the Robustness of Summarization Models by Detecting and Removing Input Noise

Kundan Krishna; Yao Zhao; Jie Ren; Balaji Lakshminarayanan; Jiaming Luo; Mohammad Saleh; Peter J. Liu

doi:10.18653/v1/2023.findings-emnlp.93

Abstract

Abstractive summarization models are typically evaluated on test data drawn from the same distribution as the training data. In real-world practice, documents to be summarized may contain input noise caused by text extraction artifacts or data pipeline bugs. The robustness of model performance under the distribution shift caused by such noise is relatively understudied. We present a large empirical study quantifying the sometimes severe loss in performance (up to 12 ROUGE-1 points) from different types of input noise for a range of datasets and model sizes. We then propose a lightweight method for detecting and removing such noise in the input during model inference without requiring any extra training, auxiliary models, or even prior knowledge of the type of noise. Our proposed approach effectively mitigates the loss in performance, recovering a large fraction of the performance drop, sometimes as large as 11 ROUGE-1 points.

Anthology ID: 2023.findings-emnlp.93
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1324–1336
URL: https://aclanthology.org/2023.findings-emnlp.93
DOI: 10.18653/v1/2023.findings-emnlp.93
Cite (ACL): Kundan Krishna, Yao Zhao, Jie Ren, Balaji Lakshminarayanan, Jiaming Luo, Mohammad Saleh, and Peter Liu. 2023. Improving the Robustness of Summarization Models by Detecting and Removing Input Noise. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 1324–1336, Singapore. Association for Computational Linguistics.
Cite (Informal): Improving the Robustness of Summarization Models by Detecting and Removing Input Noise (Krishna et al., Findings 2023)
PDF: https://preview.aclanthology.org/ingest-2024-clasp/2023.findings-emnlp.93.pdf
Video: https://preview.aclanthology.org/ingest-2024-clasp/2023.findings-emnlp.93.mp4