@inproceedings{cao-wang-2021-attention,
title = "Attention Head Masking for Inference Time Content Selection in Abstractive Summarization",
author = "Cao, Shuyang and
Wang, Lu",
editor = "Toutanova, Kristina and
Rumshisky, Anna and
Zettlemoyer, Luke and
Hakkani-Tur, Dilek and
Beltagy, Iz and
Bethard, Steven and
Cotterell, Ryan and
Chakraborty, Tanmoy and
Zhou, Yichao",
booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
month = jun,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.naacl-main.397/",
doi = "10.18653/v1/2021.naacl-main.397",
pages = "5008--5016",
abstract = "How can we effectively inform content selection in Transformer-based abstractive summarization models? In this work, we present a simple-yet-effective attention head masking technique, which is applied on encoder-decoder attentions to pinpoint salient content at inference time. Using attention head masking, we are able to reveal the relation between encoder-decoder attentions and content selection behaviors of summarization models. We then demonstrate its effectiveness on three document summarization datasets based on both in-domain and cross-domain settings. Importantly, our models outperform prior state-of-the-art models on CNN/Daily Mail and New York Times datasets. Moreover, our inference-time masking technique is also data-efficient, requiring only 20{\%} of the training samples to outperform BART fine-tuned on the full CNN/DailyMail dataset."
}
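The abstract describes masking encoder-decoder (cross) attention heads at inference time so that selected heads attend only to source tokens chosen as salient. The snippet below is a minimal illustrative sketch of that core idea in plain PyTorch, not the authors' implementation; the function and variable names (masked_cross_attention, salient_mask, head_mask) are assumptions for illustration only.

```python
# Minimal sketch (assumed names, not the paper's code): for chosen
# encoder-decoder attention heads, restrict attention to a set of
# "salient" source tokens by masking the attention logits before softmax.

import torch
import torch.nn.functional as F

def masked_cross_attention(scores, salient_mask, head_mask):
    """Apply inference-time head masking to cross-attention logits.

    scores:       (batch, num_heads, tgt_len, src_len) raw attention logits
    salient_mask: (batch, src_len) bool, True for source tokens kept as salient
    head_mask:    (num_heads,) bool, True for heads whose attention is constrained
    """
    # Broadcast the token-level mask to (batch, 1, 1, src_len).
    token_mask = salient_mask[:, None, None, :]
    # Only constrained heads are affected; other heads keep full attention.
    head_sel = head_mask[None, :, None, None]
    blocked = head_sel & ~token_mask                  # positions to suppress
    masked_scores = scores.masked_fill(blocked, float("-inf"))
    return F.softmax(masked_scores, dim=-1)           # renormalized attention

# Toy usage: 1 example, 4 heads, 2 decoder steps, 6 source tokens.
scores = torch.randn(1, 4, 2, 6)
salient_mask = torch.tensor([[True, True, False, False, True, False]])
head_mask = torch.tensor([True, True, False, False])  # constrain first two heads
attn = masked_cross_attention(scores, salient_mask, head_mask)
print(attn[0, 0, 0])  # constrained head: zero weight on non-salient tokens
print(attn[0, 3, 0])  # unconstrained head: full attention distribution
```

In the paper this masking is applied only at inference time, so the same fine-tuned summarization model can be steered toward different content selections without retraining.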